GESIS Training Courses

Scientific Coordination

Marlene Mauk
Tel: +49 221 47694-579

Administrative Coordination

Noemi Hartung

Automated Image and Video Data Analysis

Online via Zoom
Course duration:
9:00-16:00 CEST
Students: 550 €
Academics: 825 €
Commercial: 1650 €
Lecturer(s): Andreu Casas, Felicia Loecherbach

Course description

Social scientists have long argued that images play a crucial role in shaping and reflecting political life. This role is heightened by the bombardment of images that people experience today through many communication channels, from television to social media. Digitization has both increased the presence of images in daily life and made it easier for scholars to access and collect large quantities of pictures. However, using images collected in observational settings as data for social science inference is an arduous task. Fortunately, recent innovations in computer vision, the subfield of computer science concerned with automated image analysis, can reduce the costs of using images as data.
In this course, we'll build the theoretical and methodological expertise needed to apply machine learning methods to social science questions based on image and video data. The course combines theoretical sessions, where we'll discuss research that uses computer vision methods to study politics, communication, and related fields, with methodological sessions, where we'll cover in detail the key advances needed to fully understand state-of-the-art computer vision methods (deep learning, neural networks, convolutional neural networks, multimodal models, visual language models, etc.). In the practical sessions, we'll go over several Python tutorials implementing different computer vision techniques for image processing (e.g. splitting videos into analytical frames), object and face detection, supervised and unsupervised image classification, facial trait analysis, and multimodal modeling. We'll also have a session on cloud computing, which gives students an overview of the options available if they need to train and deploy computer vision models on large amounts of data, along with concrete examples of how to use particular cloud computing services.
Students with basic programming skills/experience in Python and some machine learning background will get the most out of the course. That said, we'll take the time to briefly review the key machine learning concepts needed to implement these methods, and students will be provided with clear and easy-to-follow sample code for each of the practical tutorials. In the cloud computing session, we will also use some bash (terminal commands), but only minimally, and no prior knowledge of it is required. By the end of the course, students will have a good understanding of the kinds of research questions that can be answered using computer vision methods, as well as of several techniques and how to apply them in their own research.
For additional details on the course and a day-to-day schedule, please download the full-length syllabus.
Organizational Structure of the Course
The course will be organized around three different types of sessions:
  • Lectures in which the instructors will present relevant literature, theory, concepts, and methods and discuss them with the students.
  • Tutorials in which the instructors will provide, run, and discuss sample code designed to implement different computer vision techniques. Students will also run the code on their own and can ask as many clarifying questions as needed.
  • Consulting sessions in which students will be able to discuss their own projects with the instructors, who will help them think through the different methods/resources needed and how to adapt the material/skills learned throughout the workshop for their own project.

Target group

You will find the course useful if:
  • you are a PhD student, early-career scholar, or industry professional, or are generally interested in using computational methods to automatically analyze large quantities of video/image (or multimodal, e.g. text + image) data.

Learning objectives

By the end of the course, you will:
  • have a good overview of the existing images-as-data literature in the social sciences
  • have a good understanding of key deep learning concepts relevant for the implementation of computer vision methods
  • have a good understanding of several computer vision techniques (object and face detection/recognition, image classification, facial trait analysis, etc.)
  • have a good understanding of the many options and techniques available to store and process visual data
  • be able to implement different computer vision techniques in Python
  • be able to use/adapt different computer vision techniques for your own research projects
  • be able to use/adapt different multimodal modeling approaches

Prerequisites


  • basic programming skills/experience in Python (e.g. data loading, pandas data frames, loops and basic data operations)
  • basic machine learning knowledge (e.g. distinction between supervised and unsupervised learning, familiarity with the training process in machine learning - such as train/test/validation split, cross-validation, etc. - although these concepts will be reviewed in more detail during the course)
  • a Google account: we will use Google Colab in the course tutorials.
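As a rough self-check for the machine learning prerequisite above, the following minimal sketch shows what a train/validation/test split looks like in plain Python. It is an illustration only and not part of the course materials: the function name `train_val_test_split`, the 15% fractions, the seed, and the image file names are all hypothetical choices made for this example.

```python
import random


def train_val_test_split(items, val_frac=0.15, test_frac=0.15, seed=42):
    """Shuffle items and split them into train, validation, and test lists."""
    if val_frac < 0 or test_frac < 0 or val_frac + test_frac >= 1:
        raise ValueError(
            "val_frac and test_frac must be non-negative and sum to less than 1"
        )
    shuffled = list(items)  # copy so the caller's list is not modified
    random.Random(seed).shuffle(shuffled)  # seeded for a reproducible split
    n_val = int(len(shuffled) * val_frac)
    n_test = int(len(shuffled) * test_frac)
    val = shuffled[:n_val]
    test = shuffled[n_val:n_val + n_test]
    train = shuffled[n_val + n_test:]  # everything left over is training data
    return train, val, test


# Hypothetical image file names standing in for a labeled dataset.
image_files = [f"img_{i:03d}.jpg" for i in range(100)]
train, val, test = train_val_test_split(image_files)
print(len(train), len(val), len(test))
```

If the idea is familiar (the model is fitted on the training set, tuned on the validation set, and evaluated once on the held-out test set), the machine learning background assumed here is probably in place.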
Software and Hardware Requirements
The course will use Google Colab, so participants need a Google account. There is no need to install Python or any Python packages locally.


Recommended readings