GESIS Training Courses

Scientific Coordination

Sebastian E. Wenz
Tel: +49 221 47694-159

Administrative Coordination

Jacqueline Schüller
Tel: +49 221 47694-160

Course 8: Data Science Techniques for Survey Researchers

Cologne/Unter Sachsenhausen 6-8
Course Duration
Mo: 10:00-17:00 CEST
Tu-Fr: 9:00-16:00 CEST
Students: 550 €
Academics: 825 €
Commercial: 1650 €
Lecturer(s): Anna-Carolina Haensch

About the lecturer - Anna-Carolina Haensch

Course description

A variety of digital data sources are providing new avenues for empirical social science research. Using these data effectively to answer substantive research questions requires a modern methodological toolkit paired with a critical perspective on data quality. This course introduces state-of-the-art data science techniques suited to collecting and analyzing digital behavioral data, so-called "big data", and traditional survey data. In addition, it discusses aspects of data quality and error frameworks for digital (big) data sources.
The course will cover the following topics and techniques:
  • Overview of Big Data: What is it and why does it matter?
  • Total Survey Error for Big Data
  • Web Scraping
  • Machine Learning for Social Scientists
  • Regularized regression
  • Tree-based methods
  • Support vector machines
  • Large language models (LLMs)
After the course, you will have a profound understanding of important methods from the data science toolkit for collecting and analyzing the data types mentioned. You will be able to apply these methods and techniques in your research using statistical software.
For additional details on the course and a day-to-day schedule, please download the full-length syllabus.

    Target group

    You will find the course useful if:
    • you are interested in learning some fundamental techniques in data science,
    • you want to collect and work with digital behavioral data, be it administrative data or data found online,
    • you want to understand what machine learning is.

    Learning objectives

    By the end of the course, you will:
    • understand the challenges when analyzing digital behavioral data,
    • know the promises and benefits of (supervised) machine learning,
    • be able to use (supervised) machine learning for data analysis,
    • know some of the metrics used to assess the quality of the data types covered.
    Organizational structure of the course
    The course is partly theoretical, partly practical. Each topic will be introduced in a lecture. The best way to deepen one's understanding is with practical hands-on exercises. Files written in R Markdown will be provided to help you execute the prepared scripts on your own computer and complete the assignments. The teacher will be available to assist and answer questions during the practical sessions.

    Prerequisites
    • General knowledge of statistics and statistical modelling (i.e., regression)
    • Prior experience with syntax-based software (like R, Stata, or Python)
    Some basic experience with programming in R is very helpful, but not strictly necessary. For those without prior exposure to R, we will ensure everyone is able to execute R Markdown files. If you have no previous R knowledge, we encourage you to work through one or more R tutorials prior to the course. Some resources can be found here:
    Software and hardware requirements
    You will need to bring a laptop with R and RStudio installed to participate in this course.
    Both programs are free and open source. We will use R for the practical sessions. We will inform you a few days before the course starts about recommended steps to set up your system. You should be able to access the internet and install additional packages during the course. Wi-Fi is provided by GESIS.