GESIS Training Courses

Scientific Coordination

Sebastian E. Wenz
Tel: +49 221 47694-159

Administrative Coordination

Jacqueline Schüller
Tel: +49 221 47694-160

Course 10: Data Science Techniques for Survey Researchers

Cologne / Unter Sachsenhausen 6-8
Course duration:
Mo: 10:00-17:00 CEST
Tu-Fr: 9:00-16:00 CEST
General Topics:
Course Level:
Software used:
Students: 500 €
Academics: 750 €
Commercial: 1500 €
Lecturer(s): Anna-Carolina Haensch

About the lecturer - Anna-Carolina Haensch

Course description

Please note: There is a trade fair in Cologne during this week. We recommend that you book your hotel accommodation early.
A variety of digital data sources are providing new avenues for empirical social science research. In order to effectively utilize these data for answering substantive research questions, a modern methodological toolkit paired with a critical perspective on data quality is needed. This course introduces state-of-the-art data science techniques that are suited for collecting and analyzing digital behavioral data, so-called "big data", and traditional survey data. In addition, aspects of data quality and error frameworks for digital (big) data sources are discussed.  
The course will cover the following topics and techniques:
  • Overview of Big Data: What is it and why does it matter?
  • Total Survey Error for Big Data
  • Web Scraping
  • Data quality for gathered data types
  • Sampling from online material
  • (Supervised) Machine Learning for Social Scientists
  • Working with textual data: Text Mining and Topic Models
    After the course, participants will have a profound understanding of important methods from the data science toolkit for collecting and analyzing the data types mentioned. They will be able to apply these methods and techniques in their research using statistical software.
    A detailed syllabus will soon be available for download here.

    Target group

    Participants will find the course useful if:
  • they are interested in learning some fundamental techniques in data science.
  • they want to collect and work with digital behavioral data, be it administrative data or data found online.
  • they want to understand what (supervised) machine learning is.

    Learning objectives

    By the end of the course, participants will:
  • understand the challenges when analyzing digital behavioral data.
  • know the promises and benefits of (supervised) machine learning.
  • be able to use (supervised) machine learning for data analysis.
  • be able to use common routines for analyzing textual data.
  • know some of the metrics used to assess data quality for gathered data types.
    Organizational structure of the course
    The course is partly theoretical, partly practical. Each topic will be introduced in a lecture. The best way to deepen one's understanding is with practical hands-on exercises. Files written in R Markdown will be provided to help participants execute the prepared scripts on their own computer and complete the assignments. The teacher will be available to assist and answer questions during the practical sessions.


    Prerequisites

  • General knowledge of statistics and statistical modelling (e.g., regression)
  • Prior experience with syntax-based software (like R, Stata, or Python)
  • Some basic experience with programming in R is helpful, but not strictly necessary. For those without prior exposure to R, we will ensure everyone is able to execute R Markdown files. Students without any R knowledge are encouraged to work through one or more R tutorials prior to the course. Some resources can be found here:
    Software and hardware requirements
    Course participants will need to bring a laptop with R and RStudio installed. Both programs are free and open source. We will use R for the practical sessions. A few days before the course starts, we will inform you about recommended steps to set up your system. You should be able to access the internet and install additional packages during the course (Wi-Fi is provided by GESIS).