GESIS Training Courses

Scientific Coordination

Kathrin Busch
Tel: +49 221 47694-226

Administrative Coordination

Loretta Langendörfer M.A.
Tel: +49 221 47694-143

Big Data Module I: Introduction to Data Science with Python

Dr. Arnim Bleier, Dr. Fabian Flöck, Dr. Juhi Kulshrestha

Course description

Data Science is the interdisciplinary science of extracting interpretable and useful knowledge from potentially large datasets. Due to the rapid surge of digital trace data (often referred to as “Big Data”) in a wide range of application areas, Data Science is also increasingly used in the social sciences and humanities. In contrast to the methods of empirical social science, Data Science methods often serve purposes of exploration and inductive inference. In this course, we aim to provide an introduction to Data Science for practitioners. In particular, we want to impart a basic understanding of the main methods and algorithms and to show how these can be deployed in practical application scenarios, focusing on the analysis of digital behavioral data found on the Web. For that purpose, our schedule alternates between lecture sessions that present the theoretical and technical background of data analysis and practical sessions that allow participants to directly apply the acquired knowledge by writing code in the Python programming language. We cover aspects of data collection, preprocessing, exploration, visualization, and machine learning, using basic Python and key packages like pandas, numpy, and scikit-learn. The data used will cover a wide range of sources, from "native Web" data such as social media data to more "traditional" survey data.


Learning objectives

Participants will gain in-depth knowledge of the typical data types and structures encountered when dealing with digital behavioral data and of state-of-the-art data analysis methods and tools in Python. They will also learn how this approach differs from those typically encountered in survey-based or experimental research. This will enable them to identify the benefits and pitfalls of these data types and methods in their field of interest and thus to select and appropriately apply data analysis and machine learning methods for large datasets in their own research. The knowledge obtained in this course provides a starting point for participants to investigate specialized methods for their individual research projects.


Participants should be willing to study algorithmic approaches on abstract and applied levels. Some previous knowledge of (i) statistics and (ii) programming in Python, another programming language (such as R or Java), or at least a scripting language (syntax code in SPSS or Stata) is very helpful for following the coursework; otherwise the learning curve will be quite steep. To ensure a common starting level among participants, we expect them to familiarize themselves beforehand with the basic concepts of Python, such as variables, lists, and loops, using the learning materials provided. These concepts will be refreshed at the beginning of the course. Please note that participants have to bring their own laptop to this course, with all necessary software pre-installed. All software used is open source and available free of charge for Windows, macOS, and Linux. Detailed installation instructions for the suggested development environments will be provided before the start of the course.
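
To give a rough idea of the level of the basic Python concepts mentioned above (variables, lists, and loops), a minimal sketch might look as follows. The data and all names in it are hypothetical and serve only as illustration; they are not taken from the course's learning materials.

```python
# Illustrative sketch only: the values below are made up,
# not taken from the course materials.

# Variables: names bound to values
course_language = "Python"
length_threshold = 100  # hypothetical cut-off for a "long" post

# Lists: ordered, mutable collections of values
post_lengths = [120, 45, 280, 15]  # hypothetical character counts of posts

# Loops: repeat an operation for every element of a list
total = 0
long_posts = []
for length in post_lengths:
    total += length
    if length > length_threshold:
        long_posts.append(length)

mean_length = total / len(post_lengths)

print("Language:", course_language)
print("Mean post length:", mean_length)
print("Posts longer than the threshold:", long_posts)
```

Participants who can read and modify a snippet of this kind without difficulty should be reasonably well prepared for the refresher at the beginning of the course.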