GESIS Training Courses

Scientific Coordination

Sebastian E. Wenz
Tel: +49 (0)221-47694-166

Administrative Coordination

Loretta Langendörfer M.A.
Tel: +49 221 47694-143

Big Data Module I: Introduction to Data Science with Python

Dr. Florian Lemmerich, Dr. Philipp Singer, Dr. Fabian Flöck, Dr. Haiko Lietz

Date: 17-21 July 2017


Data Science is the interdisciplinary science of extracting interpretable and useful knowledge from digital datasets. Due to the rapid surge of digital trace data (often referred to as “Big Data”) in a wide range of application areas, Data Science is also increasingly used in the social sciences and humanities. In contrast to the methods typically used in empirical social science, Data Science methods often serve purposes of exploration and inductive inference. In this course, we aim to provide an introductory overview of the field of Data Science for practitioners. In particular, we want to impart a basic understanding of the main methods and algorithms and show how these can be deployed in practical application scenarios, focusing on the analysis of large behavioral datasets found on the Web. To that end, our schedule alternates between lecture sessions that present the theoretical and technical background of data analysis and practical sessions that allow participants to directly apply the acquired knowledge with simple code in the Python programming language. We cover aspects of data collection, preprocessing, interactive exploration, regression analysis, hypothesis testing, machine learning, and network analysis using basic Python and key packages.


Participants will obtain sound knowledge of the typical data types and structures encountered when dealing with behavioral traces from the Web and of state-of-the-art data analysis methods, and they will learn how this approach differs from those typically encountered in survey-based or experimental research. This will enable them to identify the benefits and pitfalls of these methods in their field of interest and thus to select and appropriately apply data analysis and machine learning methods for large datasets in their own research. The knowledge obtained in this course provides a starting point from which participants can investigate specialized methods for their individual research projects.


Participants should be willing to study algorithmic approaches on abstract and applied levels. Previous knowledge of statistics and of programming in Python or another programming language is advantageous, but not strictly required. We would, however, recommend that participants familiarize themselves with the very basic concepts of Python such as variables, lists, and loops. Please note that participants have to bring their own laptop for this course. All software used is available free of charge as open source for Windows, macOS, and Linux. Detailed installation instructions for the suggested development environments will be provided before the start of the course.
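
For orientation, the following minimal sketch shows the three basic Python concepts named above (variables, lists, and loops) in a few lines. It is not taken from the course materials; the variable names and the numbers are invented purely for illustration.

```python
# Hypothetical sample data: daily page-view counts (invented for illustration).
page_views = [120, 95, 143, 80]   # a list holding four integers

total = 0                         # a variable that accumulates the sum
for views in page_views:          # a loop that visits each list element in turn
    total += views

average = total / len(page_views) # len() returns the number of list elements
print(f"Total: {total}, average: {average}")
```

Participants who can read and modify a snippet of this kind should be able to follow the practical sessions.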


Lecturer information - Dr. Florian Lemmerich

Lecturer information - Dr. Philipp Singer

Lecturer information - Dr. Fabian Flöck

Lecturer information - Dr. Haiko Lietz