GESIS Training Courses

Scientific Coordination

Sebastian E. Wenz
Tel: +49 221 47694-166

Administrative Coordination

Loretta Langendörfer M.A.
Tel: +49 221 47694-143

Big Data Module I: Introduction to Data Science with Python

Dr. Arnim Bleier, Dr. Fabian Flöck, Dr. Florian Lemmerich, Dr. Haiko Lietz

Date: 09.07.2018 - 13.07.2018

Lecturer information - Dr. Arnim Bleier

Lecturer information - Dr. Fabian Flöck

Lecturer information - Dr. Florian Lemmerich

Lecturer information - Dr. Haiko Lietz


Data Science is the interdisciplinary science of extracting interpretable and useful knowledge from digital datasets. Due to the rapid surge of digital trace data (often referred to as “Big Data”) in a wide range of application areas, Data Science is also increasingly used in the social sciences and humanities. In contrast to empirical social science, Data Science methods often serve the purposes of exploration and inductive inference. In this course, we aim to provide an introductory overview of the field of Data Science for practitioners. In particular, we want to convey a basic understanding of the main methods and algorithms and show how these can be deployed in practical application scenarios, focusing on the analysis of large behavioral datasets found on the Web. For that purpose, our schedule alternates between lecture sessions that present the theoretical and technical background of data analysis and practical sessions in which participants directly apply the acquired knowledge with simple code in the Python programming language. We cover aspects of data collection, preprocessing, interactive exploration, regression analysis, hypothesis testing, machine learning, and network analysis using basic Python and key packages.



Participants will gain in-depth knowledge of the typical data types and structures encountered when dealing with behavioral traces from the Web and of state-of-the-art data analysis methods, and they will learn how this approach differs from those typically encountered in survey-based or experimental research. This will enable them to identify the benefits and pitfalls of these methods in their field of interest and thus to select and appropriately apply data analysis and machine learning methods for large datasets in their own research. The knowledge obtained in this course provides a starting point from which participants can investigate specialized methods for their individual research projects.


Prior knowledge of (i) basic inferential statistics (e.g., linear regression, t-tests) and (ii) a programming or at least scripting language (e.g., R, or syntax code in SPSS or Stata) is very helpful for following the coursework. In any case, to ensure a common starting level among participants, attendees will be asked to familiarize themselves with the most basic concepts of Python, such as variables, lists, and loops, using material that will be provided to all participants beforehand through the e-learning platform ILIAS. This material will be briefly recapitulated at the beginning of the course.
Please note that participants have to bring their own laptop for this course. All software used is available free of charge as open source for Windows, macOS, and Linux. Detailed instructions for installing the required software and completing the introductory exercises will be provided before the start of the course.
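
To give a rough idea of the level expected for the preparatory Python basics (variables, lists, and loops), here is a minimal sketch. It is not taken from the course material; the page-view numbers and all variable names are invented purely for illustration.

```python
# Hypothetical toy data: daily page-view counts for one week (invented numbers).
page_views = [120, 95, 143, 110, 87, 160, 132]

# A variable holding a running total.
total = 0

# A loop that visits every element of the list.
for views in page_views:
    total += views

# Another variable, computed from the total and the list length.
mean_views = total / len(page_views)

# A new list built inside a loop: positions of days with
# above-average traffic (0 = first day of the week).
busy_days = []
for day, views in enumerate(page_views):
    if views > mean_views:
        busy_days.append(day)

print(total, mean_views, busy_days)
```

Participants who can read and modify a snippet like this should be comfortable with the short recap at the beginning of the course.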


