GESIS Training Courses

Scientific Coordination

Sebastian E. Wenz
Tel: +49 221 47694-159

Administrative Coordination

Loretta Langendörfer M.A.
Tel: +49 221 47694-143

Introduction to Data Science with Python

Dr. Arnim Bleier, Dr. Fabian Flöck, Dr. Juhi Kulshrestha

Date: 16.09 - 18.09.2020

Location: Online via Zoom

Course description

[This is an 18-hour class.]
Data science is the interdisciplinary science of extracting interpretable and useful knowledge from potentially large datasets. Due to the rapid surge of digital trace data in a wide range of application areas, data science is also increasingly used in the social sciences and humanities. In contrast to empirical social science, data science methods often serve purposes of exploration and inductive inference.

In this course, we aim to provide an introduction to data science with Python for practitioners. In particular, we want to impart a basic understanding of the main methods and algorithms and show how these can be deployed in practical application scenarios, focusing on the analysis of digital behavioral data (or "digital traces of humans") found on the Web. For that purpose, our schedule alternates between lecture sessions that present the theoretical and technical background of data analysis and practical sessions in which participants directly apply the acquired knowledge by writing code in the Python programming language. We cover foundational aspects of data collection, visualization, and machine learning, using basic Python and key packages such as pandas, numpy, and scikit-learn. The data used will cover a broad array of sources, from "native Web" data such as social media data to more "traditional" survey data.

Target group

Learning objectives

Participants will obtain profound knowledge of the typical data types and structures encountered when dealing with digital behavioral data and of state-of-the-art data analysis methods and tools in Python, and they will learn how this approach differs from those typically encountered in survey-based or experimental research. This will enable them to identify the benefits and pitfalls of these data types and methods in their field of interest, and thus to select and appropriately apply data analysis and machine learning methods for large datasets in their own research. The knowledge obtained in this course provides a starting point for participants to investigate specialized methods for their individual research projects.

Prerequisites


Participants should be willing to study algorithmic approaches on abstract and applied levels. The minimum requirement for attending this course is a functional knowledge of Python and pandas. We expect participants to be familiar with exploring and preprocessing data using Python and to have a working knowledge of Python data structures such as lists, dictionaries, and pandas data frames. Participants who are unfamiliar with these basic concepts of Python programming are strongly advised to attend the course "Introduction to Python for Social Scientists" in preparation. Some previous knowledge of statistics would also be highly beneficial. Detailed instructions on how to access the development environment will be provided before the start of the course.

