GESIS Training Courses

Scientific Coordination

Sabina Haveric
Tel: +49 (0221) 47694 - 166

Administrative Coordination

Claudia O'Donovan-Bellante
Tel: +49 621 1246-221

Workshop - Big Data: Introduction to Data Science with Python

Dr. Fabian Flöck, Dr. Arnim Bleier

Date: 16.09 - 18.09.2019

Location: Mannheim B2,8 / Course language: English

Course description

Data Science is the interdisciplinary science of extracting interpretable and useful knowledge from potentially large datasets. Due to the rapid surge of digital trace data (often referred to as “Big Data”) in a wide range of application areas, Data Science is also increasingly utilized in the social sciences and humanities. In contrast to empirical social science, Data Science methods often serve purposes of exploration and inductive inference. In this course, we aim to provide an introduction to Data Science for practitioners. In particular, we want to impart a basic understanding of the main methods and algorithms and show how these can be deployed in practical application scenarios, focusing on the analysis of digital behavioral data found on the Web. We cover aspects of data collection, preprocessing, exploration, visualization and machine learning using basic Python and key packages like pandas, numpy and scikit-learn.
We would like to call your attention to our symposium, which will take place following the second workshop day, on Tuesday, 17 September 2019. The topic of the discussion will be "Legal Challenges of Web Scraping in the Data Science Context". The venue will be Mannheim University. Further information and registration details will be provided shortly.


Target group

The course is targeted at social scientists and researchers from the humanities who are interested in analyzing digital trace data.

Learning objectives

Participants will learn about the typical data types and structures encountered when dealing with digital behavioral data, as well as state-of-the-art data analysis methods and tools in Python. This will enable them to identify benefits and pitfalls in their field of interest and to select and appropriately apply data analysis and machine-learning methods for large datasets in their own research. The knowledge obtained in this course provides a starting point for participants to investigate specialized methods for their individual research projects.


Participants should be willing to study algorithmic approaches on abstract and applied levels. Previous knowledge of (i) statistics and (ii) programming in Python, another programming language (such as R or Java), or at least a scripting language (syntax code in SPSS or Stata) is a considerable advantage for following the coursework. To ensure a common starting level among participants, attendees are required to familiarize themselves beforehand with the most basic concepts of Python, such as variables, lists, and loops, using learning materials that will be provided in advance. These concepts will be refreshed at the beginning of the course. Please note that participants have to bring their own laptop for this course. All software used is available free of charge as open source for Windows, macOS, and Linux. Detailed installation instructions for the suggested development environments will be provided before the start of the course.
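
To give a rough sense of the expected starting level, the following is a minimal sketch of the basic Python concepts named above (variables, lists, and loops). It is not taken from the course materials; the data and all names are invented purely for illustration.

```python
# Hypothetical example data: number of posts per user (invented for illustration)
posts_per_user = [3, 0, 12, 7, 1]

# A variable holding a running total
total_posts = 0

# A loop that visits every element of the list and adds it to the total
for count in posts_per_user:
    total_posts += count

# Average number of posts per user
average_posts = total_posts / len(posts_per_user)

# A second loop that keeps only the counts of users who posted at least once
active_counts = []
for count in posts_per_user:
    if count > 0:
        active_counts.append(count)

print(total_posts, average_posts, active_counts)
```

Participants who can read and modify a snippet of this kind should be able to follow the refresher at the beginning of the course.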

