GESIS Training Courses

Scientific Coordination

Verena Kunz

Administrative Coordination

Claudia O'Donovan-Bellante
Tel: +49 621 1246-221

Introduction to Using Social Media Data for Research: Potentials and Pitfalls

Online via Zoom
General Topics
Student: 250 €
Academic: 375 €
Commercial: 750 €
Lecturer(s): Indira Sen, Dr. Katrin Weller


Course description

Please note: There is an additional session on 12-13 December 2022. Please check the schedule for further information.
In this workshop, we provide an introductory overview of the possibilities and limitations of using data collected from social media platforms for research, structured along a theoretical framework and illustrated with practical examples.
On social media platforms such as Facebook, Twitter, Instagram, and Reddit, the activities and interactions of hundreds of millions of people worldwide are recorded as digital traces. To researchers across various disciplines, these data not only offer increasingly comprehensive pictures of individuals and groups on different platforms but also allow inferences about broader target populations beyond those platforms. Despite this potential, this new type of data comes with challenges. It is therefore essential to study the errors that can occur when digital traces are used to learn about humans and social phenomena. With this workshop, we want to equip researchers new to working with social media data with structured guidance for better determining the limits of specific research ideas.
For this, we combine theory, data, and methods to demonstrate both the pitfalls and the potentials of digital traces from social media users. The theoretical part is based on the idea of using error frameworks in the study design process. We will use an error framework tailored to the specifics of digital traces collected from social media and online platforms (Sen et al., 2021), which is based on and inspired by concepts and guidelines of the Total Survey Error (TSE) framework used by survey researchers and practitioners in the social sciences. Both the TSE framework and our adaptation to the specific characteristics of social media data will help to diagnose, understand, and avoid errors that may occur in studies based on digital traces of humans from the web.
To help understand the utility of the error framework for digital traces, we apply it to diagnose and document errors in existing computational social science research. During interactive parts of the workshop, participants will learn to apply the error framework to hypothetical research scenarios (illustrated with an example dataset provided by us). Participants are also invited to propose their own case studies before or during the workshop so that the group can jointly explore their potentials and limitations and help advance the participants' research ideas.
The workshop is structured along a prototypical research workflow consisting of study design, data collection, preprocessing, and analysis. For these steps, we will also provide practical hands-on exercises building on the example datasets. For the hands-on part, we will provide participants with examples of Python code that can be run in execution environments such as Google Colab. Please note, though, that this is not a full programming course: we will jointly work through the hands-on examples to give participants a general understanding of the processes needed for collecting, processing, and analyzing (mainly textual) social media datasets. Examples may include collecting posts from Reddit and analyzing their sentiment.
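To give a rough idea of what such a hands-on example might look like, here is a minimal sketch of preprocessing and sentiment scoring for a few short posts. It is not the course material: the posts, the tiny word lists, and the function names `tokenize`, `sentiment_score`, and `label` are all assumptions made for illustration, and real studies would typically rely on validated lexicons or trained models and on data collected through a platform's API.

```python
import re
from collections import Counter

# Hypothetical example posts (assumption); real data would come from a
# platform API or from a dataset provided in the course.
POSTS = [
    "I love this new feature, it works great!",
    "Terrible update. The app is slow and I hate it.",
    "Just posted my weekly schedule.",
]

# Tiny illustrative word lists (assumption), not a validated lexicon.
POSITIVE = {"love", "great", "good", "happy"}
NEGATIVE = {"terrible", "hate", "slow", "bad"}


def tokenize(text):
    """Preprocessing step: lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment_score(text):
    """Return (positive hits - negative hits) / number of tokens."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    positive_hits = sum(counts[word] for word in POSITIVE)
    negative_hits = sum(counts[word] for word in NEGATIVE)
    return (positive_hits - negative_hits) / len(tokens)


def label(score):
    """Map a numeric score to a coarse sentiment label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for post in POSTS:
    score = sentiment_score(post)
    print(f"{label(score):>8}  {score:+.2f}  {post}")
```

Even this toy version hints at typical pitfalls: words missing from the lexicon are ignored, and negation or irony is not handled at all.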

Target group

Our target audience is researchers from across disciplines and at all career levels who want to learn how to more systematically assess the potentials and limitations of social media research. Specifically, we think of researchers
  • … who have some prior experience in survey research and want to learn how digital behavioral data might serve as additional data sources for their research questions
  • … who are in the process of planning a research study based on data from social media platforms and want to assess or improve their research design
  • … who have already worked with digital traces and social media and want to learn about more systematic ways to critically reflect on research designs and their limitations.

This course focuses on working with textual data collected from social media platforms.

Learning objectives

By the end of the course, participants will:
  • know typical scenarios for research based on digital trace data from the web and will be better equipped to judge the potential of social media data for specific research ideas
  • have a basic understanding of collecting, processing, and analyzing social media data
  • have access to structured resources that help them to critically reflect on research design in social media or web data-based studies
  • know how to systematically spot and document errors in their studies

Organizational structure of the course

The workshop is organized in two parts, each held on two half-days, over two weeks.
In the first part, we start with a general introduction to current approaches in social media research and to the types of data used for research, including hands-on insights into example datasets. Furthermore, we will demonstrate examples of automated approaches to computational social science research, in particular for collecting, preprocessing, and analyzing (textual) data.
In the second part, we will introduce participants to a conceptual framework that helps to identify potential sources of error in digital trace-based research, organized by the phases of the research process, such as data collection, data preprocessing, and data analysis. Participants are invited to share their own research ideas in order to improve their study design.
Both parts will include interactive elements. Participants will run example code to learn about the potential of approaches for collecting and analyzing social media data. We will also include group work sessions to sketch out hypothetical research designs and jointly reflect on their potential and limitations.
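As a rough illustration of documenting potential errors by research phase, the following sketch keeps a simple error log. It is not the framework by Sen et al. (2021): the class names `ErrorNote` and `ErrorLog`, the field names, and the two example entries are hypothetical. Only the four phase names are taken from the workflow described in this course description.

```python
from dataclasses import dataclass, field

# Phases of the prototypical research workflow named in the course description.
PHASES = ("study design", "data collection", "preprocessing", "analysis")


@dataclass
class ErrorNote:
    """One documented potential error (hypothetical structure)."""
    phase: str
    description: str
    mitigation: str = "not yet addressed"

    def __post_init__(self):
        # Reject notes that are not tied to one of the known phases.
        if self.phase not in PHASES:
            raise ValueError(f"unknown phase: {self.phase!r}")


@dataclass
class ErrorLog:
    """Collects error notes and groups them by research phase."""
    notes: list = field(default_factory=list)

    def add(self, phase, description, mitigation="not yet addressed"):
        self.notes.append(ErrorNote(phase, description, mitigation))

    def by_phase(self):
        # Keep the workflow order, including phases without any notes.
        grouped = {phase: [] for phase in PHASES}
        for note in self.notes:
            grouped[note.phase].append(note)
        return grouped


# Hypothetical entries for an imagined study design.
log = ErrorLog()
log.add("data collection",
        "Platform users may differ from the target population")
log.add("preprocessing",
        "Removing non-English posts drops part of the sample",
        mitigation="report how many posts were excluded")

for phase, notes in log.by_phase().items():
    print(phase.upper())
    for note in notes:
        print(f"  - {note.description} (mitigation: {note.mitigation})")
```

Grouping notes by phase makes it easy to see which steps of a planned study have not yet been examined for possible errors.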


The course is suitable for researchers new to working with social media data and computational methods. This is not a programming course, but we will use some hands-on examples based on Python code that illustrate some of the opportunities and challenges in social media research and give a basic understanding of what automatic data collection, processing, and analysis may look like. Example code can be run in execution environments such as Google Colab, and we will provide information on how to set up these environments prior to the course.
Some prior experience with Python and/or programming basics in another language is beneficial for interacting with the hands-on examples.

Software requirements

This workshop will use Python executed through Google Colab environments, which participants can access through their web browser. Information on how to prepare the setup will be sent to participants prior to the workshop.


Recommended readings