Tel: +49 621 1246-221
Introduction to Using Social Media Data for Research: Potentials and Pitfalls
05.12 - 06.12.2022
Online via Zoom
Student: 250 €
Academic: 375 €
Commercial: 750 €
Lecturer(s): Indira Sen, Dr. Katrin Weller
Please note: There is an additional session on the 12th - 13th December 2022. Please check the schedule for further information.
In this workshop, we provide an introductory overview of the possibilities and limitations of using data collected from social media platforms for research, structured along a theoretical framework and illustrated with practical examples.
On social media platforms such as Facebook, Twitter, Instagram, and Reddit, the activities and interactions of hundreds of millions of people worldwide are recorded as digital traces. To researchers across various disciplines, these data not only offer increasingly comprehensive pictures of individuals and groups on different platforms but also allow inferences about broader target populations beyond those platforms. Despite its considerable potential, this new type of data comes with challenges. It is therefore essential to study the errors that can occur when digital traces are used to learn about humans and social phenomena. With this workshop, we want to equip researchers new to working with social media with structured guidance for better determining the limits of specific research ideas.
To this end, we combine theory, data, and methods to demonstrate both the pitfalls and the potentials of digital traces from social media users. The theoretical part is based on the idea of using error frameworks in the study design process. We will use an error framework tailored to the specifics of digital traces collected from social media and online platforms (Sen et al., 2021), which is based on and inspired by the concepts and guidelines of the Total Survey Error Framework (TSE) used by survey researchers and practitioners in the social sciences. Both the TSE and our adaptation to the specific characteristics of social media data help to diagnose, understand, and avoid errors that may occur in studies based on digital traces of humans from the web.
To illustrate the utility of the error framework for digital traces, we apply it to diagnose and document errors in existing computational social science studies. During interactive parts of the workshop, participants will learn to apply the error framework to hypothetical research scenarios (illustrated with an example dataset provided by us). Participants are also invited to propose their own case studies before or during the workshop so that the group can jointly explore their potentials and limitations and help to advance the participants' research ideas.
The workshop is structured along a prototypical research workflow consisting of study design, data collection, preprocessing, and analysis. For these steps, we will also provide practical hands-on exercises building on the example datasets. For the hands-on part, we will provide participants with examples of Python code that can be run in execution environments such as Google Colab. Please note, though, that this is not a full programming course - we will jointly work through the hands-on examples to give participants a general understanding of the processes needed for collecting, processing, and analyzing (mainly textual) social media datasets. Examples may include collecting posts from Reddit and analyzing the sentiment expressed in them.
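To give a rough impression of what such a hands-on example might look like, the following is a minimal sketch of the preprocessing and analysis steps for textual posts. It is not taken from the workshop materials: the sample posts, the tiny word lists, and the function names (preprocess, sentiment_score, label) are all hypothetical, and the data collection step is replaced by hard-coded text so that the code runs without platform access.

```python
import re

# Hypothetical sample posts standing in for collected social media data.
# In a real study these would come from a data collection step.
posts = [
    "I love this new phone, the camera is great!",
    "Terrible update. The app is slow and I hate the new layout.",
    "The meeting is at 10 am tomorrow.",
]

# Tiny illustrative word lists (an assumption for this sketch,
# not a validated sentiment lexicon).
POSITIVE = {"love", "great", "good", "happy"}
NEGATIVE = {"terrible", "slow", "hate", "bad"}


def preprocess(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment_score(text):
    """Count positive minus negative lexicon words in the text."""
    tokens = preprocess(text)
    pos = sum(token in POSITIVE for token in tokens)
    neg = sum(token in NEGATIVE for token in tokens)
    return pos - neg


def label(score):
    """Map a numeric score to a coarse sentiment label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


labels = [label(sentiment_score(post)) for post in posts]
for post, lab in zip(posts, labels):
    print(f"{lab:>8}: {post}")
```

Even this toy version hints at where errors can enter: a word-list approach misses negation and sarcasm, and the choice of lexicon directly shapes the measured sentiment.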
Our target audience is researchers from across disciplines and at all career levels who want to learn how to more systematically assess the potentials and limitations of social media research. Specifically, we think of researchers
This course focuses on working with textual data collected from social media platforms.
By the end of the course participants will:
Organizational structure of the course
The workshop is organized in two parts held in consecutive weeks, each taking place on two half-days.
In the first part, we start with a general introduction to current approaches in social media research and to types of data used for research, including hands-on insights into example datasets. Furthermore, we will demonstrate exemplary automated options available to conduct computational social science research, especially providing examples for collecting, preprocessing, and analyzing (textual) data.
In the second part, we will introduce the audience to a conceptual framework that helps to identify potential sources of errors in digital trace-based research, organized by the different phases in a research process such as data collection, data preprocessing, and data analysis. Participants are invited to share their own research ideas in order to improve their study design.
Both parts will include interactive elements. Participants will run example code to learn about the potential of approaches for collecting and analyzing social media data. We will also include group work sessions to sketch out hypothetical research designs and jointly reflect on their potential and limitations.
The course is suitable for researchers new to working with social media data and computational methods. This is not a programming course, but we will use some hands-on examples based on Python code that illustrate some of the opportunities and challenges in social media research and give a basic understanding of what automatic data collection, processing, and analysis may look like. Example code can be run in execution environments such as Google Colab, and we will provide some information on how to set up these environments prior to the course.
Some prior experience with Python and/or programming basics in another language is beneficial for interacting with the hands-on examples.
This workshop will use Python executed through Google Colab environments, which participants can access through their web browser. Information on how to prepare the setup will be sent to participants prior to the workshop.