Collecting Social Media Data with the Twitter API
27.03 - 29.03.2023
Online via Zoom
Students: 200 €
Academics: 300 €
Commercial: 600 €
Lecturer(s): Dennis Assenmacher, Leon Fröhling
Social media platforms are central hubs of public discourse. They are therefore not only the subject of a wide range of data-centric research but also valuable information sources for companies (e.g., customer information, market research). A central prerequisite for analyzing these platforms is access to the data their users create. In this course, we focus on one social media platform that has positioned itself as comparatively open to third-party access: Twitter. Twitter has a well-documented and generally reliable API (short for Application Programming Interface), through which it grants researchers access to its data in a structured and lightweight format. This distinguishes it from many other social media platforms.
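To illustrate what "structured and lightweight" means in practice: API responses arrive as JSON. The following is a minimal sketch, assuming the Twitter API v2 layout in which tweets sit under `data` and expanded objects such as user accounts under `includes`. The sample response is hand-written for illustration (not real data), and `flatten_tweets` is a hypothetical helper that turns such a response into flat records.

```python
# Illustrative, hand-written response in the shape of a Twitter API v2
# tweet payload requested with the author_id expansion (not real data).
sample_response = {
    "data": [
        {"id": "1001", "text": "Hello #CSS workshop", "author_id": "42"},
        {"id": "1002", "text": "Collecting data with the API", "author_id": "7"},
    ],
    "includes": {
        "users": [
            {"id": "42", "name": "Example User", "username": "example_user"},
            {"id": "7", "name": "Another User", "username": "another_user"},
        ]
    },
    "meta": {"result_count": 2},
}


def flatten_tweets(response: dict) -> list[dict]:
    """Join each tweet with its author's username from the includes section."""
    # Lookup table from user id to username (empty if nothing was expanded).
    users = {u["id"]: u["username"] for u in response.get("includes", {}).get("users", [])}
    rows = []
    for tweet in response.get("data", []):
        rows.append({
            "tweet_id": tweet["id"],
            "text": tweet["text"],
            # None if the author object was not included in the response.
            "username": users.get(tweet.get("author_id")),
        })
    return rows


for row in flatten_tweets(sample_response):
    print(row["username"], "->", row["text"])
```

Keeping the join in one small function makes it easy to reuse the same logic for every page of results a collection script receives.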
Participants of this workshop learn (a) what types of data can be collected from Twitter's different API endpoints, (b) how those data types can be fetched, processed, and stored efficiently, taking into account factors like rate limits and quotas, and (c) how the data can be analyzed in an exploratory way. Additionally, we discuss Twitter's Terms of Service, focusing on how the collected data may be used and shared with peers. In this context, we give insights into the process of data rehydration, i.e., re-collecting the full tweets from a shared list of tweet IDs.
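Rehydration typically starts from a list of tweet IDs rather than full tweet objects. A minimal sketch, assuming the v2 tweet lookup endpoint `https://api.twitter.com/2/tweets`, which accepts up to 100 comma-separated IDs per request: the collected tweets are reduced to their IDs for sharing, and the recipient splits the IDs into batches and builds one lookup URL per batch. The helper names (`dehydrate`, `batch`, `rehydration_urls`) are hypothetical, and the requests are only prepared here, not sent.

```python
from urllib.parse import urlencode

# Tweet lookup endpoint of the Twitter API v2 (up to 100 IDs per request).
LOOKUP_URL = "https://api.twitter.com/2/tweets"
MAX_IDS_PER_REQUEST = 100


def dehydrate(tweets: list[dict]) -> list[str]:
    """Reduce collected tweets to their IDs, which is what gets shared."""
    return [tweet["id"] for tweet in tweets]


def batch(ids: list[str], size: int = MAX_IDS_PER_REQUEST) -> list[list[str]]:
    """Split a list of IDs into consecutive chunks of at most `size` elements."""
    return [ids[i:i + size] for i in range(0, len(ids), size)]


def rehydration_urls(ids: list[str]) -> list[str]:
    """Build one lookup URL per batch of tweet IDs."""
    urls = []
    for chunk in batch(ids):
        query = urlencode({"ids": ",".join(chunk), "tweet.fields": "created_at,author_id"})
        urls.append(f"{LOOKUP_URL}?{query}")
    return urls


shared_ids = [str(1000 + n) for n in range(250)]
urls = rehydration_urls(shared_ids)
print(len(urls))  # 3 requests: 100 + 100 + 50 IDs
```

Tweets that have been deleted or made private in the meantime will not be returned, so a rehydrated dataset can be smaller than the original one.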
The course includes lectures on each topic that introduce the basic theoretical concepts of working with APIs, of preprocessing the data, and of analyzing it at the content and account level. To accommodate different programming backgrounds, the course is designed to be bilingual: the practical parts of the lectures and the exercises can be followed in either Python or R.
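As a rough illustration of the difference between the two levels of analysis, the following sketch counts hashtags (content level) and tweets per account (account level) over a few made-up records. The regular expression is a simplifying assumption; the hashtag annotations that the API itself can return would handle edge cases more accurately.

```python
import re
from collections import Counter

# Simplified hashtag pattern (assumption; not the API's own entity parsing).
HASHTAG = re.compile(r"#(\w+)")


def top_hashtags(tweets: list[dict], n: int = 3) -> list[tuple[str, int]]:
    """Content level: most frequent hashtags, compared case-insensitively."""
    counts = Counter(
        tag.lower() for tweet in tweets for tag in HASHTAG.findall(tweet["text"])
    )
    return counts.most_common(n)


def tweets_per_account(tweets: list[dict]) -> Counter:
    """Account level: number of tweets per username."""
    return Counter(tweet["username"] for tweet in tweets)


# Made-up example records.
tweets = [
    {"username": "alice", "text": "Data #Twitter #API"},
    {"username": "bob", "text": "More on the #api"},
    {"username": "alice", "text": "#twitter research"},
]
print(top_hashtags(tweets))
print(tweets_per_account(tweets))
```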
Each topic is covered by a dedicated lecture that explains the basic theoretical concepts and thus provides the necessary background for working with the Twitter API. The theory is then put into practice in the following sessions, in which the instructors present and discuss code snippets for automating the data collection process. Participants can then practice their new skills in small-group exercises, during which the instructors are available for questions and discussions. Participants will also be able to discuss their own research ideas and projects and how best to apply the newly learned methods to them.
Please note that, due to the current developments surrounding Twitter, this workshop may take place in a slightly different form, or may have to be canceled if API access is discontinued. Registered participants will be informed of any changes in due course and will receive a full refund of the course fee if the workshop is canceled.
Participants will find the course useful if:
By the end of the course participants will:
Organizational structure of the course
The workshop is structured around the different endpoints of the Twitter API and the data types they provide. For each endpoint, we discuss theoretical details and characteristics, jointly going through the relevant documentation. With rate limits and quotas in mind, we present possible collection strategies and show how they can be implemented. Using the code examples we provide, participants are encouraged to test their understanding of the API and of the collection strategies during the exercises.
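One common collection strategy combines pagination with respect for rate limits. The following is a minimal sketch, assuming a v2-style endpoint that returns a `meta.next_token` when more results exist and sends `x-rate-limit-remaining` and `x-rate-limit-reset` (epoch seconds) headers. The `fetch` callable is a hypothetical stand-in for the actual HTTP request, and `sleep` and `now` are injectable so the logic can be tried without network access.

```python
import time
from typing import Callable, Iterator

# Hypothetical signature: fetch(params) performs one API request and returns
# the decoded JSON body together with the response headers.
Fetch = Callable[[dict], tuple[dict, dict]]


def collect(
    fetch: Fetch,
    params: dict,
    max_pages: int = 10,
    sleep: Callable[[float], None] = time.sleep,
    now: Callable[[], float] = time.time,
) -> Iterator[dict]:
    """Page through results, pausing when the rate-limit window is exhausted."""
    params = dict(params)  # do not mutate the caller's dict
    for _ in range(max_pages):
        body, headers = fetch(params)
        yield from body.get("data", [])

        # Stop when the response does not point to a further page.
        next_token = body.get("meta", {}).get("next_token")
        if next_token is None:
            return
        params["next_token"] = next_token

        # Assumes lowercase header keys (a plain dict is case-sensitive).
        # If no calls remain in the current window, wait until it resets.
        if int(headers.get("x-rate-limit-remaining", 1)) == 0:
            reset_at = int(headers.get("x-rate-limit-reset", now()))
            sleep(max(0.0, reset_at - now()))
```

In practice, `fetch` might wrap an authenticated GET request to a search endpoint such as `https://api.twitter.com/2/tweets/search/recent` and return the parsed JSON together with the response headers. The `max_pages` cap is a simple safeguard against exhausting a monthly quota in a single run.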
The presentation of the different endpoints and the theoretical discussions will follow a more traditional lecture style. The implementations of the collection strategies will be prepared in Jupyter Notebooks. We will walk through and comment on the code needed to automatically collect data from the different Twitter API endpoints while executing each step in the notebooks, leaving room for questions and discussion about the implementation and its logic.
For the exercises, we will split the participants into smaller focus groups in separate break-out rooms, where they can work on the prepared questions and problems together or on their own. The instructors will move between the break-out rooms to answer questions and offer hints.
To access the API, a Twitter Developer Account is required; you may follow the instructions here to create one. This gives you “Essential access”. If you would like the full Twitter API experience, you may apply for “Academic Research Access” from within your Developer Account. Note, however, that this requires a brief description of your project and that access is subject to Twitter's vetting and approval of the project. Since this takes some time, make sure to apply well in advance (at the latest approx. 1-2 months before the workshop). “Essential access” is sufficient to follow the workshop.
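Once the Developer Account is set up, requests to the v2 endpoints can be authenticated with the app's bearer token. A minimal sketch using only the Python standard library, assuming the token is stored in an environment variable whose name (`TWITTER_BEARER_TOKEN`) is hypothetical; the request is only prepared here, and `urllib.request.urlopen(req)` would be needed to actually send it.

```python
import os
from urllib.request import Request

# Hypothetical variable name; keeping the token out of the notebook itself
# avoids accidentally sharing it along with the code.
TOKEN_VARIABLE = "TWITTER_BEARER_TOKEN"


def build_request(url: str) -> Request:
    """Prepare (but do not send) a GET request authenticated with a bearer token."""
    token = os.environ.get(TOKEN_VARIABLE)
    if not token:
        raise RuntimeError(f"Set {TOKEN_VARIABLE} before calling the API.")
    return Request(url, headers={"Authorization": f"Bearer {token}"})


# Placeholder token for illustration only; a real one comes from the Developer Portal.
os.environ.setdefault(TOKEN_VARIABLE, "dummy-token-for-illustration")
req = build_request("https://api.twitter.com/2/tweets?ids=1234567890")
print(req.get_header("Authorization"))
```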
We will use Jupyter Notebooks in Google Colab, which requires a Google account and supports both Python and R (to use R in Google Colab, open https://colab.to/r in a browser in which you are logged in to a Google account). We will provide pre-course information on how to set everything up. Note, however, that no time is allotted during the workshop for setting up or fixing programming environments. Therefore, please contact the instructors in advance if you have any questions or problems setting up your working environment following the pre-course instructions.
This includes, but is not limited to, knowledge of essential objects like lists and dictionaries, the ability to read from and save to different data formats, and a basic understanding of if-conditions and for-loops.
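As a rough calibration of this level, the following Python snippet uses only the constructs named above: a list of dictionaries (made-up records), a for-loop with an if-condition, and saving to and reading from JSON and CSV. It is an illustrative sketch of the expected background, not course material; the exercises can equally be followed in R.

```python
import csv
import json
import tempfile
from pathlib import Path

# A list of dictionaries, one per (made-up) tweet.
tweets = [
    {"id": "1", "lang": "en", "retweets": 3},
    {"id": "2", "lang": "de", "retweets": 0},
    {"id": "3", "lang": "en", "retweets": 12},
]

# for-loop plus if-condition: keep English tweets that were retweeted.
selected = []
for tweet in tweets:
    if tweet["lang"] == "en" and tweet["retweets"] > 0:
        selected.append(tweet)

folder = Path(tempfile.mkdtemp())

# Save to and read back from JSON.
(folder / "tweets.json").write_text(json.dumps(selected), encoding="utf-8")
from_json = json.loads((folder / "tweets.json").read_text(encoding="utf-8"))

# Save to and read back from CSV (values come back as strings).
with open(folder / "tweets.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["id", "lang", "retweets"])
    writer.writeheader()
    writer.writerows(selected)
with open(folder / "tweets.csv", newline="", encoding="utf-8") as f:
    from_csv = list(csv.DictReader(f))

print(len(from_json), from_csv[1]["retweets"])
```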