GESIS Training Courses

Scientific Coordination

Verena Kunz

Administrative Coordination

Janina Götsche

Collecting Social Media Data with the Twitter API

About
Location:
Online via Zoom
 
Fees:
Students: 200 €
Academics: 300 €
Commercial: 600 €
 
Lecturer(s): Dennis Assenmacher, Leon Fröhling


Course description

Social media platforms are central hubs of public discourse and are thus not only the subject of various data-centric research endeavors but also valuable information sources for companies (e.g., for customer information and market research). A central prerequisite for conducting analyses on these platforms is access to the data created by their users. In this course, we focus on one specific social media platform that has established itself as comparatively open to third-party access: Twitter. Twitter has a well-documented and generally reliable API (short for Application Programming Interface) through which it grants researchers access to its data in a structured and lightweight format, which distinguishes it from many other social media platforms.
 
Participants of this workshop learn (a) what types of data can be collected from Twitter's different API endpoints, (b) how these data types can be fetched, processed, and stored efficiently, taking into account factors like rate limits and quotas, and (c) how the data can be analyzed in an exploratory manner. Additionally, we discuss Twitter's Terms of Service, focusing on how the collected data may be used and shared with peers. In this context, we introduce the process of data rehydration, that is, retrieving the full Tweets for a shared list of Tweet IDs.
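Because Twitter's developer policy has generally restricted redistribution to Tweet IDs rather than full Tweet contents, peers who receive a shared dataset have to rehydrate it by looking the IDs up again. The following is a minimal Python sketch, assuming the Twitter API v2 Tweet lookup endpoint (`GET /2/tweets`), which accepts at most 100 comma-separated IDs per request in its `ids` parameter. The function name `rehydration_urls` is hypothetical, and the sketch only builds the request URLs; it does not send requests or handle authentication.

```python
from urllib.parse import urlencode

# Assumption: Twitter API v2 Tweet lookup endpoint, which accepts up to
# 100 comma-separated Tweet IDs per request via the "ids" parameter.
LOOKUP_URL = "https://api.twitter.com/2/tweets"
MAX_IDS_PER_REQUEST = 100


def rehydration_urls(tweet_ids, batch_size=MAX_IDS_PER_REQUEST):
    """Split a shared list of Tweet IDs (strings) into lookup-request URLs."""
    urls = []
    for start in range(0, len(tweet_ids), batch_size):
        batch = tweet_ids[start:start + batch_size]
        # urlencode percent-encodes the commas as %2C, which is valid in a query.
        query = urlencode({"ids": ",".join(batch)})
        urls.append(f"{LOOKUP_URL}?{query}")
    return urls


ids = [str(n) for n in range(1, 251)]  # placeholder IDs for illustration
urls = rehydration_urls(ids)
print(len(urls))  # 3
```

Tweets that have been deleted or made private since the original collection are not returned by the lookup, so a rehydrated dataset can be smaller than the one originally collected.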
 
The course includes lectures on each topic, introducing the basic theoretical concepts of working with APIs, conducting the preprocessing steps needed to prepare the data, and analyzing the data at the content and account level. To accommodate different programming backgrounds, the course is bilingual: the practical parts of the lectures and the exercises can be followed in either Python or R.
Each lecture provides the theoretical background needed for working with the Twitter API. The theory is then put into practice in the following sessions, in which the instructors present and discuss code snippets that automate the data collection process. Participants then practice their new skills in exercises in small groups, during which the instructors are available for questions and discussions. Participants will also be able to discuss their research ideas and projects and how best to apply the newly learned methods to them.
 
Please note that, due to the current developments surrounding Twitter, this workshop may take place in a slightly different form or may have to be canceled if API access ceases to exist. Registered participants will be informed of any changes in due course and will receive a full refund of the course fee if the workshop is canceled.


Target group

Participants will find the course useful if:
  • They want to learn about using the Twitter API to automate the collection of Twitter data
  • They want to study the characteristics of Twitter, for example, the discourse on certain topics or the follower networks of its users
  • They want to learn about the characteristics of Tweets and the language on Twitter and how to account for these peculiarities during preprocessing


Learning objectives

By the end of the course, participants will:
  • Know about the different types of data available from the different API endpoints, and how to access them
  • Be able to automatically collect large numbers of Tweets from the Twitter archive (via the full-archive search endpoint) and process and store them in different file formats
  • Be able to automatically collect follower networks of Twitter users
  • Know about the characteristics and peculiarities of texts on Twitter (including hashtags, URLs, and emojis) and be able to process and prepare them for analysis
  • Understand the limitations of the Twitter API as well as Twitter's Terms of Service and the community's data sharing practices
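Hashtags, user mentions, and URLs are among the Twitter-specific elements that usually need special handling during preprocessing. The sketch below is a simplified Python illustration, assuming plain regular expressions are good enough for a first pass. `extract_entities` and `normalize` are hypothetical helper names; the patterns miss some edge cases (for example, a `#` inside a URL or an `@` inside an email address), emoji handling is left out, and the v2 API can already return parsed `entities` for each Tweet when they are requested.

```python
import re

# Simplified patterns (assumption): Twitter's own entity parsing is more
# elaborate, and API responses can include a parsed "entities" field.
HASHTAG_RE = re.compile(r"#(\w+)")
MENTION_RE = re.compile(r"@(\w+)")
URL_RE = re.compile(r"https?://\S+")


def extract_entities(text):
    """Return the hashtags (lowercased), mentions, and URLs in a Tweet's text."""
    return {
        "hashtags": [tag.lower() for tag in HASHTAG_RE.findall(text)],
        "mentions": MENTION_RE.findall(text),
        "urls": URL_RE.findall(text),
    }


def normalize(text):
    """Replace URLs and mentions with placeholder tokens and collapse whitespace."""
    text = URL_RE.sub("<URL>", text)
    text = MENTION_RE.sub("<USER>", text)
    return re.sub(r"\s+", " ", text).strip()


tweet = "Great talk by @example_user on #CSS and #NLP https://t.co/abc123"
print(extract_entities(tweet))
# {'hashtags': ['css', 'nlp'], 'mentions': ['example_user'], 'urls': ['https://t.co/abc123']}
print(normalize(tweet))
# Great talk by <USER> on #CSS and #NLP <URL>
```

Replacing URLs and mentions with placeholder tokens keeps the sentence structure intact for later text analysis while removing identifiers that rarely carry meaning on their own.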

Organizational structure of the course

The workshop is structured around the different endpoints of the Twitter API and the data types available from them. For each endpoint, we will discuss theoretical details and characteristics while going through the relevant documentation together. With rate limits and quotas in mind, we will present possible collection strategies and show how they can be implemented. Using the code examples we provide, participants are encouraged to test their understanding of the API and of the collection strategies during the exercises.
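Rate limits largely determine how a collection loop is written: it has to page through results and pause once the request budget for the current time window is used up. The following Python sketch assumes the Twitter API v2 conventions of returning a `next_token` in the response's `meta` object, answering with HTTP status 429 when the rate limit is exceeded, and reporting the window's reset time (in epoch seconds) in the `x-rate-limit-reset` header. `collect_all_pages` and the `fetch_page` callable it expects are hypothetical; `fetch_page` stands in for whatever function actually sends the authenticated request, which keeps the loop independent of any particular HTTP library.

```python
import time


def collect_all_pages(fetch_page, max_pages=None, sleep=time.sleep, now=time.time):
    """Follow next_token pagination, pausing whenever the rate limit is hit.

    fetch_page(next_token) is a hypothetical callable that performs one API
    request and returns (status_code, headers, json_body).
    """
    tweets, next_token, pages = [], None, 0
    while max_pages is None or pages < max_pages:
        status, headers, body = fetch_page(next_token)
        if status == 429:
            # Rate limit exhausted: wait until the window resets, then retry
            # the same page (reset time is assumed to be in epoch seconds).
            reset = int(headers.get("x-rate-limit-reset", now() + 60))
            sleep(max(reset - now(), 0) + 1)
            continue
        if status != 200:
            raise RuntimeError(f"request failed with status {status}")
        tweets.extend(body.get("data", []))
        pages += 1
        next_token = body.get("meta", {}).get("next_token")
        if next_token is None:
            break  # no further pages available
    return tweets
```

Passing `sleep` and `now` as parameters is a design choice that makes the loop testable without waiting in real time. Depending on the endpoint, the value from `meta.next_token` is sent back as either the `next_token` or the `pagination_token` query parameter, so `fetch_page` would have to handle that detail; monthly Tweet caps (quotas) are not tracked in this sketch.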

The presentation of the different endpoints and the theoretical discussions will follow a more traditional lecture style. The implementations of the collection strategies will be prepared in Jupyter Notebooks. We will walk through the code needed to automatically collect data from the different Twitter API endpoints while executing each step in the notebooks, leaving room for questions and discussions about the implementation and its logic.

We will split the participants into smaller focus groups for the exercises and send them to break-out rooms, where they can work on the prepared questions and problems together or on their own. The instructors will move between the break-out rooms to answer questions and offer hints for the exercises.


Prerequisites

  • Twitter API access tokens: absolutely necessary to follow the practical parts of the workshop, as the instructors cannot provide you with access tokens!
    To get access to the API, you need a Twitter Developer Account, which you can create by following Twitter's instructions. This gives you “Essential access”. If you would like the full Twitter API experience, you may apply for “Academic Research Access” from within your Developer Account. Note, however, that this requires a brief description of your project and that access is subject to Twitter's vetting and approval of the project. This takes some time, so make sure to apply well in advance (at the latest one to two months before the workshop). “Essential access” is sufficient to follow the workshop.
  • A working programming environment
    We will use Jupyter Notebooks in Google Colab, which requires a Google account and can be used for both Python and R (to use R in Google Colab, open https://colab.to/r in a browser in which you are logged in to a Google account). We will provide pre-course information on how to set everything up. Note, however, that no time is set aside during the workshop for setting up or fixing programming environments. Therefore, please contact the instructors in advance if you have questions or run into problems while setting up your working environment according to the pre-course instructions.
  • Basic Python or R programming skills
    This includes, but is not limited to, knowledge of essential objects such as lists and dictionaries, the ability to read data from and save data to different formats, and a basic understanding of if-conditions and for-loops.

Software requirements

  • Twitter API access tokens (see “Prerequisites” above)
  • A working programming environment (R or Python) in Google Colab


Schedule