Scientific Coordination
Verena Kunz
Administrative Coordination
Janina Götsche
Automatic Sampling and Analysis of YouTube Data
About
Location:
Online via Zoom
General Topics:
Course Level:
Format:
Software used: R, RStudio
Duration: 2 days
Language:
Fees:
Students: 220 €
Academics: 330 €
Commercial: 660 €
Lecturer(s): Johannes Breuer, Rohangis Mohseni, Annika Deubel
Course description
YouTube is the largest and most popular video platform on the internet. The producers and users of YouTube content generate huge amounts of data. These data are also of interest to researchers (in the social sciences as well as other disciplines) for studying different aspects of online media use and communication. Accessing and working with these data, however, can be challenging. In this workshop, we will first discuss the potential of YouTube data for research in the social sciences and then introduce participants to the YouTube API as well as different tools for the automated collection of YouTube data. The main part of the workshop will focus on using R (and various R packages) for collecting, processing, and analyzing YouTube data. Regarding the type of data, we will concentrate on user comments but also look into other YouTube data, such as video statistics and subtitles. For the comments, we will show how to clean and process them in R, how to deal with emojis, and how to conduct basic forms of (semi-)automated text analysis (e.g., word frequencies, sentiment analysis). While we believe that YouTube data have great potential for research in the social sciences (and other disciplines), we will also discuss the challenges and limitations of these data.
Target group
Participants will find the course useful if:
- They want to work with YouTube data (especially user comments) in their research.
Learning objectives
By the end of the course participants will:
Organizational structure of the course
The workshop is structured into segments of instructive lectures and interactive hands-on sessions. The lecturers will be available for support during hands-on segments and can also consult on participants' own (planned) research projects with YouTube data.
Prerequisites
Participants should have experience with using R. Specifically, they should be familiar with installing and loading packages, importing and processing data, and conducting basic exploratory analyses in R. It is also helpful if participants have basic knowledge of, or some initial experience with, working with text data and have at least heard of the tidyverse collection of packages and how it can be used for data wrangling.
Software requirements
R (at least version 4.0.0), RStudio, and the following R packages: remotes, tidyverse, tuber, vosonSML, quanteda, tm, qdapRegex, syuzhet, lexicon, subtools, stm, youtubecaption (optional)
Agenda
Wednesday, February 14th, 2024 | |
09:00 - 10:00 | Introduction: Why is YouTube data interesting for research? |
10:00 - 11:00 | The YouTube API |
11:00 - 11:15 | Coffee Break |
11:15 - 12:15 | Tools for collecting YouTube data |
12:15 - 13:15 | Lunch Break |
13:15 - 14:45 | Collecting YouTube data with R |
14:45 - 15:00 | Coffee Break |
15:00 - 16:30 | Processing and cleaning user comments |
Thursday, February 15th, 2024 | |
09:00 - 10:30 | Basic text analysis of user comments |
10:30 - 10:45 | Coffee Break |
10:45 - 12:15 | Sentiment analysis of user comments |
12:15 - 13:15 | Lunch Break |
13:15 - 14:45 | Excursus: Retrieving video subtitles |
14:45 - 15:00 | Coffee Break |
15:00 - 16:30 | Practice session, questions, and outlook |