Tel: +49 621 1246-221
Please note that due to the current situation, participation in our on-site training events is restricted. You can find further information here.
Topic Modeling in R
Dr. Wouter van Atteveldt, Dr. Kasper Welbers
Date: 11.11. - 13.11.2020
Location: Online via Zoom / Course language: English
Attention: Because the Covid-19 situation is constantly changing, the course may have to be held online at short notice. We will inform you of any changes two weeks in advance.
This workshop gives an introduction to topic modeling using R. On the first day, we will introduce R, RStudio, and the tidyverse. Students or researchers who already have experience with R can safely skip this day.
On the second day, we will introduce the principles of automatic text analysis and topic modeling. We will explain the basic assumptions of bag-of-words analysis, unsupervised clustering, and the Dirichlet distribution. We will use the quanteda and topicmodels packages to run the analyses, and LDAvis and corpustools for visualization and validation.
On the third and final day, we will first look in depth at how an LDA model is fitted with Gibbs sampling and at the various parameters and modeling choices involved. We will also look at linguistic preprocessing using the spacy package. Finally, we will introduce alternative topic models, ranging from dynamic and correlated topic models to structural topic models. We will use the stm package to show how to estimate a structural topic model with time or source as covariates, and how to analyze and interpret the results.
Students participating in the first day will learn the basics of R. All students will understand the principles and workings of topic modeling and of (unsupervised) text analysis in general. Students will be able to use R to run LDA and structural topic models, and to interpret and visualize the results.
No specific prior knowledge is required, but a basic knowledge of mathematics and statistics will help in understanding the algorithms. Participants without knowledge of R are strongly advised to install R and RStudio beforehand and to familiarize themselves with the software. All participants are advised to browse through chapters 9-16 of R4DS (https://r4ds.had.co.nz/).