Scientific Coordination
Dr. Marlene Mauk
Tel: +49 221 47694-579
Administrative Coordination
Claudia O'Donovan-Bellante
Tel: +49 621 1246-221
Automated Web Data Collection with R
About
Location: Mannheim B6, 4-5
General Topics
Course Level
Format
Software used
Duration
Language
Fees
Students: 500 €
Academics: 750 €
Commercial: 1500 €
Keywords
Additional links
Lecturer(s): Dr. Theresa Gessler, Dr. Hauke Licht
Course description
The increasing availability of large amounts of data on the internet enables new lines of research in the social sciences. Although it has become easier to find online data relevant to social science research, such as social media content, election results, or organizations' press statements, extracting these data and bringing them into formats ready for downstream analysis can be challenging. Web data collection is thus an essential skill for researchers.
The goal of this course is to enable participants to collect web data and process it in R for their research. Course participants will learn about the characteristics of web data and their use in social science research, how to harvest content from different types of web pages, and how to collect social media data from application programming interfaces (APIs), such as the Twitter API.
We will cover tools and techniques that enable participants to collect web data relevant to their research, focusing in particular on two common scenarios: (i) automating the collection of data spread across multiple web pages of both static and dynamic websites (with RSelenium), and (ii) interacting with APIs, for example to collect social media data or datasets from institutions, companies, and organizations. In addition, we will cover advanced topics such as using web sessions, interacting with HTML forms (e.g., for logging in), managing user agents, error handling, and headless browsing.
The course is hands-on, with daily lectures followed by exercises where participants can practice their newly learned skills.
For additional details on the course and a day-to-day schedule, please download the full-length syllabus.
Target group
Participants will find the course useful if they want to:
Learning objectives
By the end of the course participants will:
Organizational structure of the course
The course will be organized as a mixture of lectures (morning sessions) and exercises (afternoon sessions). In the lectures, we will focus on explaining core concepts and methods in web scraping. In the exercise sessions, participants will apply their newly acquired knowledge while the instructors are available for individual consultations and support participants' work on assignments.
Prerequisites
Software and hardware requirements
Participants should bring their own laptops and pre-install the following software/packages:
Recommended related courses