GESIS Training Courses

Scientific Coordination

Verena Kunz

Administrative Coordination

Claudia O'Donovan-Bellante

Applied Machine Learning with R

About
Location: Online via Zoom
Fees:
Students: 330 €
Academics: 495 €
Commercial: 990 €
Lecturer(s): Paul Bauer

About the lecturer - Paul Bauer

Course description

Please note: This workshop has two parts: 1) a self-learning phase (03.-07.02.2025) and 2) an online phase (11.-14.02.2025).
 
In today's rapidly evolving research landscape, machine learning has become an indispensable tool that enables new insights and efficiencies across diverse domains. Social scientists study the implications of the widespread use of machine learning for society, and they have also begun to use machine learning tools in their own research. This workshop introduces machine learning with R using the tidymodels framework, a collection of packages for predictive modeling based on tidyverse principles. We will build, evaluate, compare, and tune predictive models. We start with classic models (e.g., linear or logistic regression) and then discuss models that are particularly useful in predictive settings, which often involve many variables (e.g., random forests, LASSO). Along the way, we will cover key concepts in machine learning, including training, validation, and test data, overfitting, resampling, feature engineering, and tuning. Participants will gain knowledge about good predictive modeling practices as well as hands-on experience with tidymodels packages such as parsnip, rsample, recipes, yardstick, tune, and workflows. Finally, we will discuss how to visualize and evaluate both the performance and the fairness of machine learning models.
 
Self-learning phase
The self-learning phase takes place from 03.-07.02.2025. Participants will receive material and instructions for the self-learning phase at the start of that week. Self-learning sessions contain both thematic inputs, in the form of explanatory readings and/or coding examples, and practical exercises, in the form of quizzes and/or individual assignments in which participants work on examples. Participants are expected to read the thematic inputs thoroughly and to work through the exercises.
 
During this first part of the course, participants will learn how to explore a dataset in R visually and with descriptive statistics using newer R packages, understand why data exploration is a necessary step in building predictive models, get to know the datasets we use in this workshop (e.g., ESS data, COMPAS data), and understand the logic of recipes to preprocess data before building predictive models.


Target group

Regular R users who are interested in learning about machine learning concepts and models and applying machine learning in their own research.


Learning objectives

By the end of the course, participants will
 
  • understand key concepts underlying machine learning.
  • be able to interpret and evaluate machine learning models.
  • be able to critically assess model performance on different dimensions of quality.
  • be able to use various machine learning models for predictive and classification purposes.
  • have learned how to use the tidymodels framework for machine learning in R.
  • have learned how to evaluate and visualize model performance using packages like ggplot2 and Plotly.


Prerequisites

  • Understanding of R programming and RStudio, including key tidyverse packages like dplyr, tidyr, and ggplot2.
  • Familiarity with tidyverse data management functions (e.g., group_by, summarize, …). If necessary, this can be reinforced by DataCamp courses that will be organized for participants (e.g., Introduction to the Tidyverse/Dplyr, Chapter 1: Data wrangling and Chapter 3: Grouping and summarizing).
  • Basic knowledge of introductory statistics, including regression analysis (e.g., logistic regression) and frequentist methods.


Software requirements

The workshop will be based on the open-source programming language R. We follow the principles of 'Open Data,' 'Open Code,' and the integration of narrative text and code (no commercial software is needed). Please install R and RStudio before the workshop. Participants will receive an email with further installation instructions before the workshop (e.g., regarding required R packages).


Schedule

Recommended readings