GESIS Training Courses

Scientific Coordination

Verena Kunz

Administrative Coordination

Claudia O'Donovan-Bellante
Tel: +49 621 1246-221

Applied Machine Learning with R

Online via Zoom
Students: 330 €
Academics: 495 €
Commercial: 990 €
Lecturer(s): Paul Bauer


Course description

Machine learning has become a widely used tool across research domains. Social scientists study the implications of its widespread use for society, and they have also begun to use machine learning tools in their own research. This workshop introduces machine learning with R using the tidymodels framework, a collection of packages for predictive modeling that follows tidyverse principles. We will build, evaluate, compare, and tune predictive models. We start with classic models (e.g., linear and logistic regression) and then discuss models that are particularly useful in predictive settings, which often involve many variables (e.g., random forests, LASSO). Along the way, we will cover key concepts in machine learning, including training, validation, and test data, overfitting, resampling, and feature engineering.

Participants will gain knowledge of good predictive modeling practices, as well as hands-on experience with tidymodels packages such as parsnip, rsample, recipes, yardstick, tune, and workflows. Participants will also learn how to access Python-based models from within R. Finally, we will discuss how to visualize and evaluate both the accuracy and the fairness of ML models.
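To give a feel for the kind of workflow described above, here is a minimal sketch of a tidymodels pipeline. It is not taken from the course materials: the penguins data from the modeldata package, the chosen predictors, and the object names are all assumptions made for illustration, and the code assumes R 4.1 or later (for the native pipe) with the tidymodels package installed.

```r
library(tidymodels)

set.seed(2024)

# Illustrative data (an assumption, not course data): penguins from modeldata
penguins_clean <- modeldata::penguins |>
  drop_na()

# rsample: hold out a test set, stratified by the outcome
data_split <- initial_split(penguins_clean, prop = 0.8, strata = sex)
train_data <- training(data_split)
test_data  <- testing(data_split)

# recipes: feature engineering steps, estimated on the training data only
rec <- recipe(
  sex ~ bill_length_mm + bill_depth_mm + body_mass_g + species,
  data = train_data
) |>
  step_normalize(all_numeric_predictors()) |>
  step_dummy(all_nominal_predictors())

# parsnip: a classic model (logistic regression) as the starting point
log_spec <- logistic_reg() |>
  set_engine("glm")

# workflows: bundle preprocessing and model
wf <- workflow() |>
  add_recipe(rec) |>
  add_model(log_spec)

# rsample + tune: 5-fold cross-validation on the training data
folds <- vfold_cv(train_data, v = 5, strata = sex)
cv_results <- fit_resamples(
  wf,
  resamples = folds,
  metrics = metric_set(accuracy, roc_auc)
)
cv_metrics <- collect_metrics(cv_results)
print(cv_metrics)

# Final fit on the full training set, evaluated once on the test set
final_fit  <- fit(wf, data = train_data)
test_preds <- predict(final_fit, new_data = test_data) |>
  bind_cols(test_data)

# yardstick: test-set accuracy
test_acc <- accuracy(test_preds, truth = sex, estimate = .pred_class)
print(test_acc)
```

Swapping `logistic_reg()` for another parsnip specification (for example a random forest) would leave the rest of the workflow unchanged, which is one reason the workflow object is kept separate from the model specification in this sketch.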

Target group

Regular R users who are interested in learning about machine learning concepts and models and using machine learning for their own research.

Learning objectives

By the end of the course, participants will
  • understand key concepts underlying machine learning.
  • be able to interpret and evaluate machine learning models.
  • be able to critically assess model performance on different dimensions of quality.
  • be able to use various machine learning models for predictive and classification purposes.
  • have learned how to use the tidymodels framework for machine learning in R.
  • have learned how to evaluate and visualize model performance using packages like ggplot2 and plotly.


Prerequisites

  • Understanding of R programming and RStudio, including key packages like dplyr, tidyr, and ggplot2.
  • Familiarity with tidyverse data management functions (e.g., group_by, summarize, …). If necessary, this can be reinforced through DataCamp courses that will be organized for participants (e.g., Introduction to the Tidyverse/Dplyr: Chapter 1 - Data wrangling and Chapter 3 - Grouping and summarizing).
  • Basic knowledge of introductory statistics, including frequentist methods and linear and logistic regression.

Software requirements

The workshop is based on the open-source programming language R, so no commercial software is needed. We follow the principles of 'Open Data' and 'Open Code' and integrate narrative text and code. Please install R and RStudio before the workshop. Participants will receive an email with further installation instructions (e.g., regarding required R packages).


Recommended readings