Scientific Coordination
Sebastian E. Wenz
Tel: +49 221 47694-159
Administrative Coordination
Jacqueline Schüller
Tel: +49 221 47694-160
Course 7: Missing Data and Multiple Imputation
About
Location:
Cologne / Unter Sachsenhausen 6-8
General Topics:
Course Level:
Format:
Software used:
Duration:
Language:
Fees:
Students: 550 €
Academics: 825 €
Commercial: 1650 €
Keywords
Additional links
Lecturer(s): Florian Meinfelder, Doris Stingl
Course description
This course provides an introduction to the theory and application of Multiple Imputation (MI) (Rubin 1987), which has become a popular way of handling missing data because, given an appropriate imputation model, it allows for valid statistical inference despite incomplete observations. With the advent of MI algorithms implemented in standard statistical software (such as R, SAS, Stata, or SPSS), the method has become more accessible to data analysts. For didactic purposes, we will start the course by introducing some naive ways of handling missing data, and we will examine their weaknesses to build an understanding of the MI framework. The first day of the course will be somewhat theoretical, as we believe that a fundamental understanding of the MI principle helps you adapt to a wider range of practical problems than a focus on a few specific situations would. We will then shift to the more practical aspects of statistical analysis with missing data and address frequent problems such as regression with missing data. Further examples will be covered throughout the course, predominantly based on the statistical programming language R. We recommend basic R skills for this course, but it is possible to follow the course contents without prior knowledge of R, as the main MI algorithms are almost identical across all major software packages.
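To give a rough idea of the MI principle, here is a minimal sketch in base R. It is not taken from the course materials: the simulated data, the variable names, the choice of m = 20 imputations, and the simple normal linear imputation model are all assumptions made for illustration. It compares a naive complete-case mean with an estimate pooled across imputed data sets using Rubin's (1987) combining rules.

```r
# Illustrative sketch only: simulated data and a simple normal linear
# imputation model (assumptions, not the course's own example).
set.seed(1)

# Simulate y depending on x, then delete y more often when x is large,
# so that missingness depends on an observed variable.
n <- 500
x <- rnorm(n)
y <- 2 + 1.5 * x + rnorm(n)          # true mean of y is 2
y_obs <- y
y_obs[runif(n) < plogis(x)] <- NA
miss <- is.na(y_obs)

# Naive approach: mean of the observed y values only.
cc_mean <- mean(y_obs, na.rm = TRUE)

# Imputation model fitted to the complete cases (lm drops NAs by default).
fit      <- lm(y_obs ~ x)
beta_hat <- coef(fit)
V        <- summary(fit)$cov.unscaled   # (X'X)^{-1}
df_res   <- fit$df.residual
rss      <- sum(residuals(fit)^2)

m <- 20
q <- numeric(m)   # point estimate from each completed data set
u <- numeric(m)   # its estimated variance
for (i in seq_len(m)) {
  # Draw model parameters first, so imputations reflect parameter uncertainty.
  sigma2 <- rss / rchisq(1, df_res)
  beta   <- beta_hat + drop(t(chol(sigma2 * V)) %*% rnorm(2))
  # Fill in the missing values with draws from the predictive distribution.
  y_imp <- y_obs
  y_imp[miss] <- beta[1] + beta[2] * x[miss] +
    rnorm(sum(miss), sd = sqrt(sigma2))
  q[i] <- mean(y_imp)
  u[i] <- var(y_imp) / n
}

# Rubin's rules: pooled estimate, within- and between-imputation variance.
q_bar <- mean(q)
u_bar <- mean(u)
b     <- var(q)
t_var <- u_bar + (1 + 1 / m) * b

cat(sprintf("Complete-case mean: %.3f\n", cc_mean))
cat(sprintf("MI pooled mean:     %.3f (SE %.3f)\n", q_bar, sqrt(t_var)))
```

The total variance adds the between-imputation component to the average within-imputation variance, which is how the uncertainty caused by the missing values enters the standard error. In practice you would normally rely on an established MI implementation rather than hand-written code like this.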
For additional details on the course and a day-to-day schedule, please download the full-length syllabus.
Organizational structure of the course
A typical day will consist of three hours of classroom instruction and three hours of lab sessions. Since some of you might have been motivated to take the course because you have a missing data problem in your own research, the lecturers will offer consultation slots during lab sessions. If the problem is straightforward to describe, the lecturers might offer to treat it as a model case for discussion in class.
Lab sessions will be based on R Markdown documents, which will be provided prior to the course. You are expected to work on the problems using R and relevant packages introduced in the course, either alone or in groups. The lecturers will provide guidance and can be contacted for questions.
Target group
You will find the course useful if:
Learning objectives
By the end of the course, you will:
Prerequisites
Software and hardware requirements
You will need to bring a laptop computer to successfully participate in this course.
Before the course, you should install R version 4.2.3 or higher (https://cran.r-project.org/) and either RStudio (https://posit.co/download/rstudio-desktop/) or VS Code (https://code.visualstudio.com/) as an IDE. All of these can be downloaded free of charge.
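If you are unsure which R version is installed, a quick check along the following lines can be run in the R console. This is only a convenience sketch; the variable names are illustrative.

```r
# Compare the running R version with the minimum version stated above.
required  <- "4.2.3"
installed <- getRversion()
ok <- installed >= required
message("Installed R version: ", as.character(installed),
        if (ok) " (meets the requirement)" else " (please update R)")
```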
For an introduction or refresher in R programming, you might consider enrolling in GESIS's two-day onsite course, “Introduction to R for Data Analysis” held in the first week of the Summer School in Cologne, or the four-day online workshop, “Introduction to R” offered in May