GESIS Training Courses

Scientific Coordination

Sebastian E. Wenz
Tel: +49 221 47694-159

Administrative Coordination

Jacqueline Schüller
Tel: +49 221 47694-160

Course 7: Missing Data and Multiple Imputation

About
Location:
Cologne / Unter Sachsenhausen 6-8
 
General Topics:
Course Level:
Format:
Software used:
Duration:
Language:
Fees:
Students: 550 €
Academics: 825 €
Commercial: 1650 €
 
Keywords
Additional links
Lecturer(s): Florian Meinfelder, Doris Stingl

About the lecturer - Florian Meinfelder

About the lecturer - Doris Stingl

Course description

This course provides an introduction to the theory and application of Multiple Imputation (MI) (Rubin 1987), which has become a very popular way of handling missing data because it allows for valid statistical inference in the presence of missing data, provided its assumptions hold. With the advent of MI algorithms implemented in standard statistical software (such as R, SAS, Stata, or SPSS), the method has become more accessible to data analysts. For didactic purposes, we will start the course by introducing some naive ways of handling missing data, and we will use the examination of their weaknesses to build an understanding of the MI framework.

The first day of this course will be somewhat theoretical, as we believe that a fundamental understanding of the MI principle helps participants adapt to a wider range of practical problems, rather than focusing on only a few specific situations. We will subsequently shift to the more practical aspects of statistical analysis with missing data and address frequent problems such as regression with missing data. Further examples will be covered throughout the course, predominantly in the statistical programming language R. We recommend basic R skills for this course, but it is possible to follow the course contents without prior knowledge of R, as the main MI algorithms are almost identical across all major software packages.
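As a rough illustration of what inference under MI involves, the sketch below pools results from several imputed data sets using the combining rules from Rubin (1987): the pooled estimate is the mean of the per-data-set estimates, and the total variance adds the between-imputation variance (inflated by 1 + 1/m) to the average within-imputation variance. The function name `pool_rubin` and the input numbers are hypothetical and chosen only for this example. The course itself works in R; Python is used here merely as a compact, self-contained sketch, not as the lecturers' material.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m completed-data estimates and their squared standard errors.

    A minimal sketch of Rubin's combining rules for a scalar parameter.
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and one variance per estimate")
    q_bar = mean(estimates)      # pooled point estimate
    w = mean(variances)          # within-imputation variance
    b = variance(estimates)      # between-imputation variance (divisor m - 1)
    t = w + (1 + 1 / m) * b      # total variance of the pooled estimate
    return q_bar, t


# Made-up results from m = 3 imputed data sets (illustrative numbers only)
estimates = [1.0, 2.0, 3.0]
variances = [0.5, 0.5, 0.5]
q_bar, total_var = pool_rubin(estimates, variances)
```

The between-imputation term is what naive single-imputation approaches leave out, which is why they tend to understate uncertainty.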
 
For additional details on the course and a day-to-day schedule, please download the full-length syllabus.
 
Organizational structure of the course
A typical day will consist of three hours of classroom instruction and three hours of lab sessions. Since some of you might have been motivated to take the course because you had a missing data problem in your research, the lecturers will offer consultation slots during lab sessions. If the problem is straightforward to describe, the lecturers might offer to treat it as a model case for discussion in class.
Lab sessions will be based on R Markdown documents, which will be provided prior to the course. You are expected to work on the problems using R and relevant packages introduced in the course, either alone or in groups. The lecturers will provide guidance and can be contacted for questions.


Target group

You will find the course useful if:
  • you are a survey methodologist working with incomplete data,
  • you are a researcher who wants to learn more about the analysis of incomplete data in general,
  • you are already aware of MI and its benefits but still feel uncomfortable about using MI algorithms implemented in statistical software.


Learning objectives

By the end of the course, you will:
  • be familiar with the theoretical implications of the MI framework and aware of its explicit and implicit assumptions (e.g. you will be able to explain within an article why MAR was assumed, etc.),
  • know when to use MI (and when not!),
  • know how to specify a "good" imputation model and how to use diagnostics,
  • be familiar with the availability of the various MI algorithms,
  • be able to not only replicate situations akin to the case studies covered in the course but also know how to handle incomplete data in general. 


Prerequisites

  • General knowledge of data preparation and analysis
  • An advanced understanding of the (generalized) linear model
  • Familiarity with statistical distributions
  • Basic knowledge of matrix algebra is helpful
  • Solid skills in R are recommended for exercises. You can also work in Stata (or another software of your choice), but the solutions to the exercises will be discussed and provided exclusively in R.

Software and hardware requirements
You will need to bring a laptop computer to successfully participate in this course.

Before the course, you should install R version 4.2.3 or higher (https://cran.r-project.org/) and RStudio (https://posit.co/download/rstudio-desktop/) or VS Code (https://code.visualstudio.com/) as IDE. These are free and open source.

For an introduction to or a refresher in R programming, you might consider enrolling in GESIS's two-day onsite course, Introduction to R for Data Analysis, held in the first week of the Summer School in Cologne, or the four-day online workshop, Introduction to R, offered in May.


Schedule

Recommended readings