Missing data are a pervasive problem in the social sciences. Data for a given unit may be missing entirely, for example, because a sampled respondent refused to participate in a survey (unit nonresponse). Alternatively, information may be missing only for a subset of variables (item nonresponse), for example, because a respondent refused to answer some of the questions in a survey. The traditional way of dealing with item nonresponse, referred to as “complete case analysis” (CCA) or “listwise deletion”, excludes every observation with missing information from the analysis. While easy to implement, complete case analysis discards information and can lead to biased estimates. Multiple imputation (MI) seeks to address these issues and yields more efficient and unbiased estimates when certain conditions are met. Over the past decades, it has therefore become a widely used alternative to CCA across the social sciences.

The goals of the course are to introduce you to the basic concepts and statistical foundations of missing data analysis and MI, and to enable you to use MI in your own work. The course puts heavy emphasis on the practical application of MI and on the complex decisions and challenges researchers face in the process. The focus is on MI using iterated chained equations (also known as “fully conditional specification”) and its implementation in the software package Stata. You should have a good working knowledge of Stata to follow the applied parts of the course and to complete the exercises successfully. If you are not familiar with Stata, you may still benefit from the course but will likely find the exercises quite challenging.

The full syllabus of the course including the day-to-day schedule will be published here in April.

Target group

You will find the course useful if:

you use survey or other types of quantitative data and want to learn about MI as an alternative to CCA,
you are already using MI but want to gain a better understanding of the underlying assumptions, of current best practice recommendations, and/or of how to solve specific problems that arise in its application (e.g., imputation diagnostics, convergence problems, imputation of transformed variables such as interactions, imputation of hierarchical and longitudinal data).

Learning objectives

By the end of the course, you will:

understand basic concepts of missing data analysis such as “missing at random”,
be familiar with different approaches to handling item nonresponse and with their advantages and drawbacks,
have a solid understanding of the main assumptions and statistical theory underlying MI and of the main steps of an analysis involving MI (imputation, diagnostics, and analysis),
know how to implement MI using chained equations in Stata,
know how to deal with various (Stata-specific and general) practical complications that arise in the application of MI using chained equations.
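
The main steps named above (imputation, diagnostics, and analysis) might look roughly as follows in Stata. This is a minimal sketch and not taken from the course materials: the example dataset mheart5, the variables, the imputation models, the number of imputations, and the seed are all assumptions chosen for illustration.

```stata
* Minimal sketch; dataset, variables, and settings are illustrative assumptions.
webuse mheart5, clear            // example data with missing values in age and bmi

misstable summarize              // inspect the extent of item nonresponse

mi set wide                      // declare the data as mi data
mi register imputed age bmi      // variables whose missing values will be imputed

* Imputation: chained equations with linear regression models for age and bmi
mi impute chained (regress) age bmi = attack smokes female hsgrad, add(20) rseed(12345)

* Diagnostics: compare the observed data (m=0) with selected imputations
mi xeq 0 1 20: summarize age bmi

* Analysis: fit the model in each imputed dataset and combine the results
mi estimate: logit attack smokes age bmi female hsgrad
```

In applied work, each of these choices (imputation model, predictors, number of imputations) requires justification; the sketch only shows how the steps fit together.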

Organizational structure of the course

The course will feature four hours of classroom instruction and two hours of hands-on exercises and/or group work each day. Exercises and group work will usually take place in the afternoon, and the lecturers will be present to answer questions and provide assistance.

Lecturers will also be available for individual consultations during the group work/exercise sessions. This opportunity could be used to further discuss specific questions and issues that could not be addressed sufficiently in class, including questions that relate to your own ongoing research projects. If you are interested in individual consultations concerning your ongoing projects, you are encouraged to contact the lecturers before the course and to provide a short description of the issues you would like to discuss.

Prerequisites

Experience in the analysis of quantitative data
Good knowledge of regression analysis
Good working knowledge of Stata
Basic understanding of probability theory and sampling

Software and hardware requirements

You will need to bring a laptop computer to successfully participate in this course.

For exercises, this course relies on Stata. Short-term Stata licenses will be provided by GESIS for the duration of the course if needed. If you own a Stata license, you should make sure that you have a recent version of Stata (at least Stata 13) installed.

Stata ados used during the course (and, ideally, installed before the course) include:

parmby
how_many_imputations
midiagplots
mimrgns
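
The commands listed above are user-written and do not ship with Stata. One possible way to install them is sketched below. Two things here are assumptions you should verify yourself: that these packages are available from the SSC archive under these names, and that parmby is distributed as part of the parmest package. No download location is assumed for midiagplots, so the sketch only searches for it.

```stata
* Assumption: these packages are available from the SSC archive under these names.
ssc install parmest               // assumed to provide parmby
ssc install how_many_imputations
ssc install mimrgns

* midiagplots: no source assumed here; search all known locations for it
search midiagplots, all

* Check that the installed commands are found on the ado-path
which parmby
which how_many_imputations
which mimrgns
```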