GESIS Training Courses

Scientific Coordination

André Ernst
Tel: +49 221 4703736

Administrative Coordination

Noemi Hartung
Tel: +49 621 1246-211

Data Quality Assessment for Online Survey Responses: Be Careful of the Careless

Mannheim, B6 4-5
Students: 220 €
Academics: 330 €
Commercial: 660 €
Lecturer(s): Nivedita Bhaktha, Thomas Knopf

Course description

In survey research, some of the collected responses may be of low quality. This can happen for various reasons, such as survey design issues, respondent fatigue, lack of motivation, lack of understanding, carelessness, and response biases. Low-quality responses threaten the validity and reliability of research findings. They can also distort statistical analyses, such as factor analyses and hypothesis tests, leading to inaccurate interpretations or misleading conclusions. Additionally, they can obscure patterns, trends, and relationships within the data, limiting researchers' ability to draw meaningful insights and make informed decisions based on the findings. Identifying and addressing low-quality responses is therefore crucial to ensure the integrity and robustness of research outcomes.
This is an introductory workshop on detecting and handling quality issues in data collected from online survey research. We will equip the audience with the knowledge and practical skills to perform basic and advanced data quality assessment procedures.
Focusing on response scales (e.g., Likert), the following topics will be covered:
• Classification and discussion of the importance and relevance of data quality assessment for online survey research (case studies)
• Recap of terminology in survey research
• Introduction to the taxonomy of Response Quality Indicators (RQIs) as a theoretical framework and guiding scheme
• Theoretical and practical introduction to a set of response quality indicators, in particular:
    • Paradata: usage of, e.g., timing variables/response latencies
    • Check items: usage of, e.g., directed and undirected attention checks (bogus items)
    • Item responses: usage of three classes of RQIs that address outliers, consistency, and response patterns
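As a rough illustration of the three groups of indicators listed above, the following R sketch computes one simple indicator from each group on a made-up dataset. The variable names, the instructed-response item with an assumed correct answer of 4, and the speeding cutoff of 30% of the median completion time are assumptions chosen for illustration. They are not taken from the course material.

```r
# Toy data: 5 respondents, six 5-point Likert items, one instructed-response
# item ("check", correct answer assumed to be 4), and total completion time
# in seconds. All names, values, and cutoffs are illustrative assumptions.
dat <- data.frame(
  id       = 1:5,
  item1    = c(4, 3, 2, 5, 3),
  item2    = c(5, 3, 1, 4, 3),
  item3    = c(2, 3, 4, 4, 3),
  item4    = c(4, 3, 2, 5, 3),
  item5    = c(3, 3, 5, 4, 3),
  item6    = c(4, 3, 1, 5, 3),
  check    = c(4, 4, 4, 2, 4),
  duration = c(310, 45, 280, 295, 52)
)
items <- paste0("item", 1:6)

# Paradata: flag respondents who finished implausibly fast
# (here: faster than 30% of the median duration; the cutoff is an assumption).
dat$flag_speed <- dat$duration < 0.3 * median(dat$duration)

# Check item: flag respondents who failed the instructed-response item.
dat$flag_check <- dat$check != 4

# Item responses (response pattern): length of the longest run of identical
# consecutive answers, often called a "longstring" index.
longstring <- function(x) max(rle(x)$lengths)
dat$longstring <- apply(dat[, items], 1, longstring)

# Show the indicators side by side for inspection.
dat[, c("id", "flag_speed", "flag_check", "longstring")]
```

In practice, flags like these are usually inspected jointly rather than used as single exclusion rules, and suitable cutoffs depend on the questionnaire and sample.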
We will provide software demonstrations and hands-on exercises on data quality assessment using R, along with example datasets and R scripts. The course material will be presented as PDF slides in English.

Organizational structure of the course

During the exercises, participants will work on assignments individually or in groups, either on their own projects or on hands-on exercises with provided cases.
Lecturers will be available for individual consultations on participants' projects (on the second day), will support work on assignments, and will facilitate discussions within group assignments.

Target group

Participants will find the course useful if:
  • they are planning or have already conducted scientific online surveys or employee surveys in large organizations
  • they mainly use single or multiple items/questionnaires with response formats that can be quantitatively analyzed
  • they aim to acquire robust skills that are necessary to understand, evaluate, and handle data quality issues

Learning objectives

By the end of the course, participants will:
  • have learned about data quality taxonomy, issues, and concepts
  • have evaluated the quality of online survey responses and detected low-quality data in the sample
  • have gained hands-on experience with data quality assessment in R, using commonly applied data quality indicators in online survey research


Prerequisites

  • Basic knowledge of survey research terminology (survey mode, questionnaire, measurement instrument, item, type of question format, response options)
  • Basic understanding of descriptive statistics (counts, proportions, means, (co-)variances, correlations)
  • Basic skills in R (RStudio) for data handling and analysis (above all: package handling, data import and data wrangling/inspection of data frames)

Software and hardware requirements

  • R version 4.2 or later, ideally in combination with the most recent version of RStudio
  • R packages for data import (e.g., readr, sjmisc), data wrangling (tidyverse), and data analysis (e.g., psych)
  • If you bring your own data, please prepare a wide-format data file (one row per participant)

Note: Make sure that you have sufficient (administrator) rights to install packages on your computer during the workshop.
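
If your own data are stored in long format (one row per participant and item), they can be converted to wide format beforehand. The following is a minimal sketch using base R's reshape(). The column names id, item, and response are hypothetical and need to be adapted to your file.

```r
# Hypothetical long-format data: one row per participant and item.
long <- data.frame(
  id       = c(1, 1, 1, 2, 2, 2),
  item     = c("q1", "q2", "q3", "q1", "q2", "q3"),
  response = c(4, 5, 2, 3, 3, 3)
)

# Convert to wide format: one row per participant, one column per item.
# reshape() names the new columns "<value column>.<item>", e.g. response.q1.
wide <- reshape(long, idvar = "id", timevar = "item", direction = "wide")

wide
```

With the tidyverse installed, tidyr::pivot_wider(long, names_from = item, values_from = response) gives a comparable result.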


Recommended readings