Scientific Coordination
Verena Kunz
Administrative Coordination
Janina Götsche
Going Cross-Lingual: Computational Methods for Multilingual Text Analysis
About
Location:
Cologne / Unter Sachsenhausen 6-8
General Topics:
Course Level:
Format:
Software used:
Duration:
Language:
Fees:
Students: 300 €
Academics: 450 €
Commercial: 900 €
Keywords
Additional links
Lecturer(s): Hauke Licht, Fabienne Lind
Course description
The wide-reaching and still growing digitalization of communication in the form of text data creates demand for international, cross-lingual comparative research. For example, large multilingual text collections of political parties' campaign materials or politicians' parliamentary speeches invite cross-country comparative analysis of political behavior. Likewise, the availability of large collections of national news outlets' coverage of internationally relevant topics like economic inequality, climate change, or immigration allows the comparative analysis of various national perspectives.
Fortunately, an increasing number of contributions to the (computational) social science literature present approaches for analyzing multilingual text collections with text-as-data methods. In this workshop, participants will learn about these approaches and about strategies for studying social science-related concepts in multilingual text collections with automated content analysis methods. Specifically, we will focus on (machine) translation, multilingual embedding, and transfer learning approaches.
We will focus on aspects relevant for applying these methods to compare concepts across socio-political contexts. Through a combination of theoretical discussions and practical exercises, participants will learn how to effectively apply (neural) machine translation and multilingual embedding techniques to analyze texts quantitatively across languages. Additionally, we will delve into the underlying assumptions that motivate these approaches and practice validating cross-lingual measurements.
By the end of the workshop, participants will have a strong understanding of key concepts and approaches in the existing multilingual text analysis literature, as well as the ability to implement them in R and/or Python through hands-on exercises.
Target group
Participants will find the course useful if they
Learning objectives
Throughout the course, participants will
Organizational structure of the course
The workshop introduces all topics through a lecture format followed by practical examples to illustrate the concepts. During the lectures, the instructors provide a thorough overview of each topic, highlight the latest research methods, and introduce relevant resources. The lectures also include shorter interactive parts in which participants reflect on and discuss different methods in both plenary and small-group settings. The practical part of the course consists of hands-on exercises in the lab. Here, participants work together on preselected data to apply the concepts learned. In addition, the instructors offer small coding challenges that participants can complete on their own or in groups after class hours on a voluntary basis. Solutions for both the in-class exercises and the voluntary take-home assignments are provided.
Prerequisites
Software and hardware requirements
Agenda
Wednesday, 06.12. | |
10:00 - 11:30 | Introduction to the topic, overview of applications and main problems (input by instructors) |
11:30 - 11:45 | Coffee break |
11:45 - 13:00 | Introduction to the main solution approaches (input by instructors + group discussion) |
13:00 - 14:00 | Lunch break |
14:00 - 14:30 | Valid data selection in multilingual & multi-context scenarios (input by instructors) |
14:30 - 15:30 | Data source selection (group exercise) |
15:30 - 15:45 | Coffee break |
15:45 - 17:00 | Search string/keyword selection and testing (hands-on exercise in the lab with preselected data) |
Thursday, 07.12. | |
09:30 - 11:00 | Machine translation, multilingual embeddings, large language models (input by instructors) |
11:00 - 11:15 | Coffee break |
11:15 - 12:30 | Implementing the main solutions for supervised machine learning (hands-on exercise in the lab with preselected data, code with solutions is prepared) |
12:30 - 13:30 | Lunch break |
13:30 - 15:00 | Implementing the main solutions for unsupervised machine learning (hands-on exercise in the lab with preselected data, code with solutions is prepared) |
15:00 - 15:15 | Coffee break |
15:15 - 15:45 | Valid outputs in multilingual & multi-context scenarios (input by instructors) |
15:45 - 16:30 | Creation of a validation benchmark (group exercise) |
Friday, 08.12. | |
09:30 - 10:15 | Valid inputs and processes in multilingual & multi-context scenarios (input by instructors) |
10:15 - 11:00 | Pre-processing of multilingual data (hands-on exercise in the lab with preselected data, code with solutions is prepared) |
11:00 - 11:15 | Coffee break |
11:15 - 12:30 | Process monitoring of multilingual data (hands-on exercise in the lab with preselected data, code with solutions is prepared) |
12:30 - 13:30 | Lunch break |
13:30 - 15:00 | Lecturers are available for individual consultations on participants' projects. Time can also be used by participants to work on their projects. The instructors also provide case studies for participants who prefer to work on prepared datasets and questions. |
15:00 - 15:15 | Coffee break |
15:15 - 16:30 | Lecturers are available for individual consultations on participants' projects. Time can also be used by participants to work on their projects or the prepared examples the instructors provide. |