Scientific Coordination
Dr. Marlene Mauk
Tel: +49 221 47694-579
Administrative Coordination
Claudia O'Donovan-Bellante
Tel: +49 621 1246-221
Introduction to Machine Learning for Text Analysis with Python
About
Location:
Mannheim B6, 4-5
General Topics
Course Level
Format
Software used
Duration
Language
Fees
Students: 500 €
Academics: 750 €
Commercial: 1500 €
Keywords
Additional links
Lecturer(s): Prof. Dr. Damian Trilling, Prof. Dr. Anne Kroon
Course description
The course will provide insights into the concepts, challenges, and opportunities associated with data so large that traditional research methods (like manual content analysis) can no longer be applied and traditional inferential statistics start to lose their meaning. Participants are introduced to strategies and techniques for capturing and analyzing digital data in communication contexts using Python. The course offers hands-on instruction in the several stages of computer-aided content analysis. More specifically, students will become familiar with pre-processing methods, analysis strategies, and the visualization and presentation of findings. The focus will be on machine learning techniques for analyzing textual data quantitatively; both deductive (e.g., supervised machine learning) and inductive (e.g., unsupervised machine learning) approaches will be discussed.
This is a beginner's course. Participants who are looking to learn about the latest developments in machine learning for textual data (such as transformer models) should consider taking a different course. These techniques will be (briefly) discussed towards the end of the course, but the focus lies on the basics of natural language processing and classical machine learning in Python.
For additional details on the course and a day-to-day schedule, please download the full-length syllabus.
Target group
Participants will find the course useful if:
Learning objectives
By the end of the course participants will:
Organizational structure of the course
In the morning, we will have lectures in which we explain the topic of the day both from a theoretical-conceptual point of view and from a practical point of view (i.e., walking you through code examples). We may have small in-class exercises in between, if necessary.
In the afternoon, students work on larger exercises in which they implement the techniques we covered. We provide example datasets, but it is also possible (and encouraged) to apply the techniques to your own datasets. Participants can either work on their own or try to solve problems together with one or more classmates. Lecturers will provide feedback on participants' (attempted) solutions and will also provide example solutions.
Prerequisites
Software and hardware requirements
Participants need to have a current Python environment installed and need to be able to install and update packages on their own. Any relatively recent version of Python (in general, 3.8 or higher) should be fine. If you still have an older version, you may not be able to run the example code as-is and may need to adapt it. Make sure you have recent versions of crucial packages such as pandas, numpy, scipy, scikit-learn, gensim, and keras installed. If in doubt, check how to update them. One option to achieve all of this is to simply install the newest version of the so-called Anaconda distribution, even though this is by no means necessary (in fact, both of us usually install our packages by hand instead of using Anaconda).
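One way to check which of these packages are installed, and in which version, is sketched below. It assumes Python 3.8 or later, where `importlib.metadata` is part of the standard library; the helper name `check_environment` is hypothetical. Missing or outdated packages can then be installed or updated with pip (e.g., `pip install --upgrade pandas`) or with conda.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Packages named in the requirements above (distribution names as on PyPI)
REQUIRED = ["pandas", "numpy", "scipy", "scikit-learn", "gensim", "keras"]


def check_environment(packages=REQUIRED, min_python=(3, 8)):
    """Return a dict mapping each package name to its installed version, or None if missing."""
    if sys.version_info[:2] < min_python:
        print(
            f"Warning: Python {sys.version_info.major}.{sys.version_info.minor} "
            f"is older than {min_python[0]}.{min_python[1]}"
        )
    report = {}
    for name in packages:
        try:
            report[name] = version(name)  # reads the installed distribution's metadata
        except PackageNotFoundError:
            report[name] = None  # not installed in this environment
    return report


if __name__ == "__main__":
    for name, installed in check_environment().items():
        print(f"{name:15} {installed or 'NOT INSTALLED'}")
```

Run it in the same environment (or Jupyter kernel) you plan to use during the course; a package installed into a different environment will show up as missing.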
Participants should bring their own laptops and pre-install the following software/packages:
Recommended related courses