Schedule
Day 1: Introduction to Computational Social Science
Self-learning Session 1: to be completed before Live Session 1 (see below), estimated workload: 3h
This session will give a broad overview of the field of computational social science (CSS), its related fields and its development. The material will also provide background on the ethical conduct of CSS research and give participants practical guidelines on how to make sure their research adheres to ethical (and legal) standards. Additionally, it will provide basic skills in Exploratory Data Analysis (EDA) and visualization, and go over how to set things up properly for the course.
Literature (preliminary; details on what is mandatory and what is recommended will follow):
- Salganik, M. J. (2019). Bit by bit: Social research in the digital age. https://www.bitbybitbook.com/en/1st-ed/introduction/
- Lazer, D. M., Pentland, A., Watts, D. J., Aral, S., Athey, S., Contractor, N., ... & Wagner, C. (2020). Computational social science: Obstacles and opportunities. Science, 369(6507), 1060-1062. https://doi.org/10.1126/science.aaz8170
- Theocharis, Y. & Jungherr, A. (2021). Computational Social Science and the Study of Political Communication. Political Communication, 38(1-2), 1-22. https://doi.org/10.1080/10584609.2020.1833121
- van Atteveldt, W. & Peng, T.-Q. (2018). When Communication Meets Computation: Opportunities, Challenges, and Pitfalls in Computational Communication Science. Communication Methods and Measures, 12(2-3), 81-92. https://doi.org/10.1080/19312458.2018.1458084
Live Session 1: Friday, August 30th, 2:00-5:30 pm CEST
In this session we will get to know each other. There will also be room for questions and we will make sure that everyone has the software up and running. We will then break into groups to work on a small task.
Day 2: Obtaining Data
Self-learning Session 2: to be completed before Live Session 2 (see below), estimated workload: 3h
A core idea of CSS is that you work with found data rather than designed data. Designed data are collected specifically for research, through a survey or experiment, where the researcher controls which questions to ask and in what format the data are returned. Found data, on the other hand, are often traces left behind by people who were doing something that had nothing to do with research, such as writing, liking, sharing or deleting a post on social media, using a website, an app or a service, or simply doing their jobs, for example as politicians, journalists or book authors. These data can often tell us more about the actual behavior of people and institutions than what they would share in a survey or experiment. They are also often cheaply available en masse. But downloading, wrangling and cleaning them can be difficult. This session gives an overview of web scraping, which is exactly this process of downloading, wrangling and cleaning data so that they can be analyzed.
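As a rough illustration of the wrangling and cleaning steps described above, the following minimal Python sketch extracts headlines from a small, made-up HTML snippet using only the standard library. The snippet, the `HeadlineParser` class and the helper names are hypothetical and are not part of the course material; the download step is deliberately left out, so the HTML is hard-coded.

```python
from html.parser import HTMLParser

# Hypothetical page content; in a real project this string would be downloaded.
SAMPLE_HTML = """
<html><body>
  <div class="post"><h2> First  post </h2><p>Hello   world!</p></div>
  <div class="post"><h2>Second post</h2><p>  More text here. </p></div>
</body></html>
"""


class HeadlineParser(HTMLParser):
    """Collect the text inside every <h2> element."""

    def __init__(self):
        super().__init__()
        self._in_h2 = False
        self.headlines = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self._in_h2 = True
            self.headlines.append("")

    def handle_endtag(self, tag):
        if tag == "h2":
            self._in_h2 = False

    def handle_data(self, data):
        # Only keep text that appears between <h2> and </h2>.
        if self._in_h2:
            self.headlines[-1] += data


def clean(text):
    """Collapse runs of whitespace and trim both ends."""
    return " ".join(text.split())


def extract_headlines(html):
    """Wrangle raw HTML into a clean list of headline strings."""
    parser = HeadlineParser()
    parser.feed(html)
    parser.close()
    return [clean(h) for h in parser.headlines]


headlines = extract_headlines(SAMPLE_HTML)
print(headlines)
```

Real websites are usually messier than this, so dedicated scraping libraries tend to be more convenient in practice; the sketch only shows the basic idea of turning raw markup into tidy data.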
Literature (preliminary):
- Freelon, D. (2018). Computational Research in the Post-API Age. Political Communication, 35(4), 665-668. https://doi.org/10.1080/10584609.2018.1477506
- Hennesy, C., and Samberg, R. (2019). “Law and Literacy in Non-Consumptive Text Mining: Guiding Researchers Through the Landscape of Computational Text Analysis.” Copyright Conversations: Rights Literacy in a Digital World. https://escholarship.org/uc/item/55j0h74g
- Luscombe, A., Dick, K. & Walby, K. (2022). Algorithmic thinking in the public interest: navigating technical, legal, and ethical hurdles to web scraping in the social sciences. Qual Quant, 56, 1023-1044. https://doi.org/10.1007/s11135-021-01164-0
- Tromble, R. (2021). Where have all the data gone? A critical reflection on academic digital research in the post-API age. Social Media + Society, 7(1), 2056305121988929. https://doi.org/10.1177/2056305121988929
Live Session 2: Monday, September 2nd, 2:00-5:30 pm CEST
After answering some questions, you will split up into breakout groups to work on a small web scraping project.
Day 3: Computational Text Analysis
Self-learning Session 3: to be completed before Live Session 3 (see below), estimated workload: 3h
The advent of computational text analysis methods has reinvented the field of CSS. A lot of human interaction online happens through text, at a scale that makes manually analyzing it to answer theoretical questions essentially impossible. Computational text analysis thus plays a prominent role in CSS. In this session, you will get an overview of the most important approaches, namely dictionary methods and supervised and unsupervised machine learning. We will also introduce deep learning and transfer learning, with a focus on using generative Large Language Models (LLMs) for topic modelling and sentiment analysis.
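To give a feel for the simplest of these approaches, here is a minimal Python sketch of a dictionary method for sentiment analysis. The two word lists are tiny, made-up examples and not a validated dictionary, and the function names are hypothetical.

```python
import re

# Toy word lists for illustration only; real dictionaries are much larger
# and should be validated for the texts at hand.
POSITIVE = {"good", "great", "love", "excellent"}
NEGATIVE = {"bad", "terrible", "hate", "awful"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def dictionary_sentiment(text):
    """Return (number of positive tokens) minus (number of negative tokens)."""
    tokens = tokenize(text)
    pos = sum(token in POSITIVE for token in tokens)
    neg = sum(token in NEGATIVE for token in tokens)
    return pos - neg


for doc in ["I love this great policy", "This is a terrible, awful idea"]:
    print(doc, "->", dictionary_sentiment(doc))
```

A score above zero suggests a positive tone and a score below zero a negative one. Simple counts like this ignore negation and context, which is one reason why validation against manual annotation matters.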
Literature (preliminary):
- Grimmer, J., & Stewart, B. M. (2013). Text as Data: The Promise and Pitfalls of Automatic Content Analysis Methods for Political Texts. Political Analysis, 21(3), 267-297. https://doi.org/10.1093/pan/mps028
- Boumans, J. W., & Trilling, D. (2016). Taking stock of the toolkit: an overview of relevant automated content analysis approaches and techniques for digital journalism scholars. Digital Journalism, 4(1), 8-23. https://doi.org/10.1080/21670811.2015.1096598
- Atteveldt, W. van, Velden, M. A. C. G. van der, & Boukes, M. (2021). The Validity of Sentiment Analysis: Comparing Manual Annotation, Crowd-Coding, Dictionary Approaches, and Machine Learning Algorithms. Communication Methods and Measures, online first. https://doi.org/10.1080/19312458.2020.1869198
- Welbers, K., Atteveldt, W. V., & Benoit, K. (2017). Text Analysis in R. Communication Methods and Measures, 11(4), 245-265. https://doi.org/10.1080/19312458.2017.1387238
- Spirling, A. (2023). Why open-source generative AI models are an ethical way forward for science. Nature. https://doi.org/10.1038/d41586-023-01295-4
- Weber, M., & Reichardt, M. (2024). Evaluation is all you need. Prompting Generative Large Language Models for Annotation Tasks in the Social Sciences. A Primer using Open Models. https://doi.org/10.48550/arXiv.2401.00284
Live Session 3: Tuesday, September 3rd, 2:00-5:30 pm CEST
After answering some questions, you will split up into breakout groups to work on a small text analysis project.
Day 4: Computational Network Analysis
Self-learning Session 4: to be completed before Live Session 4 (see below), estimated workload: 3h
Social and political network analysis has been heavily influenced by developments in computational social science, such as the availability of massive timestamped relational data (e.g., from social media) and the development of new computationally intensive modelling frameworks. This session will introduce you to the basic language, data, methods and models used in network analysis. We will emphasize working with social media data and generative approaches to inferential network analysis.
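As a small illustration of the basic language of network analysis (nodes, edges and centrality), the following Python sketch builds an undirected network from an edge list and computes degree centrality. The edge list and all names are invented for this example; it uses only the standard library.

```python
from collections import defaultdict

# Hypothetical undirected ties, e.g. accounts that mention each other.
EDGES = [("ana", "ben"), ("ana", "cem"), ("ben", "cem"), ("cem", "dua")]


def build_adjacency(edges):
    """Turn an edge list into a mapping from each node to its set of neighbours."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return dict(adjacency)


def degree_centrality(adjacency):
    """Share of the other nodes that each node is directly connected to."""
    n = len(adjacency)
    if n < 2:
        return {node: 0.0 for node in adjacency}
    return {node: len(neighbours) / (n - 1)
            for node, neighbours in adjacency.items()}


centrality = degree_centrality(build_adjacency(EDGES))
for node, score in sorted(centrality.items()):
    print(node, round(score, 2))
```

In this toy network, "cem" is tied to every other node and therefore has the highest degree centrality.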
Literature (preliminary):
- Atteveldt, W. van, Trilling, D., & Arcila, C. (2022). Computational Analysis of Communication. Chapter 13. https://v2.cssbook.net/content/chapter13
- Leifeld, P. (2017). “Discourse Network Analysis: Policy Debates as Dynamic Networks” in Victor, J.N., Montgomery, A. H., & Lubell, M. N. (eds). The Oxford Handbook of Political Networks, pp. 301-325. Oxford University Press. http://dx.doi.org/10.1093/oxfordhb/9780190228217.013.25
- Kitts, J., Grogan, H., & Lewis, K. (2023). “Social Networks and Computational Social Science” in McLevey, J., Scott, J., & Carrington, P. (eds). The Sage Handbook of Social Network Analysis, pp. 44-54. London, UK: Sage.
Live Session 4: Wednesday, September 4th, 2:00-5:30 pm CEST
After answering some questions, you will split up into breakout groups to work on a small network analysis project.
Day 5: Social Simulation & Agent-based Models
Self-learning Session 5: to be completed before Live Session 5 (see below), estimated workload: 3h
In network analysis, we analyze how different actors relate to each other in terms of what they say, do and are. In agent-based modeling, we simulate these connections, given a set of theoretical assumptions. This can be used to study social phenomena, such as a topic going viral, a social network becoming polarized or an actual disease spreading. If data is unavailable or would be unethical to obtain, agent-based models offer a valid approach to produce synthetic data to test theories.
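The idea of simulating connections under explicit assumptions can be sketched in a few lines of Python. In this hypothetical model, agents sit on a ring, one agent starts out informed, and in every step each informed agent passes the information to each uninformed neighbour with probability `p_share`. The ring layout, the parameter names and the default values are assumptions made for illustration, not a model from the course.

```python
import random


def simulate_spread(n_agents=50, p_share=0.3, n_steps=20, seed=42):
    """Simulate information spreading on a ring of agents.

    Returns the number of informed agents after each step,
    starting with the initial state.
    """
    rng = random.Random(seed)  # seeded for reproducible runs
    informed = [False] * n_agents
    informed[0] = True  # a single agent starts out informed
    history = [1]
    for _ in range(n_steps):
        newly_informed = set()
        for i, is_informed in enumerate(informed):
            if not is_informed:
                continue
            # Each agent has exactly two neighbours on the ring.
            for j in ((i - 1) % n_agents, (i + 1) % n_agents):
                if not informed[j] and rng.random() < p_share:
                    newly_informed.add(j)
        # Update all agents at once so the order of agents does not matter.
        for j in newly_informed:
            informed[j] = True
        history.append(sum(informed))
    return history


print(simulate_spread())
```

Changing `p_share` or replacing the ring with a different network structure lets you explore how the theoretical assumptions shape the resulting synthetic data.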
Literature (preliminary):
- Wettstein, M. (2020). Simulating hidden dynamics: Introducing Agent-Based Models as a tool for linkage analysis. Computational Communication Research 2 (1): 1-33. https://doi.org/10.5117/CCR2020.1.001.WETT
- Chapters 1 and 3 from Railsback, S. and Grimm, V. (2011). Agent-Based and Individual-Based Modeling: A Practical Introduction. Princeton University Press.
- Bruch, E. & Atwell, J. (2015). Agent-based models in empirical social research. Sociological Methods and Research. 44(2): 186-221. https://doi.org/10.1177/0049124113506405
Live Session 5: Thursday, September 5th, 2:00-5:30 pm CEST
After answering some questions, you will split up into breakout groups to work on a small simulation project.