
Welcome! I am an Associate Professor of English Linguistics and Corpus Linguistics at the Department of Language Science and Technology at Saarland University and President of the ACL Special Interest Group on Language Technologies for the Socio-Economic Sciences and Humanities (SIGHUM). My research interests lie in text mining and data analytics for research questions within the Digital Humanities, in particular in sociolinguistics, register and language variation, and change in language use. My recent focus has been on using probabilistic models to capture variation and change in language use, taking into account linguistic variables as well as other variables that may be at play.
Recent News
May 2025 – Our group is going to be at NAACL 2025 and the SIGHUM LaTeCH-CLfL Workshop in Albuquerque, NM, where we are going to present work on propagandistic narratives, Big Five personality traits in LLMs, and multi-word expressions in English scientific writing.
April 2025 – Sofía Aguilar and I participated in and gave a talk at the event Large Language Models for the History, Philosophy, and Sociology of Science at TU Berlin, organized by Gerd Graßhoff, Arno Simons, Adrian Wüthrich, and Michael Zichert. It was a very valuable experience to present our CASCADE work on modeling change in language use with information-theoretic methods and Graph Neural Nets, and to exchange thoughts with historians, philosophers and sociologists alongside LLM experts.
March 2025 – We successfully organized the 2nd Camp in our CASCADE project (EU Horizon MSCA Doctoral Network). Two packed days of activities included ESR group work and brainstorming sessions, poster presentations, and even a guided tour of 500 years of Saarbrücken history at the SAAR Historical Museum.
LaTeCH-CLfL 2025 workshop accepted at NAACL (in Albuquerque). CfP is out: consider submitting! Deadline: Jan 30th, 2025.
Attending Camp 1 of our EU MSCA Network in Sheffield with the CASCADE group. Looking forward to our first in-person meeting of all ECRs and PIs in the project! (Nov. 25th-28th 2024)
Invited to join an international reading group on Studying Science with Digital Methods organized by Catherine Herfeld (Prof. of Philosophy at Leibniz Universität Hannover, leading an ERC project on Model Transfer) and Charles H. Pence (Assoc. Prof. for the History of Science at UC Louvain). Excited to exchange thoughts with historians of science! (Winter term 2024/25)
Together with members of the SFB1102-B1 project and the CASCADE project, we’re going to visit our colleagues at IMS in Stuttgart for the workshop on Semantic Change of Multi-Word Expressions. (Oct. 26-27, 2024)
I’ve been invited to give a talk at Schloss Dagstuhl – Leibniz-Zentrum für Informatik in the seminar Interdisciplinary Perspectives on Information. (Oct. 2-4, 2024)
We’ve kick-started our MSCA CASCADE projects this month (Sept. 2024) with my team members Sofía Aguilar (MA in Bioengineering and Intelligent Computing) and Anastasiia Vestel (MA in Technology for Translation and Interpreting).
Current Projects
EU Horizon MSCA on Computational Analysis of Semantic Change Across Different Environments (CASCADE)
Official website: https://www.horizoncascade.net/
PhD Project 1: Modeling context for the analysis of language variation and change
https://www.uni-saarland.de/fileadmin/upload/verwaltung/stellen/Wissenschaftler/W2439.pdf
PhD Project 2: Diachronic development of text types in the English Language
Click to access W2440.pdf
In CASCADE, I collaborate with James O’Sullivan from University College Cork, Ireland, Dirk Spee from Katholieke Universiteit Leuven, Belgium, Mikko Tolonen from Helsingin Yliopisto, Finland, and 9 partners throughout Europe. We aim to train early-stage researchers to develop and apply innovative methodologies for Computational Analysis of Semantic Change Across Different Environments (CASCADE), i.e. to identify, analyse and interrogate how meaning is expressed in language in diverse contexts, with a shared focus on the impact of time (diachronic text analytics). CASCADE responds to a skills deficit within the academic, public and commercial sectors: the need for people able to retrieve, critically evaluate and make better use of the large volumes of textual data that characterize our contemporary information society (the ‘data deluge’), thus directly contributing to empowering Digital Humanists.
Information Density in English Scientific Writing: A Diachronic Perspective (SFB1102, B1)
I’m a Principal Investigator, together with Elke Teich, of Project B1 in the Collaborative Research Center (SFB 1102) Information Density and Linguistic Encoding, where we investigate register formation and linguistic densification in the evolution of scientific writing in English (17th century to present). The team includes postdoc Diego Alves and PhD student Isabell Landwehr.
The overarching goal of B1 is to gain insights into the role of rational communicative concerns in diachronic language change. Specifically, we are interested in the emergence of sublanguages or registers, i.e. distinctive, fairly persistent functional varieties, focusing on scientific English and its development in the late modern period (1700–1900) up to recent times. We started with the overall hypothesis of communicative optimization, stating that scientific English developed an optimal code for expert-to-expert communication over time. Based on a comprehensive corpus compiled from the publications of the Royal Society of London, we applied selected types of computational language models (e.g. topic models, n-gram models, word embeddings) and combined them with information-based measures (e.g. entropy, surprisal) to capture diachronic variation. Across different models, we observe the same trend of overall decreasing entropy with temporary peaks of high entropy/surprisal (innovation) and a continuous re-assessment of existing linguistic options, manifested by discarding options, shifting options to other contexts of use (diversification), or giving strong preference to one option over alternative ones (conventionalization). The choice-constraining effects associated with diversification and conventionalization point to a general diachronic mechanism for maintaining communicative function, which is a major novel insight arising from our studies.
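As a rough illustration of the information-based measures mentioned above, the following minimal Python sketch computes unigram entropy and word surprisal for two time slices. It is not the B1 project's actual pipeline: the two toy "periods", the function names, and the restriction to a simple unigram model are all assumptions made for the example, and the sentences are invented rather than taken from the Royal Society corpus.

```python
import math
from collections import Counter


def unigram_probs(tokens):
    """Relative frequency of each word type in a list of tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def entropy(probs):
    """Shannon entropy in bits of a probability distribution."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)


def surprisal(word, probs):
    """Surprisal in bits of one word under the given distribution."""
    return -math.log2(probs[word])


# Hypothetical toy data: invented sentences standing in for two time slices.
periods = {
    "period_A": "the air was found to be elastic and the air was weighed".split(),
    "period_B": "the pressure of the gas and the volume of the gas".split(),
}

for name, tokens in periods.items():
    probs = unigram_probs(tokens)
    print(name, "entropy (bits):", round(entropy(probs), 3))
    print(name, "surprisal of 'the' (bits):", round(surprisal("the", probs), 3))
```

Comparing the entropy values across time slices gives a first, very coarse view of whether the range of lexical choices narrows or widens; models that condition on context (e.g. n-gram models) would replace the unigram distribution here.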
Previous projects:
Project: Overcoming the computational hurdle and exploiting opportunities for humanities students in linguistic research for the Digital Humanities (DH) with ChatGPT
I’ve received funding (April 2023 – March 2024) from the Data-Pin project “Innovative use of AI in education” to explore how AI systems such as ChatGPT can be used by humanities students for programming tasks and beyond. Together with my Research Assistant Sergei Bagdasarov, I design teaching modules for programming tasks and statistical analysis using R and Python.
I received funding from Saarland University (2021-2022) with my colleague Francesca Delogu to work on Impact of register and sociolinguistic factors on textual coherence. We combine corpus-based and experimental approaches to investigate how social factors (e.g. expert knowledge in a domain) may affect comprehension. We assume that the use of cohesive devices in a text highly depends on (a) a text’s function in a situational context, i.e. the register (e.g. narrative vs expository texts), and (b) social factors, e.g. the comprehender’s age, education or expertise in a particular domain.
Prior to this, I received UdS funding to investigate Linguistic Profiles of Social Variables in Diachrony (SLingPro). To observe possible linguistic profiles of social variables in diachrony, my team and I used the Old Bailey Corpus (Huber et al., 2016), a digital collection of spoken texts based on the proceedings of London’s Central Criminal Court from the 18th and 19th centuries, which is annotated for social variables such as age, gender, and social class.
In my PhD, I focused on combining macro- and micro-analytical methods for register analysis on evaluative language (PhD Thesis).
