Current projects
Better-Mods, a research project developing new tools that help citizens stay better informed about current debates on online platforms. A collaboration with and the TULP research group at Tilburg University, funded by NWO.
Cultural AI, a lab for culturally valued AI. Cultural AI is the study, design and development of socio-technological AI systems that are implicitly or explicitly aware of the subtle and subjective complexity of human culture.
Current (co-)supervised Ph.D. students
- Ryan Brate (with Marieke van Erp and Laura Hollink)
- Shane Kaszefski Yashuk (with Massimo Poesio and Dong Nguyen)
- Golshid Shekoufandeh (with Paul Boersma)
- Joris Veerbeek (with Karin van Es and Mirko Schäfer)
- Cedric Waterschoot (with Ernst van den Hemel)
- Xiao Xu (with Anne Gauthier and Gert Stulp)
- Ronja van Zijverden (with Marloes van Moort, Karin Fikkers, and Hans Hoeken)
Former (co-)supervised Ph.D. students
Defended in the 2020s
- Martijn Bentum, Radboud University
- Lucas van der Deijl, University of Groningen
- Maarten van Gompel, Digital Infrastructure, KNAW Humanities Cluster
- Alessandro Lopopolo, Universität Potsdam
- Roel Smeets, Radboud University
- Eric Sanders, Radboud University
- Robbert De Troij, Leuven University
- Chara Tsoukala, Athena Research and Innovation Center
- Jinbiao Yang, Max Planck Institute for Psycholinguistics
- Sara Ahmadi, Radboud University
- Peter Berck, Lund University, Sweden
- Matje van de Camp, De Taalmonsters
- Marieke van Erp, Digital Humanities Lab, KNAW Humanities Cluster
- Ali Hürriyetoğlu, Digital Humanities Lab, KNAW Humanities Cluster
- Folgert Karsdorp, KNAW Meertens Instituut
- Florian Kunneman, Vrije Universiteit Amsterdam
- Maria Mos, Tilburg University
- Dong Nguyen, Utrecht University
- Herman Stehouwer, Valcon
- Sander Wubben, Devil Toad Studio
- Toine Bogers, Aalborg University Copenhagen, Denmark
- Sabine Buchholz, Capito Systems, UK
- Sander Canisius, The Netherlands Cancer Institute
- Iris Hendrickx, Radboud University
- Piroska Lendvai, Bavarian Academy of Science and Arts
- Laura Maruster, University of Groningen
- Stephan Raaijmakers, TNO and Leiden University
- Martin Reynaert, University of Amsterdam
Software and Digital Infrastructure
Aside from papers and dissertations, our projects tend to produce software.
We make a point of maximizing the availability of this software by releasing the best software projects under
open source licenses. Some of our software, particularly the packages that perform a natural language
processing function, is available as web services,
usually with a web interface. Much of this software has been integrated into larger digital research
infrastructures at the national and European level.
As part of past and ongoing projects with many colleagues, I was involved in developing the following software
and digital research infrastructure:
Natural Language Processing
Frog: Dutch tagger-lemmatizer, morphological analyzer,
and dependency parser. With the Frog development team.
T-Scan (also available as a
web interface and web service): Dutch text analyzer for computing text
features indicative of text complexity. With Rogier Kraf, Henk Pander Maat, Ko van der Sloot, Maarten van Gompel,
Florian Kunneman, Susanne Kleijn, Ted Sanders, Maarten van der Klis, Sheean Spoel, and Luka van der Plas.
Dutch and English context-sensitive spelling correctors. With Maarten van Gompel, Wessel Stoop,
Tanja Gaustad van Zaanen, and Monica Hajek.
Colibri Core: Efficient n-gram and skipgram
modeling. With Maarten van Gompel.
Mbt: Memory-based tagger-generator and tagger.
With Ko van der Sloot, Jakub Zavrel, and Walter Daelemans.
WOPR: Memory-based word prediction, language modeling,
and spelling correction. With Peter Berck. Download the manual here if you cannot compile it. Below is a screenshot from March 2010 in which you see WOPR
in generative mode, predicting "Word Salads": consecutive next words based on an initial prompt.
This is what predecessors of ChatGPT looked like
at a time when we had little data and limited high-performance computing facilities.

Machine Learning
- Timbl: Tilburg memory-based learner. With Ko van der Sloot, Walter Daelemans, and Jakub Zavrel.
Digital Research Infrastructure
CLARIAH, Common Lab Research Infrastructure for the Arts and Humanities.
Nederlab, bringing together massive amounts of digitized Dutch texts from the Middle Ages to the present in one user-friendly and tool-enriched web interface. Funded by NWO.
FutureTDM, a Horizon 2020 coordination and support action on reducing barriers and increasing uptake of text and data mining for research environments.
TwiNL, A deeper understanding of society, a Pathfinding project of the Netherlands eScience Center. With Erik Tjong Kim Sang.
ISHER, Integrated Social History Environment for Research: Digging into Social Unrest, a Digging Into Data project.
Selected media items
English media items:
Dutch media items:
Mogelijk verbod op AI voor ambtenaren, Universiteit Utrecht, November 2023
UU-onderzoekers adviseren overheid over regulering ChatGPT en andere generatieve AI, Universiteit Utrecht, September 2023
De toverkracht van AI. Els Aarts, Studium Generale, Universiteit Utrecht, July 14, 2023
De Jortcast: Wordt Jort vervangen door een computer? Podcast series "De Jortcast", AvroTros, February 2, 2023
Waarom AI de Nederlandse taal nog niet voldoende beheerst. Podcast series "BNR Eyeopeners", BNR Nieuwsradio, September 2022
Twitterfilter heeft moeite met het herkennen van meningen, Mathilde Jansen, Kennislink, April 2021
Kunstmatige intelligentie kan moderatie van online reacties ondersteunen, Britte Schilt, Kennislink, January 2021
Hoe ontcijfer je een onbekende boodschap? Emmeke Bos, Quest, June 2020
Het geheim van het Voynich-manuscript. Robin de Wever, Trouw, July 2018
Vertalen met neurale netwerken. Steven Beek, Kennislink, August 2017
Google Translate in het onderwijs? Erica Renckens, Kennislink, October 2014
een nieuw vakgebied? - opinion piece (in Dutch) on the Google Books Ngram corpus and the Science article in e-data & research. March 1, 2011
"Hacktivist", column (in Dutch) by Antal van den Bosch on pop sci website December 16, 2010
Article on SMS texting with T9, featuring our work on context-sensitive text completion, on pop sci website Kennislink (in Dutch). May 19, 2010