Professor of Language, Communication and Computation at Utrecht University

Board member and domain chair Social Sciences and Humanities, NWO | Dutch Research Council

a.p.j.vandenbosch@uu.nl

visiting
Utrecht University, Faculty of Humanities
Trans 10
3512 JK Utrecht
The Netherlands

postal
Utrecht University, Faculty of Humanities
P.O. Box 80125
3508 TC Utrecht
The Netherlands


 

Click and explore the following interactive publications:

Catching Cyberbullies with Neural Networks, Wessel Stoop, Florian Kunneman, Antal van den Bosch, Ben Miller. The Gradient, 2021

How Algorithms Know What You'll Type Next, Wessel Stoop, Antal van den Bosch. The Pudding, 2019

Research interests

In my research I develop machine learning and language technology. Most of my work lies at the intersection of the two fields: computers that learn to understand and generate natural language, nowadays known as Generative AI and Large Language Models. The computational models that this work produces have all kinds of applications in other areas of scholarly research as well as in society and industry. They also link in interesting ways to theories and developments in linguistics, psycholinguistics, neurolinguistics, and sociolinguistics. I love multidisciplinary collaborations that advance all these areas.

I advise academic and governmental bodies on the implications of generative AI, including the types of large language models I have been developing and working with. Answering the question of how to regulate these technologies requires synergistic collaborations between AI and governance researchers. Read more on this topic here and here, with colleagues Fabian Ferrari and José van Dijck.

My CV has more detailed information. Also, check out my blog for some loose thoughts and sketches.

Publications

 
   

Affiliations

Since September 2022 I have worked at the Faculty of Humanities at Utrecht University as professor of Language, Communication and Computation. On July 1, 2023 I joined the Board of NWO, the Dutch Research Council, as domain chair Social Sciences and Humanities.

From 2017 to 2022 I was director of the Meertens Institute, an institute of the Royal Netherlands Academy of Arts and Sciences. Before that, I was professor of Language and Speech Technology at Radboud University, Nijmegen, the Netherlands, within the Centre for Language Studies (CLS) and the Centre for Language and Speech Technology of the Faculty of Arts. From 1997 to 2011 I worked at what is now called the Tilburg School of Humanities and Digital Sciences of Tilburg University, in the ILK Research Group. I obtained my PhD at Maastricht University at what is now the Department of Advanced Computing Sciences.

I am a guest professor at CLiPS, the Computational Linguistics, Psycholinguistics and Sociolinguistics Research Centre at the University of Antwerp. I am a fellow of the European Association for Artificial Intelligence (EurAI), and a member of the Royal Netherlands Academy of Arts and Sciences and of the Koninklijke Hollandsche Maatschappij der Wetenschappen.

 

Public Talks

It is customary for newly appointed professors in the Netherlands to give a public lecture at the start of a new professorship. I had the privilege of doing so at Tilburg University (2008) and Radboud University, Nijmegen (2012). The same happened in 2023 when I was honored to be awarded the KULAK Francqui Leerstoel 2023 on Language and AI at KULAK, Leuven University campus Kortrijk.

All three public lectures revolve around my amazement at the deceptively mundane task of predicting the next word, the seemingly magical trick that makes GPT-based language models work. A point I made in 2008 is that data-driven word predictors keep getting more accurate when trained on more data. This relation is log-linear: every tenfold increase in training data leads to an approximately constant improvement in the percentage of correctly predicted words. This is expected given Heaps' Law.
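To make the log-linear relation concrete, here is a minimal Python sketch that fits accuracy = a + b * log10(training size) by least squares. The data points are made up purely for illustration and are not results from my lectures or papers; the function names are hypothetical as well.

```python
import math


def fit_log_linear(sizes, accuracies):
    """Least-squares fit of accuracy = a + b * log10(size)."""
    xs = [math.log10(n) for n in sizes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(accuracies) / len(accuracies)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, accuracies))
    b = sxy / sxx            # gain in percentage points per tenfold increase
    a = mean_y - b * mean_x  # intercept
    return a, b


def predict_accuracy(a, b, size):
    """Extrapolate the fitted line to a new training-set size."""
    return a + b * math.log10(size)


# Hypothetical, illustrative numbers only: training words vs. % correct.
sizes = [10**5, 10**6, 10**7, 10**8]
accuracies = [12.0, 16.0, 20.0, 24.0]

a, b = fit_log_linear(sizes, accuracies)
print(f"gain per tenfold increase: {b:.2f} percentage points")
print(f"extrapolated accuracy at 10^9 words: {predict_accuracy(a, b, 10**9):.2f}%")
```

In this toy data the slope b is the "approximately constant improvement" per tenfold increase in data; on real learning curves the fit would of course only be approximate.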

The lectures were in Dutch:

 

Past projects & highlights

Research projects

Current projects

  • Better-Mods, a research project developing new tools that help citizens stay better informed about current debates on online platforms. A collaboration with nu.nl and the TULP research group at Tilburg University, funded by NWO.

  • Cultural AI, a lab for culturally valued AI. Cultural AI is the study, design and development of socio-technological AI systems that are implicitly or explicitly aware of the subtle and subjective complexity of human culture.


    Courses

     

    Current (co-) supervised Ph.D. students

    • Ryan Brate (with Marieke van Erp and Laura Hollink)
    • Shane Kaszefski Yashuk (with Massimo Poesio and Dong Nguyen)
    • Golshid Shekoufandeh (with Paul Boersma)
    • Joris Veerbeek (with Karin van Es and Mirko Schäfer)
    • Cedric Waterschoot (with Ernst van den Hemel)
    • Xiao Xu (with Anne Gauthier and Gert Stulp)
    • Ronja van Zijverden (with Marloes van Moort, Karin Fikkers, and Hans Hoeken)

    Former (co-) supervised Ph.D. students

    Defended in the 2020s

    2010s

    2000s

    • Toine Bogers, Aalborg University Copenhagen, Denmark
    • Sabine Buchholz, Capito Systems, UK
    • Sander Canisius, The Netherlands Cancer Institute
    • Iris Hendrickx, Radboud University
    • Piroska Lendvai, Bavarian Academy of Science and Arts
    • Laura Maruster, University of Groningen
    • Stephan Raaijmakers, TNO and Leiden University
    • Martin Reynaert, University of Amsterdam

     

    Software and Digital Infrastructure

    Aside from papers and dissertations, our projects tend to produce software. We make a point of maximizing the availability of this software by releasing the best software projects under open source licenses. Some of our software, particularly the packages that perform some natural language processing function, is available as web services, usually with a web interface. Much of this software has been integrated into larger digital research infrastructures at the national and European level.

    As part of past and ongoing projects with many colleagues I was involved in developing the following software and digital research infrastructure:

    Natural Language Processing

    • Frog: Dutch tagger-lemmatizer, morphological analyzer, and dependency parser. With the Frog development team.

    • T-Scan (also available as web interface and web service): Dutch text analyzer for computing text features indicative of text complexity. With Rogier Kraf, Henk Pander Maat, Ko van der Sloot, Maarten van Gompel, Florian Kunneman, Susanne Kleijn, Ted Sanders, Maarten van der Klis, Sheean Spoel, Luka van der Plas.

    • Valkuil.net and Fowlt.net: Dutch and English context-sensitive spelling correctors. With Maarten van Gompel, Wessel Stoop, Tanja Gaustad van Zaanen, and Monica Hajek.

    • Colibri Core: Efficient n-gram and skipgram modeling. With Maarten van Gompel.

    • Mbt: Memory-based tagger-generator and tagger. With Ko van der Sloot, Jakub Zavrel, and Walter Daelemans.

    • WOPR: Memory-based word prediction, language modeling, and spelling correction. With Peter Berck. Download the manual here if you cannot compile it. Below is a screenshot from March 2010 in which you see WOPR in generative mode, predicting "Word Salads": consecutive next words based on an initial prompt. This is what predecessors of ChatGPT looked like at a time when we had little data and limited high-performance computing facilities.
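
    To give a flavour of what "generative mode" means, below is a minimal Python sketch of next-word prediction from stored contexts. It is not WOPR's actual algorithm or code: it simply memorizes which word followed each context of up to two preceding words, backs off to shorter contexts when a context is unseen, and generates text by repeatedly appending the predicted word. All names and the toy corpus are hypothetical.

```python
from collections import Counter, defaultdict


def train(tokens, context_size=2):
    """Store, for every left context of 0..context_size words, the words that followed it."""
    memory = defaultdict(Counter)
    for i in range(len(tokens)):
        for k in range(context_size + 1):
            if i - k < 0:
                break
            context = tuple(tokens[i - k:i])
            memory[context][tokens[i]] += 1
    return memory


def predict_next(memory, history, context_size=2):
    """Predict the most frequent follower of the longest context seen in training."""
    for k in range(min(context_size, len(history)), -1, -1):
        context = tuple(history[len(history) - k:])
        if context in memory:
            return memory[context].most_common(1)[0][0]
    return None  # only reached if nothing was memorized at all


def generate(memory, prompt, n_words, context_size=2):
    """Generative mode: extend the prompt one predicted word at a time."""
    words = list(prompt)
    for _ in range(n_words):
        next_word = predict_next(memory, words, context_size)
        if next_word is None:
            break
        words.append(next_word)
    return words


# Toy corpus for illustration only.
corpus = "the cat sat on the mat and the cat ran".split()
memory = train(corpus)
print(" ".join(generate(memory, ["on"], 3)))
```

    With so little data the output quickly degenerates into repetition or nonsense, which is exactly the "Word Salad" effect.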

    Machine Learning

    • Timbl: Tilburg memory-based learner. With Ko van der Sloot, Walter Daelemans, and Jakub Zavrel.

    Digital Research Infrastructure

    • CLARIAH, Common Lab Research Infrastructure for the Arts and Humanities.
    • Nederlab, bringing together massive amounts of digitized Dutch texts from the Middle Ages to the present in one user-friendly and tool-enriched web interface. Funded by NWO.
    • FutureTDM, a Horizon 2020 coordination and support action on reducing barriers and increasing uptake of text and data mining for research environments.
    • TwiNL, A deeper understanding of society, a Pathfinding project of the Netherlands eScience Center. With Erik Tjong Kim Sang.
    • ISHER, Integrated Social History Environment for Research: Digging into Social Unrest, a Digging Into Data project.

     

    Selected media items

    English media items:

    Dutch media items:

    Games
