Course 11: Natural Language Processing

Rico Sennrich · University of Zurich

Lecturer

Rico Sennrich is Associate Professor of Natural Language Processing at the Department of Computational Linguistics, University of Zurich. He is also an Honorary Fellow at the University of Edinburgh and an ELLIS Fellow. His research focuses on machine translation, multilingual NLP, tokenization, neural architectures, and data augmentation, and these contributions have influenced language models including ChatGPT. He is widely recognized for his work on byte pair encoding (BPE) for neural machine translation and for advancing low-resource and efficient NLP methods. Sennrich has supported the DARIAH-CH consortium and helps bridge computational linguistics and digital humanities research in Switzerland.

🤖 Bio generated by AI from a public academic profile. Homepage · ORCID
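Byte pair encoding, named in the bio as Sennrich's best-known contribution, segments words into subword units: it starts from individual characters and repeatedly merges the most frequent adjacent pair of symbols. The sketch below illustrates only that merge-learning loop and is not Sennrich's own implementation. The toy word counts, the `</w>` end-of-word marker, and the function names `learn_bpe`, `pair_counts`, and `merge_pair` are assumptions chosen for illustration.

```python
from collections import Counter


def word_vocab(word_counts):
    """Split each word into characters plus an end-of-word marker (assumed '</w>')."""
    return {tuple(word) + ("</w>",): freq for word, freq in word_counts.items()}


def pair_counts(vocab):
    """Count adjacent symbol pairs, weighted by how often each word occurs."""
    pairs = Counter()
    for symbols, freq in vocab.items():
        for pair in zip(symbols, symbols[1:]):
            pairs[pair] += freq
    return pairs


def merge_pair(pair, vocab):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = {}
    for symbols, freq in vocab.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def learn_bpe(word_counts, num_merges):
    """Greedily learn up to `num_merges` merge operations from a word-frequency table."""
    vocab = word_vocab(word_counts)
    merges = []
    for _ in range(num_merges):
        pairs = pair_counts(vocab)
        if not pairs:  # every word is already a single symbol
            break
        best = max(pairs, key=pairs.get)  # ties go to the pair counted first
        vocab = merge_pair(best, vocab)
        merges.append(best)
    return merges, vocab


# Hypothetical toy corpus: word -> frequency.
merges, vocab = learn_bpe({"low": 5, "lower": 2, "newest": 6, "widest": 3}, num_merges=3)
print(merges)  # [('e', 's'), ('es', 't'), ('est', '</w>')]
```

On this toy table the first three merges build the subword `est</w>`, the ending shared by "newest" and "widest". Ties between equally frequent pairs are broken here by whichever pair was counted first, a simplification that other implementations may handle differently.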

Lecture Overview

Overview

Sennrich presents natural language processing as a flexible toolkit for making text collections more accessible and analyzable. He focuses especially on language modeling and machine translation, but keeps returning to the broader question of how NLP can support digital humanities workflows.

Main Points

NLP can assist with OCR post-processing, normalization, translation, named entity recognition, sentiment analysis, and other large-scale text tasks.
The lecture uses machine translation to explain the broader logic of modern NLP models and why many of these methods generalize across tasks.
Sennrich contrasts rule-based systems with later statistical and neural approaches, showing why data-driven methods became dominant.
Parallel corpora are central to training translation systems, and the lecture highlights how large such datasets can be.
A recurring theme is practical application: digital humanities researchers need to understand both what NLP makes possible and what preparation historical text collections require.

Examples Mentioned

Historical machine translation
Language modeling as a general technique
Europarl, OpenSubtitles, and web-crawled parallel corpora
OCR, normalization, translation, and annotation pipelines
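Language modeling, listed above as a general technique, assigns a probability to a sentence by multiplying the probability of each word given the words before it. A neural translation model can be seen as following the same pattern while also conditioning on the source sentence. The sketch below is a count-based bigram model, far simpler than the neural models discussed in the lecture. It assumes pre-tokenized input and plain maximum-likelihood estimates without smoothing, and `train_bigram_lm` and `sentence_logprob` are hypothetical names.

```python
import math
from collections import Counter, defaultdict


def train_bigram_lm(sentences):
    """Estimate P(word | previous word) by maximum likelihood from tokenized sentences."""
    bigrams = defaultdict(Counter)
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]  # sentence start/end markers
        for prev, word in zip(padded, padded[1:]):
            bigrams[prev][word] += 1
    return {
        prev: {word: count / sum(nexts.values()) for word, count in nexts.items()}
        for prev, nexts in bigrams.items()
    }


def sentence_logprob(model, tokens):
    """Chain rule: log P(sentence) = sum of log P(w_i | w_{i-1}); -inf if a bigram is unseen."""
    padded = ["<s>"] + tokens + ["</s>"]
    total = 0.0
    for prev, word in zip(padded, padded[1:]):
        prob = model.get(prev, {}).get(word, 0.0)
        if prob == 0.0:
            return float("-inf")
        total += math.log(prob)
    return total


# Hypothetical two-sentence corpus.
corpus = [["the", "cat", "sleeps"], ["the", "dog", "sleeps"]]
model = train_bigram_lm(corpus)
print(model["the"])  # {'cat': 0.5, 'dog': 0.5}
print(math.exp(sentence_logprob(model, ["the", "cat", "sleeps"])))
```

Because unseen bigrams receive probability zero, any sentence containing one scores negative infinity. Practical count-based models add smoothing for this reason, while neural models largely avoid the problem by sharing information across similar words and contexts.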

Source transcript: transcripts/Course 11_Sennrich_NLP.txt

Reuse

CC BY-SA 4.0