Natural Language Processing

The 2019 episode

Labs (I-8 or any other room upon previous appointment)
Tuesday, 12:20 - 14:00 (voluntary)
Lectures (I-8)
Tuesday, 14:00 - 15:40 (voluntary)

22nd of February
Discussed material:
Supplementary resources:
  • Eliza Bot Demo: one of the most famous use cases of regular expressions. It is really worth trying out -- you may end up having some surprisingly good conversations.
  • Unix for poets: roughly 25 pages of examples on how to process text on the Unix command line. Here is a shorter version by the authors of the SLP book.
  • Scriptio continua: the ancient practice of writing without word or sentence separators, and the reason why English nearly ended up without them too.
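
To make the Eliza item above more concrete, here is a minimal sketch of the pattern-and-template idea behind such a bot. The three rules and the fallback reply are invented for illustration and are not taken from the original ELIZA script; the function name `respond` is likewise an assumption.

```python
import re

# Hypothetical rules in the spirit of ELIZA: each pairs a regular expression
# with a reply template that reuses the captured group.
RULES = [
    (re.compile(r"\bI am (.+)", re.IGNORECASE), r"Why do you say you are \1?"),
    (re.compile(r"\bI feel (.+)", re.IGNORECASE), r"How long have you felt \1?"),
    (re.compile(r"\bmy (\w+)", re.IGNORECASE), r"Tell me more about your \1."),
]


def respond(utterance: str) -> str:
    """Return the reply of the first rule that matches, else a stock phrase."""
    text = utterance.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            # Match.expand fills \1 with the text captured by the pattern.
            return match.expand(template)
    return "Please go on."


print(respond("I am sad."))           # Why do you say you are sad?
print(respond("My mother hates me"))  # Tell me more about your mother.
```

Rule order matters in this sketch: the first matching pattern wins, so more specific rules should come before more general ones.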
26th of February
Discussed material:
Supplementary resources:
  • subsync: a tool for automatically synchronizing subtitles with video (a nice use of alignment in a not-so-ordinary context).
  • Autocomplete using Markov chains: a nice example (with Python code) that shows how language models can be used to generate "text resembling language" and to build a simple "autocomplete" engine.
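
The Markov chain idea from the last item can be sketched in a few lines. This is not the code from the linked article; it is a minimal first-order (bigram) version with hypothetical function names and a made-up toy corpus, intended only to show the mechanism.

```python
import random
from collections import Counter, defaultdict


def train_bigrams(tokens):
    """Map each word to a Counter of the words observed right after it."""
    successors = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        successors[prev][nxt] += 1
    return successors


def autocomplete(successors, word, k=3):
    """Suggest the k most frequent continuations of `word`."""
    return [w for w, _ in successors.get(word, Counter()).most_common(k)]


def generate(successors, start, length=10, seed=0):
    """Sample a word sequence, picking each next word in proportion to its count."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(length - 1):
        options = successors.get(out[-1])
        if not options:  # dead end: the last word was never followed by anything
            break
        words, counts = zip(*options.items())
        out.append(rng.choices(words, weights=counts, k=1)[0])
    return out


# Toy corpus, invented for illustration.
tokens = "the cat sat on the mat and the cat ate the fish".split()
model = train_bigrams(tokens)
print(autocomplete(model, "the", k=1))  # ['cat']
print(" ".join(generate(model, "the", length=8)))
```

With a larger corpus and a longer history (trigrams or more), the generated text starts to look more like real language, at the cost of sparser counts.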
5th of March
Discussed material:
  • Language modeling
    • Estimating N-gram probabilities
    • Perplexity and Language Model Evaluation
    • Dealing with zeros
    • Smoothing, backoff and interpolation
    • "Stupid backoff"
    • Most of SLP Chapter III
Supplementary resources:
  • Google Ngram Viewer -- a nice way of visualizing how frequently n-grams are used in books written in various languages. Check out this quick example, for instance.
  • kenlm -- an open-source language modeling toolkit. Probably best in class when it comes to speed and memory efficiency.
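
The topics discussed on the 5th of March can be tied together in one small sketch: maximum-likelihood bigram estimates, add-one (Laplace) smoothing as one way of dealing with zeros, perplexity, and "stupid backoff". The function names and the toy corpus are assumptions made for illustration; the backoff weight 0.4 is the commonly cited default. Note that stupid backoff yields scores, not normalized probabilities, and that unknown words are not handled specially here.

```python
import math
from collections import Counter

BOS, EOS = "<s>", "</s>"  # sentence boundary markers


def count_ngrams(sentences):
    """Collect unigram and bigram counts over boundary-padded sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = [BOS] + sentence.split() + [EOS]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def mle_prob(unigrams, bigrams, prev, word):
    """Maximum-likelihood estimate: C(prev word) / C(prev)."""
    if unigrams[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / unigrams[prev]


def laplace_prob(unigrams, bigrams, prev, word):
    """Add-one smoothing: (C(prev word) + 1) / (C(prev) + V)."""
    # V counts the types that can be predicted, so <s> is excluded.
    vocab_size = sum(1 for w in unigrams if w != BOS)
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)


def perplexity(unigrams, bigrams, sentence):
    """Per-token perplexity of one sentence under the add-one bigram model."""
    tokens = [BOS] + sentence.split() + [EOS]
    pairs = list(zip(tokens, tokens[1:]))
    log_prob = sum(math.log(laplace_prob(unigrams, bigrams, p, w)) for p, w in pairs)
    return math.exp(-log_prob / len(pairs))


def stupid_backoff(unigrams, bigrams, prev, word, alpha=0.4):
    """Relative bigram frequency if seen, else alpha times the unigram frequency."""
    if bigrams[(prev, word)] > 0:
        return bigrams[(prev, word)] / unigrams[prev]
    # Simplification: the total includes the boundary markers.
    return alpha * unigrams[word] / sum(unigrams.values())


# Toy corpus for illustration only.
corpus = ["I am Sam", "Sam I am", "I do not like green eggs and ham"]
uni, bi = count_ngrams(corpus)
print(mle_prob(uni, bi, "am", "Sam"))         # 0.5
print(stupid_backoff(uni, bi, "Sam", "ham"))  # backs off to the unigram score
print(perplexity(uni, bi, "I am Sam") < perplexity(uni, bi, "ham green like"))  # True
```

Summing log probabilities instead of multiplying raw probabilities avoids numerical underflow on longer sentences, which is why the perplexity is computed in log space.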


Assignments: 50%
Project: 50%

Assignments are available via the Moodle e-learning system and in the following repository on GitHub:

A list of project ideas can be found here.

Points      Grade
(90, inf)   A
(80, 90]    B
(70, 80]    C
(60, 70]    D
(50, 60]    E
[0, 50]     FX