The 2020 episode at the Faculty of Mathematics, Physics and Informatics of Comenius University
- Lectures (I-23)
Wednesday, 9:50 - 11:30 (voluntary)
- Labs (H-6)
Tuesday, 11:30 - 13:00 (voluntary)
Lectures#
Lesson 1: Intro#
28th of February
- Discussed material:
- Course information (see below)
- Basic Text Processing (slides)
- Regular Expressions
- Tokenization
- Text normalization
- Basically the first part of SLP Chapter II
- Supplementary resources:
- Eliza Bot Demo: one of the most famous use cases of regular expressions. It is really worth trying out -- you may end up having some surprisingly good conversations.
- Unix for Poets: 25 pages of examples showing how to process text on the Unix command line. A shorter version by the authors of the SLP book is also available.
- Scriptio continua: the reason why English also nearly ended up without word and sentence separators.
- Regex101: a very nice application for writing and testing regular expressions. Note that the link goes to the Python flavor of regular expressions.
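The tokenization and normalization steps listed above can be sketched with Python's `re` module. This is a minimal illustration, not the method used in the slides: the pattern `TOKEN_PATTERN` and the helper names `tokenize` and `normalize` are hypothetical, and the rules (ASCII words with an optional internal apostrophe, simple decimals, single punctuation marks) are assumptions chosen to keep the example short.

```python
import re

# Hypothetical pattern for illustration only: a word with an optional
# internal apostrophe, a number with an optional decimal part, or any
# single character that is neither a word character nor whitespace.
TOKEN_PATTERN = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?|\d+(?:\.\d+)?|[^\w\s]")


def tokenize(text):
    """Split text into word, number, and punctuation tokens."""
    return TOKEN_PATTERN.findall(text)


def normalize(tokens):
    """Lowercase the tokens and drop those that are pure punctuation."""
    return [t.lower() for t in tokens if re.search(r"\w", t)]


tokens = tokenize("Mr. O'Neill doesn't pay $4.50!")
print(tokens)
# ['Mr', '.', "O'Neill", "doesn't", 'pay', '$', '4.50', '!']
print(normalize(tokens))
# ['mr', "o'neill", "doesn't", 'pay', '4.50']
```

Real tokenizers handle many more cases (abbreviations, URLs, non-ASCII scripts), so a pattern like this is best tried out interactively, for example in Regex101, before relying on it.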
Lesson 2: Edit Distance#
4th of March
- Discussed material:
- Edit Distance (slides)
- Edit Distance
- Weighted Edit Distance
- Alignment
- The last part of SLP Chapter II
- (we did not go into the bio applications …)
- Intro to Language Modeling (slides)
- The first part of SLP Chapter III
- Supplementary resources:
- subsync: a tool for automatically synchronizing subtitles with video (a nice use of alignment in a not-so-ordinary context)
- Autocomplete using Markov chains: a nice example (along with code in Python) that shows how language models can be used to generate "text resembling language" and build a simple "autocomplete" engine.
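The edit distance material from this lesson can be illustrated with a short dynamic-programming sketch. The function name `edit_distance` and its cost parameters are hypothetical. The default substitution cost of 2 is one common convention (a substitution counted as a deletion plus an insertion); passing `sub_cost=1` gives the plain Levenshtein distance. Changing the costs is the simplest form of weighted edit distance.

```python
def edit_distance(source, target, ins_cost=1, del_cost=1, sub_cost=2):
    """Minimum edit distance between two strings via dynamic programming."""
    n, m = len(source), len(target)
    # dist[i][j] holds the cost of turning source[:i] into target[:j].
    dist = [[0] * (m + 1) for _ in range(n + 1)]

    # Base cases: delete every source character, or insert every target one.
    for i in range(1, n + 1):
        dist[i][0] = dist[i - 1][0] + del_cost
    for j in range(1, m + 1):
        dist[0][j] = dist[0][j - 1] + ins_cost

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = source[i - 1] == target[j - 1]
            dist[i][j] = min(
                dist[i - 1][j] + del_cost,                       # deletion
                dist[i][j - 1] + ins_cost,                       # insertion
                dist[i - 1][j - 1] + (0 if same else sub_cost),  # match or substitution
            )
    return dist[n][m]


print(edit_distance("intention", "execution"))              # 8
print(edit_distance("intention", "execution", sub_cost=1))  # 5
```

Recovering the alignment itself would additionally require storing a backpointer for each cell and tracing back from `dist[n][m]`; that step is omitted here to keep the sketch short.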
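The autocomplete idea mentioned above can be sketched as a bigram (first-order Markov) model that counts which word follows which. This is a toy illustration and not the code from the linked article: the names `train_bigrams`, `probability`, and `suggest` are hypothetical, the corpus is invented, and no smoothing is applied.

```python
from collections import Counter, defaultdict


def train_bigrams(tokens):
    """Count, for each word, how often each other word follows it."""
    followers = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        followers[prev][nxt] += 1
    return followers


def probability(followers, prev, nxt):
    """Maximum-likelihood estimate of P(nxt | prev); 0.0 if prev is unseen."""
    counts = followers.get(prev, Counter())
    total = sum(counts.values())
    return counts[nxt] / total if total else 0.0


def suggest(followers, word, k=3):
    """Return up to k of the most frequent continuations of `word`."""
    return [w for w, _ in followers.get(word, Counter()).most_common(k)]


# Invented toy corpus, already tokenized and lowercased.
corpus = "i like green eggs and i like ham and i am sam".split()
model = train_bigrams(corpus)

print(suggest(model, "i"))              # ['like', 'am']
print(probability(model, "i", "like"))  # 2 of the 3 occurrences of "i" are followed by "like"
```

Generating "text resembling language" then amounts to repeatedly sampling the next word from these counts, starting from some seed word.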
Resources#
Introduction to Natural Language Processing -- Jacob Eisenstein
Speech and Language Processing, 3rd Edition -- Daniel Jurafsky, James H. Martin
A Primer on Neural Network Models for Natural Language Processing -- Yoav Goldberg
Neural Network Methods for Natural Language Processing -- Yoav Goldberg
Grading#
| Component | Weight |
|---|---|
| Assignments | 50% |
| Project | 50% |
Assignments are available via Google Classroom (class code: yyozsin) and in the following GitHub repository: https://github.com/NaiveNeuron/nlp-exercises
A list of project ideas can be found here.
| Points | Grade |
|---|---|
| (90, inf) | A |
| (80, 90] | B |
| (70, 80] | C |
| (60, 70] | D |
| (50, 60] | E |
| [0, 50] | FX |