Natural Language Processing 2020
The 2020 episode at the Faculty of Mathematics, Physics and Informatics of Comenius University
- Lectures (I-23)
- Wednesday, 9:50 - 11:30 (voluntary)
- Labs (H-6)
- Tuesday, 11:30 - 13:00 (voluntary)
Lectures
Lesson 1: Intro
28th of February
- Discussed material:
- Course information (see below)
- Basic Text Processing (slides)
- Regular Expressions
- Tokenization
- Text normalization
- Basically the first part of SLP Chapter II
- Supplementary resources:
- Eliza Bot Demo: one of the most famous use cases of regular expressions. It is really worth trying out -- you may end up having some surprisingly good conversations.
- Unix for poets: 25 pages' worth of examples of processing text on the Unix command line. Here is a shorter version by the authors of the SLP book.
- Scriptio continua: the reason why English also nearly ended up without word and sentence separators.
- Regex101: a very nice application for working with (and specifically trying out) various regular expressions. Note that the link goes to the Python flavor of regular expressions.
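To make the regular expression, tokenization and normalization topics above a bit more concrete, here is a minimal sketch in Python. The pattern and the helper names `tokenize` and `normalize` are illustrative assumptions, not the definitions used in the slides or in SLP; real tokenizers handle many more cases (URLs, numbers, hyphenation, and so on).

```python
import re

# Illustrative pattern (an assumption, not the course's definition):
# a word, optionally followed by an apostrophe part ("don't", "it's"),
# or any single character that is neither a word character nor whitespace.
TOKEN_PATTERN = re.compile(r"\w+(?:'\w+)?|[^\w\s]")


def tokenize(text):
    """Split text into word tokens and single punctuation marks."""
    return TOKEN_PATTERN.findall(text)


def normalize(tokens):
    """Lowercase the tokens and drop those that are pure punctuation."""
    return [token.lower() for token in tokens if re.match(r"\w", token)]


tokens = tokenize("Don't panic: it's only NLP!")
print(tokens)             # ["Don't", 'panic', ':', "it's", 'only', 'NLP', '!']
print(normalize(tokens))  # ["don't", 'panic', "it's", 'only', 'nlp']
```

The same pattern can be pasted into Regex101 (Python flavor) to see which parts of a sentence each alternative matches.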
Lesson 2: Edit Distance
4th of March
- Discussed material:
- Edit Distance (slides)
- Edit Distance
- Weighted Edit Distance
- Alignment
- The last part of SLP Chapter II
- (we did not go into the bio applications ...)
- Intro to Language Modeling (slides)
- The first part of SLP Chapter III
- Supplementary resources:
- subsync: a tool for automatically synchronizing subtitles with video (a nice use of alignment in a not-so-ordinary context)
- Autocomplete using Markov chains: a nice example (along with code in Python) that shows how Language Models can be used to generate "text resembling language" and build a simple 'autocomplete' engine.
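The (weighted) edit distance from this lesson can be sketched with the usual dynamic-programming table. This is a minimal illustration, not the code from the slides: the function name `edit_distance` and the cost parameters are assumptions made for the example, and the costs are constant per operation rather than depending on the characters involved.

```python
def edit_distance(source, target, ins_cost=1, del_cost=1, sub_cost=1):
    """Minimum total cost of turning `source` into `target`.

    Uses insertions, deletions and substitutions with the given
    (constant) costs; matching characters cost nothing.
    """
    n, m = len(source), len(target)
    # dist[i][j] = cost of converting source[:i] into target[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]

    # First column: delete every character of the source prefix.
    for i in range(1, n + 1):
        dist[i][0] = dist[i - 1][0] + del_cost
    # First row: insert every character of the target prefix.
    for j in range(1, m + 1):
        dist[0][j] = dist[0][j - 1] + ins_cost

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = source[i - 1] == target[j - 1]
            dist[i][j] = min(
                dist[i - 1][j] + del_cost,                       # deletion
                dist[i][j - 1] + ins_cost,                       # insertion
                dist[i - 1][j - 1] + (0 if same else sub_cost),  # match or substitution
            )
    return dist[n][m]


print(edit_distance("intention", "execution"))              # 5
print(edit_distance("intention", "execution", sub_cost=2))  # 8
```

Setting `sub_cost=2` treats a substitution as a deletion plus an insertion. Keeping back-pointers in the table (not shown here) would additionally recover the alignment itself.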
Resources
Introduction to Natural Language Processing -- Jacob Eisenstein
Speech and Language Processing, 3rd Edition -- Daniel Jurafsky, James H. Martin
A Primer on Neural Network Models for Natural Language Processing -- Yoav Goldberg
Neural Network Methods for Natural Language Processing -- Yoav Goldberg
Grading
Component | Weight |
---|---|
Assignments | 50% |
Project | 50% |
Assignments are available via Google Classroom (class code: yyozsin) and in the following GitHub repository: https://github.com/NaiveNeuron/nlp-exercises
A list of project ideas can be found here.
Points | Grade |
---|---|
(90, inf) | A |
(80, 90] | B |
(70, 80] | C |
(60, 70] | D |
(50, 60] | E |
[0, 50] | FX |