Marek Šuppa

Natural Language Processing 2021

Last modified at: 2021-03-05 12:00:00+01:00

The 2021 episode at Faculty of Mathematics, Physics and Informatics of Comenius University

Lectures (virtual): Friday, 9:00 - 10:20 (voluntary)
Labs (virtual): Friday, 10:20 - whenever (voluntary)

Course description
Lectures
Resources
Similar Courses Elsewhere
Grading

Course description

This course tries to go deeper into how we can represent human language (say English or Slovak) in a way that can be processed by computational systems (a.k.a. computer programs), and how this representation can then be used to do interesting things, such as

question answering
translation
grammatical error correction
summarization
text generation (like poems or song lyrics)
and much more...

All of this combined is part of a field called Natural Language Processing, which ended up being the name of the course.

Lectures

Intro (Demos)

A few cool things we'll (probably) learn more about in this class:

Extracting relevant keywords from text
Translating text from (say) Slovak to English
Answering questions, given a specific paragraph of text, an image or even a table
Autocompleting text in a comprehensible way
Generating interesting text (like song lyrics)
Summarizing text (e.g. give it a paragraph and get a single sentence back)

Lecture I: Text Processing

Discussed material:

Basic Text Processing (slides)
- Regular Expressions
- Tokenization
- Text normalization
Basically the first part of SLP Chapter II
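To make the regular expression, tokenization and normalization steps a bit more concrete, here is a minimal Python sketch. It is not the tokenizer from the slides: the token pattern is our own assumption that keeps numbers like 3.14 and contractions like "It's" intact, and it deliberately ignores harder cases such as URLs or abbreviations like "U.S.A.".

```python
import re

# Hypothetical token pattern, tried left to right:
#   1. numbers with an optional decimal part (3, 3.14)
#   2. words, optionally joined by apostrophes or hyphens (It's, state-of-the-art)
#   3. any single character that is neither a word character nor whitespace
TOKEN_RE = re.compile(r"\d+(?:\.\d+)?|\w+(?:['-]\w+)*|[^\w\s]")


def tokenize(text):
    """Split raw text into number, word and punctuation tokens."""
    return TOKEN_RE.findall(text)


def normalize(tokens):
    """Case-fold every token and drop tokens that are pure punctuation."""
    return [t.lower() for t in tokens if not re.fullmatch(r"[^\w\s]", t)]


tokens = tokenize("Hello, world! It's 3.14.")
print(tokens)             # ['Hello', ',', 'world', '!', "It's", '3.14', '.']
print(normalize(tokens))  # ['hello', 'world', "it's", '3.14']
```

Case folding and dropping punctuation are just one possible normalization; for tasks like sentiment analysis you may well want to keep both.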

Supplementary resources:

Eliza Bot Demo: one of the most famous use cases of regular expressions. It is really worth trying out -- you may end up having some surprisingly good conversations.
Unix for poets: a nice 25 pages' worth of examples of how to process text on the Unix command line. Here is a shorter version by the authors of the SLP book.
Scriptio continua: the reason why English also nearly ended up without word and sentence separators.

Lecture II: Edit Distance

Discussed material:

Edit Distance (slides)
- Edit Distance
- Weighted Edit Distance
- Alignment
- The last part of SLP Chapter II
- (we did not discuss the bio applications ...)
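The dynamic programming behind (weighted) edit distance and alignment fits into one short Python function. In this sketch the operation costs are plain parameters (an assumption for illustration; weighted variants may also make the cost depend on which characters are involved), and the backtrace recovers one optimal alignment, with `*` marking a gap.

```python
def edit_distance(source, target, ins_cost=1, del_cost=1, sub_cost=1):
    """Weighted Levenshtein distance plus one optimal alignment."""
    n, m = len(source), len(target)
    # D[i][j] = cheapest way to turn source[:i] into target[:j]
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = D[i - 1][0] + del_cost
    for j in range(1, m + 1):
        D[0][j] = D[0][j - 1] + ins_cost

    def diag(i, j):
        # cost of aligning source[i-1] with target[j-1] (match or substitution)
        return 0 if source[i - 1] == target[j - 1] else sub_cost

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = min(D[i - 1][j] + del_cost,      # delete source[i-1]
                          D[i][j - 1] + ins_cost,      # insert target[j-1]
                          D[i - 1][j - 1] + diag(i, j))

    # Backtrace from the bottom-right cell to recover the alignment.
    i, j, pairs = n, m, []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and D[i][j] == D[i - 1][j - 1] + diag(i, j):
            pairs.append((source[i - 1], target[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and D[i][j] == D[i - 1][j] + del_cost:
            pairs.append((source[i - 1], "*"))
            i -= 1
        else:
            pairs.append(("*", target[j - 1]))
            j -= 1
    return D[n][m], pairs[::-1]


distance, alignment = edit_distance("intention", "execution")
print(distance)                                                 # 5
print(edit_distance("intention", "execution", sub_cost=2)[0])   # 8
print(alignment)
```

With unit costs the distance between intention and execution is 5; making substitutions cost 2, as in one of the variants discussed in SLP Chapter II, gives 8.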

Supplementary resources:

subsync: a tool for automatically synchronizing subtitles with video (a nice use case of alignment in a not-so-ordinary context)
Seam carving: although not necessarily NLP related, seam carving is a very nice real-world example of how dynamic programming can still be a useful tool.

Lecture III: Language Modeling with N-grams

Discussed material:

Language modeling
- Estimating N-gram probabilities
- Perplexity and Language Model Evaluation
- Dealing with zeros
- Smoothing, backoff and interpolation
- Most of SLP Chapter III
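A bigram model with add-one (Laplace) smoothing and perplexity-based evaluation can be sketched as follows. This is a toy illustration under simple assumptions: whitespace tokenization, `<s>`/`</s>` sentence markers and no explicit `<UNK>` token (unseen words simply get the smoothed count of 1). The three training sentences are the small "I am Sam" corpus used in SLP Chapter III.

```python
import math
from collections import Counter


def train_bigram_lm(sentences):
    """Count unigrams and bigrams, padding each sentence with <s> and </s>."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def bigram_prob(prev, word, unigrams, bigrams):
    """Add-one (Laplace) smoothed estimate of P(word | prev)."""
    # Possible next tokens: every known type except <s>, which is never predicted.
    V = len(unigrams) - 1
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + V)


def perplexity(sentence, unigrams, bigrams):
    """exp of the average negative log-probability per predicted token (incl. </s>)."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    log_prob = sum(math.log(bigram_prob(p, w, unigrams, bigrams))
                   for p, w in zip(tokens, tokens[1:]))
    return math.exp(-log_prob / (len(tokens) - 1))


corpus = ["I am Sam", "Sam I am", "I do not like green eggs and ham"]
unigrams, bigrams = train_bigram_lm(corpus)
print(perplexity("I am Sam", unigrams, bigrams))      # lower: familiar word order
print(perplexity("ham like Sam", unigrams, bigrams))  # higher: mostly unseen bigrams
```

Smoothing guarantees no bigram gets probability zero, so even the odd-looking second sentence gets a finite perplexity; it is just a much worse one.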

Supplementary resources:

Google Ngram Viewer -- a nice way of visualizing the rate of use of n-grams in books written in various languages. Check out this quick example for instance. Note that the dataset on which the Ngrams are computed plays a big role -- here is a similar example for coronavirus.
kenlm -- an open-source language modeling toolkit. Probably best in class when it comes to speed and memory efficiency.

Lecture IV: Language Modeling and Word Embeddings

"From N-grams to Word2Vec"

Discussed material:

Language modeling and word embeddings
- Shortcomings of n-gram language models
- Neural language model
- Word2Vec
- CBOW + Skip-Gram
- Visualization of (vector) word spaces
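The data side of Word2Vec is simple enough to show directly: skip-gram turns a sentence into (center word, context word) pairs, while CBOW predicts the center word from its surrounding window. The sketch below only generates these training examples (the window size is an arbitrary choice) and leaves out the neural network itself, including tricks like negative sampling.

```python
def skipgram_pairs(tokens, window=2):
    """(center, context) pairs: skip-gram predicts each context word from the center."""
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def cbow_examples(tokens, window=2):
    """(context, center) examples: CBOW predicts the center word from its context."""
    return [(tokens[max(0, i - window):i] + tokens[i + 1:i + window + 1], center)
            for i, center in enumerate(tokens)]


tokens = "the quick brown fox".split()
print(skipgram_pairs(tokens, window=1))
print(cbow_examples(tokens, window=1))
```

In practice a library such as gensim does all of this for you; its `Word2Vec` class switches between skip-gram and CBOW via the `sg` parameter.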

Supplementary resources:

wevi -- visualizes what actually happens when word2vec-style embeddings are being learned by a (relatively simple) neural network. The great thing about this demo is that it is implemented entirely with in-browser technologies (i.e. JavaScript), so you get to "see the network close-up".
Embedding Projector -- a tool from the TensorFlow project for visualizing distributed representations (such as those obtained using Word2Vec) in 2D and 3D
Semantic Space Surfer -- a bit of a fun spin on word embeddings. While normally we'd like the word embeddings to work "intuitively" (e.g. king should be to man as queen is to woman), we know it's not always like that. In this quick game you're forced to "think like a word embedding" and pick the word that the pre-trained embeddings would have chosen. Not only is this quite fun, it'll also help you deepen your intuition around what's actually going on with word embeddings.
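The "king is to man as queen is to woman" intuition is usually tested with vector arithmetic and cosine similarity. The sketch below uses tiny hand-made 3-dimensional vectors that are purely hypothetical (real embeddings have hundreds of dimensions and are learned from data), so here the expected answer is guaranteed; with real pre-trained embeddings it often is not.

```python
import math

# Hypothetical toy embeddings, hand-made so that the analogy works out.
EMB = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.0, 0.5, 0.5],
}


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def analogy(a, b, c, emb):
    """Solve a : b :: c : ? by finding the word closest to b - a + c."""
    target = [vb - va + vc for va, vb, vc in zip(emb[a], emb[b], emb[c])]
    # The query words themselves are excluded, as is common practice.
    candidates = [w for w in emb if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(emb[w], target))


print(analogy("man", "king", "woman", EMB))  # queen
```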

Lecture V: Language Modeling with RNNs

"From Word2Vec to Recurrent Networks"

Discussed material:

Language Modeling and RNNs Part 1
- Recurrent Neural Networks (RNNs)
- Back Propagation Through Time (BPTT)
Language Modeling and RNNs Part 2
- Simple RNNs
- Vanishing and Exploding Gradient
- Long Short Term Memory (LSTM)
- Gated Recurrent Unit (GRU)
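A one-dimensional toy RNN is enough to see where vanishing and exploding gradients come from: backpropagation through time multiplies one local derivative per time step. The sketch assumes a scalar hidden state and a tanh nonlinearity, chosen so the product is easy to follow by hand; the weights are arbitrary illustrative values.

```python
import math


def rnn_forward(xs, w, u, h0=0.0):
    """Scalar simple (Elman) RNN: h_t = tanh(w * h_{t-1} + u * x_t)."""
    hs = [h0]
    for x in xs:
        hs.append(math.tanh(w * hs[-1] + u * x))
    return hs


def grad_last_wrt_first(hs, w):
    """BPTT chain rule: dh_T/dh_0 = product over t of w * (1 - h_t**2)."""
    grad = 1.0
    for h in hs[1:]:
        grad *= w * (1.0 - h * h)  # local derivative dh_t / dh_{t-1}
    return grad


# Recurrent weight 0.5: every local derivative is at most 0.5, so the gradient vanishes.
print(grad_last_wrt_first(rnn_forward([1.0] * 50, w=0.5, u=1.0), w=0.5))
# Recurrent weight 1.5 with activations staying at 0: the gradient explodes (1.5 ** 50).
print(grad_last_wrt_first(rnn_forward([0.0] * 50, w=1.5, u=1.0), w=1.5))
```

Repeated multiplication by factors below 1 drives the gradient toward zero, while factors above 1 make it blow up; the gating in LSTMs and GRUs is designed to give the gradient a more direct path through time.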

Supplementary resources:

SCIgen - An Automatic CS Paper Generator -- exactly what it sounds like.
The Unreasonable Effectiveness of Recurrent Neural Networks -- by now a classic post in the Deep Learning blogosphere, in which Andrej Karpathy shows what RNNs are capable of. Still worth checking out.
The unreasonable effectiveness of Character-level Language Models -- a quick "setting-the-record-straight" response by Yoav Goldberg to the previous blog post. Note that it is not entirely critical, as its subtitle "(and why RNNs are still cool)" suggests. Definitely worth a read.
Unsupervised Sentiment Neuron -- OpenAI's blogoduction (introduction-via-blog) of their work on training character-level language models on Amazon reviews, which happen to pick up the notion of "sentiment" on their own.
Understanding LSTM Networks -- another classic which does a great job introducing LSTMs from the ground up with nicely done visualizations using the "circuit board" metaphor.

Lecture VI: Spelling correction and text classification

Discussed material:

Spelling Correction and the Noisy Channel
- Spelling Correction task
- Noisy Channel model
- Damerau-Levenshtein edit distance
Text Classification and Naive Bayes
- Text Classification task
- Bag of Words representation
- Naive Bayes classifier
- Classification metrics: accuracy, precision, recall, F score
- Micro vs Macro averaging
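Here is a minimal multinomial Naive Bayes classifier over a bag-of-words representation, with add-one smoothing and log probabilities to avoid numeric underflow. It is a sketch for illustration: tokenization is plain whitespace splitting, and words never seen during training are ignored at prediction time. The tiny training set mirrors the movie-review example from SLP Chapter IV.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes over bag-of-words features, with add-one smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {w for d in docs for w in d.split()}
        self.log_prior = {}
        self.log_lik = defaultdict(dict)
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            # log P(c): fraction of training documents with label c
            self.log_prior[c] = math.log(len(class_docs) / len(docs))
            counts = Counter(w for d in class_docs for w in d.split())
            total = sum(counts.values())
            for w in self.vocab:
                # log P(w | c) with add-one smoothing over the whole vocabulary
                self.log_lik[c][w] = math.log((counts[w] + 1) / (total + len(self.vocab)))
        return self

    def predict(self, doc):
        scores = {
            c: self.log_prior[c] + sum(self.log_lik[c][w] for w in doc.split() if w in self.vocab)
            for c in self.classes
        }
        return max(scores, key=scores.get)


docs = ["just plain boring", "entirely predictable and lacks energy",
        "no surprises and very few laughs", "very powerful", "the most fun film of the summer"]
labels = ["neg", "neg", "neg", "pos", "pos"]
nb = NaiveBayes().fit(docs, labels)
print(nb.predict("predictable with no fun"))  # neg
```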

Supplementary resources:

How to Write a Spelling Corrector: a classic article by Peter Norvig which describes what it takes to create a simple spelling corrector in practice.
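Both Norvig's corrector and the noisy channel model pick the candidate w that maximizes P(w) · P(x | w) for an observed typo x. The sketch below combines a unigram language model with the restricted Damerau-Levenshtein ("optimal string alignment") distance. The channel model P(x | w) = alpha ** distance, the value of alpha and the word counts are all hypothetical stand-ins for a real error model and a real corpus.

```python
import math
from collections import Counter


def osa_distance(a, b):
    """Optimal string alignment distance: insertions, deletions, substitutions
    and transpositions of adjacent characters all cost 1."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[len(a)][len(b)]


def correct(typo, word_counts, max_dist=2, alpha=0.01):
    """Noisy channel: argmax over w of log P(w) + log P(typo | w)."""
    total = sum(word_counts.values())
    best, best_score = typo, float("-inf")
    for w, count in word_counts.items():
        dist = osa_distance(typo, w)
        if dist > max_dist:
            continue
        # hypothetical channel model: each edit multiplies the probability by alpha
        score = math.log(count / total) + dist * math.log(alpha)
        if score > best_score:
            best, best_score = w, score
    return best


# Hypothetical corpus counts for a few candidate corrections.
counts = Counter({"actress": 9, "across": 20, "acres": 12, "access": 15, "caress": 5})
print(correct("acress", counts))  # across: all candidates are 1 edit away, it is the most frequent
```

Since every candidate here is a single edit away, the language model alone decides; a better error model (or a bigram language model using the surrounding words) is what lets a real corrector pick, say, actress in the right context.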

Lecture VII: Sentiment Analysis and fastText

Discussed material:

Sentiment Analysis
- Sentiment Analysis Task
- Sentiment Analysis using Naive Bayes classification
- Lexicon-based approaches
Text Classification with fastText
- Text Classification
- Introduction to fastText
- Simple fastText classification model (see below)
- Combining word embeddings with embeddings of n-grams
- Metrics used in the context of classification tasks (accuracy / precision / recall)
- Importance of preprocessing
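Precision, recall and F score, together with micro vs. macro averaging, are easy to get subtly wrong, so here is a small sketch. Macro averaging computes F1 per class and then averages, so rare classes weigh as much as frequent ones; micro averaging pools the counts of all classes first. The function names are our own.

```python
def f1(tp, fp, fn):
    """F1 from raw counts; defined as 0 when precision or recall is undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def macro_micro_f1(gold, pred):
    classes = sorted(set(gold) | set(pred))
    counts = {}
    for c in classes:
        tp = sum(g == c and p == c for g, p in zip(gold, pred))
        fp = sum(g != c and p == c for g, p in zip(gold, pred))
        fn = sum(g == c and p != c for g, p in zip(gold, pred))
        counts[c] = (tp, fp, fn)
    # Macro: F1 per class, then an unweighted mean over classes.
    macro = sum(f1(*counts[c]) for c in classes) / len(classes)
    # Micro: pool TP / FP / FN over all classes, then compute a single F1.
    micro = f1(sum(t for t, _, _ in counts.values()),
               sum(f for _, f, _ in counts.values()),
               sum(n for _, _, n in counts.values()))
    return macro, micro


gold = ["pos", "pos", "pos", "pos", "neg", "neu"]
pred = ["pos", "pos", "pos", "pos", "pos", "pos"]
print(macro_micro_f1(gold, pred))  # roughly (0.267, 0.667)
```

The classifier that always says "pos" gets a micro-F1 of about 0.67 (for single-label multi-class problems micro-F1 equals accuracy) but a macro-F1 of only about 0.27, because it never gets "neg" or "neu" right.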

Supplementary resources:

Trump Tweet Bot: being able to estimate sentiment of text can have real-world implications: by assessing what a leader of a large country thinks about a publicly traded company and buying/selling its shares as a result, you can end up with a fairly interesting investment portfolio!
Bag of Tricks for Efficient Text Classification: the paper that introduced the fastText classifier as a simple baseline competitive with deep learning models, with a much more efficient training procedure.
Enriching Word Vectors with Subword Information: the paper that presents an interesting approach for dealing with out-of-vocabulary words, which fastText also implements -- combining word vectors with embeddings of character n-grams.
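The subword idea from that paper can be sketched in a few lines: the word is wrapped in the boundary symbols `<` and `>`, all character n-grams with 3 ≤ n ≤ 6 are extracted, and the whole word is kept as one extra unit. In the paper, a word's vector is the sum of the vectors of these units, which is what lets an out-of-vocabulary word still get a representation. The function name is our own, and the hashing of n-grams into buckets that fastText uses is left out.

```python
def char_ngrams(word, n_min=3, n_max=6):
    """Character n-grams of a word, with boundary markers, plus the whole word."""
    marked = f"<{word}>"  # boundary symbols distinguish prefixes and suffixes
    grams = [marked[i:i + n]
             for n in range(n_min, n_max + 1)
             for i in range(len(marked) - n + 1)]
    return grams + [marked]  # the whole word is kept as a special unit


print(char_ngrams("where", 3, 3))  # ['<wh', 'whe', 'her', 'ere', 're>', '<where>']
shared = set(char_ngrams("wherever", 3, 3)) & set(char_ngrams("where", 3, 3))
print(sorted(shared))              # ['<wh', 'ere', 'her', 'whe']
```

A word like "wherever" shares several trigrams with "where", so the two end up with related vectors even if one of them was rare in the training data.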

Lecture VIII: Transformers

Discussed material:

Introduction to Transformers (guest lecture by Anton Osinenko)
- The limits of attention
- Values, Keys and Queries
- Multi Head Attention
- Tricks of the Transformer architecture (e.g. positional embeddings)
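The core of the architecture is scaled dot-product attention, softmax(Q K^T / sqrt(d_k)) V, plus positional information added to the inputs. Below is a dependency-free sketch of both, assuming matrices are plain lists of row vectors. The positional encoding is the sinusoidal variant from the original Transformer paper (learned positional embeddings are a common alternative), and multi-head attention, which runs several such attentions in parallel on learned projections, is omitted.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q, K, V):
    """Scaled dot-product attention on lists of row vectors."""
    d_k = len(K[0])
    output = []
    for q in Q:  # one query per output position
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)  # how strongly this query attends to each key
        output.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))])
    return output


def positional_encoding(pos, d_model):
    """Sinusoidal encoding: sin on even dimensions, cos on odd ones."""
    return [math.sin(pos / 10000 ** (i / d_model)) if i % 2 == 0
            else math.cos(pos / 10000 ** ((i - 1) / d_model))
            for i in range(d_model)]


# Identical keys give uniform weights, so the output is the mean of the values.
print(attention([[1.0, 0.0]], [[1.0, 1.0], [1.0, 1.0]], [[2.0, 0.0], [0.0, 4.0]]))  # [[1.0, 2.0]]
print(positional_encoding(0, 4))  # [0.0, 1.0, 0.0, 1.0]
```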

Supplementary resources:

The Illustrated Transformer

Possibly still the best visual explanation of how the Transformer architecture works out there. You may have gotten much of this from the lecture itself, but the visuals help reinforce the concepts, so it is strongly recommended to check this one out.

Transformers from scratch:

If you ever decide to reimplement a Transformer from scratch, I strongly recommend following this tutorial. It assumes an understanding of basic neural network concepts and backpropagation, as well as some working knowledge of PyTorch, but other than that it is a self-contained, zero-to-hero guide that walks you through the whole process, from the smallest building blocks (like self-attention) all the way to the full architecture.

Lecture IX: BERT

Discussed material:

Contextual representations via BERT
- Word vectors and issues with using them in a context-free manner
- Representations with Language Models
- ELMo: Deep Contextual Word Embeddings
- Transformers and Self-Attention
- Masked Language Models
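The masked language modeling objective mostly comes down to how the training inputs are corrupted. The sketch below follows the recipe described in the BERT paper (about 15% of positions become prediction targets; of those, 80% are replaced by [MASK], 10% by a random token and 10% are left unchanged), but it samples each position independently instead of picking exactly 15%, which is a simplification of ours.

```python
import random


def mask_tokens(tokens, vocab, mask_prob=0.15, seed=None):
    """BERT-style corruption for masked language modeling."""
    rng = random.Random(seed)
    inputs = list(tokens)
    labels = [None] * len(tokens)  # None = position not used in the MLM loss
    for i, token in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue
        labels[i] = token  # the model has to recover the original token here
        r = rng.random()
        if r < 0.8:
            inputs[i] = "[MASK]"
        elif r < 0.9:
            inputs[i] = rng.choice(vocab)  # random replacement token
        # remaining 10%: keep the original token, but still predict it
    return inputs, labels


sentence = "the cat sat on the mat because it was tired".split()
print(mask_tokens(sentence, vocab=sentence, seed=1))
```

Keeping or randomizing some of the selected tokens means the model cannot assume that only [MASK] positions matter, which helps since [MASK] never appears at fine-tuning time.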

Supplementary resources:

The Illustrated BERT:

Although the presentation above is very informative, a visual explanation usually helps a lot. This is one of the best ones you can find online.

Byte Pair Encoding:

Dealing with out-of-vocabulary words out of the box was one of the big improvements that BERT-style pre-trained models made popular. This article goes into greater detail on how that happens, what the limits of this method are and how one may go about fixing them.

(I actually recommend you read through the whole article -- it's a very nice introduction to the concept of Attention and sequence-to-sequence tasks in general)
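To see how BPE builds its vocabulary, here is a compact sketch of the merge-learning loop in the style of Sennrich et al.: repeatedly count adjacent symbol pairs (weighted by word frequency) and merge the most frequent one. The toy word counts are made up for illustration, and note that BERT itself uses WordPiece, a closely related but not identical subword algorithm.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` with a single merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(word_freqs, num_merges):
    """Learn an ordered list of merges from a {word: frequency} dictionary."""
    # Each word starts as a tuple of characters plus an end-of-word marker.
    vocab = {tuple(word) + ("</w>",): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # ties go to the pair seen first
        merges.append(best)
        vocab = {merge_pair(symbols, best): freq for symbols, freq in vocab.items()}
    return merges


def segment(word, merges):
    """Apply the learned merges, in order, to a (possibly unseen) word."""
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return symbols


merges = learn_bpe({"low": 5, "lower": 2, "newest": 6, "widest": 3}, num_merges=4)
print(merges)                     # [('e', 's'), ('es', 't'), ('est', '</w>'), ('l', 'o')]
print(segment("lowest", merges))  # ('lo', 'w', 'est</w>')
```

The unseen word "lowest" is split into pieces learned from "low" and from "newest"/"widest", which is how subword vocabularies sidestep the out-of-vocabulary problem.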

The Dark Secrets of BERT:

It turns out BERT learns interesting things during training. Part of it may be due to its use of self-attention, but as this article (and the associated paper) shows, there may be some black magic going on.

Lecture X: PoS tagging and NER

Discussed material:

Part of Speech tagging
- Parts of Speech
- Part of Speech tagging as an NLP problem
- Feature-based Part of Speech tagging
Named Entity Recognition
- The task of extracting knowledge from text
- Finding and Classifying Named Entities
- Named Entity Recognition as a Sequence Modeling task
- Inference in Sequence Modeling
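Inference in sequence models usually means finding the single most likely tag sequence, which the Viterbi algorithm does with dynamic programming. The sketch uses a tiny hidden Markov model whose tags and probabilities are entirely hypothetical (it is not the feature-based tagger from the lecture), and it multiplies raw probabilities for readability; real implementations work in log space to avoid underflow.

```python
def viterbi(words, states, start_p, trans_p, emit_p):
    """Most likely state sequence for `words` under an HMM."""
    # best[t][s] = (probability of the best path ending in state s at time t, previous state)
    best = [{s: (start_p[s] * emit_p[s].get(words[0], 0.0), None) for s in states}]
    for t in range(1, len(words)):
        column = {}
        for s in states:
            prev, p = max(((r, best[t - 1][r][0] * trans_p[r][s]) for r in states),
                          key=lambda pair: pair[1])
            column[s] = (p * emit_p[s].get(words[t], 0.0), prev)
        best.append(column)
    # Start from the best final state and follow the backpointers.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for t in range(len(words) - 1, 0, -1):
        state = best[t][state][1]
        path.append(state)
    return path[::-1]


# Hypothetical toy HMM.
states = ["DET", "NOUN", "VERB"]
start_p = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}
trans_p = {
    "DET":  {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
    "NOUN": {"DET": 0.1, "NOUN": 0.3, "VERB": 0.6},
    "VERB": {"DET": 0.5, "NOUN": 0.3, "VERB": 0.2},
}
emit_p = {
    "DET": {"the": 0.9},
    "NOUN": {"dog": 0.5, "barks": 0.1},
    "VERB": {"barks": 0.6, "dog": 0.05},
}
print(viterbi("the dog barks".split(), states, start_p, trans_p, emit_p))  # ['DET', 'NOUN', 'VERB']
```

The same dynamic program can decode NER tag sequences (e.g. BIO tags), for instance in a linear-chain CRF, with the HMM probabilities replaced by the model's own transition and emission scores.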

Supplementary resources:

Penn Treebank tag set: a set of tags used in the Penn Treebank (there are roughly 45 of them -- those who've been doing NLP long enough allegedly know them by heart)
spaCy NER demo and AllenNLP NER demo: two quick demos of industrial-strength and former state-of-the-art NER systems. Notice that the latter, powered by a pretty big neural network, is able to correctly pick up a location it almost certainly did not see in the training data.

Lecture XI: Machine Translation

Discussed material:

Machine Translation
- History of Machine Translation
- Neural Machine Translation approaches
- Alignment vs. Attention

Resources

Introduction to Natural Language Processing by Jacob Eisenstein

Speech and Language Processing, 3rd Edition by Daniel Jurafsky and James H. Martin

A Primer on Neural Network Models for Natural Language Processing by Yoav Goldberg

Neural Network Methods for Natural Language Processing by Yoav Goldberg

Similar Courses Elsewhere

There are more than a few similar (and oftentimes even better) courses out there. Here is a sample:

Grading

Assignments:	50%
Project:	50%

Assignments are available via Google Classroom (the class code is jsn37az; feel free to use the following invite link), but they are also available in the following repository on GitHub: https://github.com/NaiveNeuron/nlp-exercises

A list of project ideas can be found here.

Points	Grade
(90, inf]	A
(80, 90]	B
(70, 80]	C
(60, 70]	D
(50, 60]	E
[0, 50)	FX