The 2026 episode at Faculty of Mathematics, Physics and Informatics of Comenius University

Lectures M-V

Friday, 08:10 - 09:50 (voluntary)

Labs H-3

Friday, 09:50 - 11:30 (voluntary)

Course description#

This course goes deeper into how we can represent human language (say, English or Slovak) in a way that computational systems (a.k.a. computer programs) can process, and how this representation can then be used to do interesting things, such as

  • translation
  • question answering
  • grammatical error correction
  • summarization
  • text (like poems or song lyrics) generation
  • and much more…

All of this combined is part of a field called Natural Language Processing, which ended up being the name of the course.

Feel free to check out the previous year’s class webpage as well!

Lectures#

Lecture I: The NLP Wohoo#

What can language models do – and how did we get here?

Discussed material:
  • The intro lecture (course structure, grading, projects)
  • Basic Text Processing (slides)
    • Eliza Bot Demo: one of the first chatbots ever built (1966)
    • The Turing Test and its limits: in the study Does GPT-4 Pass the Turing Test?, ELIZA fooled human interrogators in 27% of games, outperforming all GPT-3.5 witnesses and several GPT-4 ones. Modern LLMs tend to be too helpful, friendly, and verbose to pass, while ELIZA’s conservative, evasive responses avoid giving it away – some interrogators even assumed it was “too bad” to be an AI and had to be a human being uncooperative
    • Animated LLM: a visual demo of how language models generate text token by token
    • Tokenization: how text gets split into tokens before a model ever sees it
      • Tiktokenizer: interactive demo for exploring how tokenization works in practice
  • A whirlwind tour of NLP: from ELIZA (1966) to GPT-5 (2026)
  • What this course will cover and how it all fits together
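As a companion to the tokenization demos above, here is a minimal sketch of the classic byte-pair-encoding (BPE) merge loop. The word list below is a made-up toy example, and this is the word-level textbook variant; production tokenizers (e.g. the ones Tiktokenizer visualizes) use a byte-level version of the same idea.

```python
from collections import Counter

def bpe_merges(words, num_merges):
    """Learn BPE merges from a toy word list.

    Each word is represented as a tuple of symbols (plus an end-of-word
    marker); at every step the most frequent adjacent pair of symbols
    is merged into a single new symbol.
    """
    vocab = Counter(tuple(w) + ("</w>",) for w in words)
    merges = []
    for _ in range(num_merges):
        # count all adjacent symbol pairs, weighted by word frequency
        pairs = Counter()
        for word, freq in vocab.items():
            for pair in zip(word, word[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # rewrite every word, fusing each occurrence of the best pair
        new_vocab = Counter()
        for word, freq in vocab.items():
            merged, i = [], 0
            while i < len(word):
                if i + 1 < len(word) and (word[i], word[i + 1]) == best:
                    merged.append(word[i] + word[i + 1])
                    i += 2
                else:
                    merged.append(word[i])
                    i += 1
            new_vocab[tuple(merged)] += freq
        vocab = new_vocab
    return merges, vocab
```

On the classic toy corpus (five “low”, two “lower”, six “newest”, three “widest”) the first learned merge fuses “e” and “s”, exactly because “es” is the most frequent adjacent pair.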
Supplementary resources:
  • Eliza Bot Demo: one of the most famous early NLP systems. Try having a conversation – you may be surprised.
  • Unix for poets: 25 pages of examples on how to process text on the Unix command line. Here is a shorter version by the authors of the SLP book.
  • Scriptio continua: the reason why English also nearly ended up without word and sentence separators.
  • Intro to Large Language Models by Andrej Karpathy: probably the best one-hour introduction to the current state of LLMs.
  • Arena Leaderboard: arguably the preference benchmark of choice for LLMs – built on real human pairwise comparisons rather than fixed test sets. That being said, it has real limitations: it favors chatty, confident responses ("You are absolutely right!"), it is susceptible to what we call style bias (e.g. putting emojis everywhere might be all it takes to win), and the voting population may not be representative of all use cases (we do not really know who votes). Still worth knowing about though!

Lecture II: Edit Distance and Language Modeling#

Discussed material:
  • Edit Distance (slides, covered briefly)
    • Edit distance and the Levenshtein algorithm
    • Weighted edit distance
    • Alignment
  • Language Modeling (slides)
    • The formal setup: a language model as a probability distribution over sequences
    • N-gram language models: count-based probability estimation
    • Perplexity: how surprised is the model by real text?
    • The sparse data problem, smoothing, backoff and interpolation
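The Levenshtein algorithm from the edit-distance slides fits in a few lines of dynamic programming. This is a minimal sketch with unit costs; weighted edit distance would only change the three cost terms inside `min`.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions
    needed to turn string a into string b (unit costs)."""
    # prev[j] holds the edit distance between the current prefix of a
    # and b[:j]; we only ever need the previous row of the DP table.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                # deletion from a
                curr[j - 1] + 1,            # insertion into a
                prev[j - 1] + (ca != cb),   # substitution (free on match)
            ))
        prev = curr
    return prev[len(b)]
```

With unit substitution cost, the textbook pair “intention” → “execution” comes out at distance 5.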
Supplementary resources:
  • Language Modeling (Lena Voita): the best visual companion to the J&M material — covers the same topics with great diagrams and an interactive demo
  • Google Ngram Viewer: a nice way of visualizing the rate of use of n-grams in books written in various languages
  • kenlm: an open-source n-gram LM toolkit — best in class for speed and memory if you ever need one in practice
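To make the n-gram material concrete, here is a toy bigram model with add-k smoothing and a perplexity function. The corpus handling is deliberately simplified: sentences are concatenated into one token stream, so one spurious cross-sentence bigram (`</s>`, `<s>`) is also counted.

```python
import math
from collections import Counter

def train_bigram(corpus, k=1.0):
    """Bigram LM with add-k smoothing:
    P(w | v) = (c(v, w) + k) / (c(v) + k * |V|)."""
    tokens = []
    for sent in corpus:
        tokens += ["<s>"] + sent.split() + ["</s>"]
    vocab = set(tokens)
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))

    def prob(v, w):
        # smoothing guarantees a nonzero probability even for
        # unseen bigrams and unseen context words
        return (bigrams[(v, w)] + k) / (unigrams[v] + k * len(vocab))

    return prob, vocab

def perplexity(prob, sent):
    """exp of the average negative log-probability per predicted token."""
    words = ["<s>"] + sent.split() + ["</s>"]
    logp = sum(math.log(prob(v, w)) for v, w in zip(words, words[1:]))
    n = len(words) - 1  # number of predicted tokens
    return math.exp(-logp / n)
```

As expected, the model is less surprised (lower perplexity) by a sentence it has seen than by one made of unseen words.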

Lecture III: Text Classification and Word Embeddings#

From bag of words to Word2Vec — and what GPT-5.4 just dropped

Discussed material:
Supplementary resources:
  • Word Embeddings (Lena Voita): great visuals and the Semantic Space Surfer interactive demo — forces you to “think like a word embedding”
  • The Illustrated Word2vec (Jay Alammar): a visual walkthrough of how Word2Vec works, from the same author as The Illustrated Transformer
  • Text Classification (Lena Voita): visual companion covering Naive Bayes, logistic regression, and neural classifiers
  • TensorFlow Embedding Projector: interactive 3D visualization of word embeddings — great for building intuition about what these vector spaces actually look like
  • Claude Cycles (Donald Knuth): Knuth analyzes code that Claude wrote for directed Hamiltonian cycle decompositions, proves it works in general, and seems genuinely impressed — a sign that even the most demanding computer scientists are taking AI agents seriously
  • Wikipedia: Signs of AI Writing: while text classification has traditionally been used for authorship attribution, it now also serves to detect AI-generated text — Wikipedia’s crowd-sourced list of “LLM smells” (overuse of em dashes, hedging phrases like “it’s important to note”, present-participle padding) is a fascinating practical catalog
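The Naive Bayes classifier covered in the text-classification material can be sketched in pure Python: bag-of-words counts, log-space scoring, add-one smoothing. The tiny sentiment documents below are made up for illustration.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """Multinomial Naive Bayes with add-one smoothing.

    docs: list of (text, label) pairs; features are lowercased
    word counts (a bag of words).
    """
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        class_docs[label] += 1
        for w in text.lower().split():
            word_counts[label][w] += 1
            vocab.add(w)
    n = sum(class_docs.values())

    def predict(text):
        scores = {}
        for label in class_docs:
            total = sum(word_counts[label].values())
            score = math.log(class_docs[label] / n)  # log prior
            for w in text.lower().split():
                # add-one smoothing keeps unseen words from zeroing out
                # the whole class probability
                score += math.log(
                    (word_counts[label][w] + 1) / (total + len(vocab)))
            scores[label] = score
        return max(scores, key=scores.get)

    return predict
```

Working in log space avoids numerical underflow when multiplying many small per-word probabilities.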

Lecture IV: Sequence Models — RNNs to Attention#

Why order matters, how we learned to remember, and the mechanism that changed everything

Discussed material:
  • Language Models and RNNs (Stanford CS224N W2025 slides)
    • Recurrent Neural Networks (RNNs)
    • Backpropagation Through Time (BPTT)
    • Vanishing and exploding gradients
  • LSTMs and Fancy RNNs (Stanford CS224N W2025 slides)
    • Long Short-Term Memory (LSTM)
    • Gated Recurrent Units (GRUs)
    • Bidirectional and multi-layer RNNs
  • Seq2Seq + Attention (Princeton COS 484 slides)
    • Encoder-decoder architecture and the information bottleneck
    • Bahdanau (additive) attention
    • Attention heatmaps and alignment
Supplementary resources:
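The attention step discussed above can be sketched in plain Python. For brevity this sketch uses dot-product scoring; Bahdanau (additive) attention instead computes each score with a small feed-forward network, but the softmax and the weighted sum of encoder states are the same.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attend(query, keys, values):
    """Dot-product attention over a list of encoder states.

    query:  decoder state (list of floats)
    keys:   one vector per source position, scored against the query
    values: one vector per source position, mixed into the context
    Returns the context vector and the attention weights
    (the row of an attention heatmap for this decoder step).
    """
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    context = [sum(w * v[i] for w, v in zip(weights, values))
               for i in range(len(values[0]))]
    return context, weights
```

Because the context is a weighted mix of all encoder states, the decoder no longer has to squeeze the whole source sentence through a single fixed-size vector, which is exactly the information bottleneck the slides describe.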

More lectures coming soon…

Resources#

Introduction to Natural Language Processing by Jacob Eisenstein

Speech and Language Processing, 3rd Edition by Daniel Jurafsky, James H Martin

A Primer on Neural Network Models for Natural Language Processing by Yoav Goldberg

Neural Network Methods for Natural Language Processing by Yoav Goldberg

Similar Courses Elsewhere#

There are more than a few similar (and oftentimes even better) courses out there. Here is a sample:

Grading#

Component      Weight
Assignments    50%
Project        50%

Assignments are distributed via Google Classroom (the class code is hcexzhmi; feel free to use the invite link), and they are also available in the following repository on GitHub: https://github.com/NaiveNeuron/nlp-exercises

Check out the Project Ideas for 2026!

Points       Grade
(90, inf]    A
(80, 90]     B
(70, 80]     C
(60, 70]     D
(50, 60]     E
[0, 50)      FX