The 2025 episode at the Faculty of Mathematics, Physics and Informatics of Comenius University
This page lists ideas that could be explored as projects for the NLP course. The list is far from exhaustive, but it reflects the interests of the current course instructor(s).
All of the ideas assume the final project will contain some "novelty bits", either in the explored idea itself, its execution, practical usability or the underlying dataset.
Creating a new dataset for any of the NLP tasks listed below (or any other, really) is a huge plus.
Shared tasks
Shared tasks are essentially "academic Kaggle": you get a task and some data, and you produce a model that tries to do well on it. During the evaluation period, you normally submit predictions on the test set. It’s a relatively straightforward way of going from a task to a solution without having to bother with the difficult part of finding an appropriate dataset. Further, other people are very likely working on the same task, so it’s a bit of a "competition" (although that’s not really what it’s about or why it’s done).
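To make the workflow above a bit more concrete, here is a minimal sketch of a shared-task baseline: train on the labelled data, predict on the test set, and format the predictions for submission. Everything in it is an assumption made for illustration: the toy data, the labels, the tab-separated `id<TAB>label` output and the helper names (`train_naive_bayes`, `predict`, `make_submission`) are hypothetical and do not come from any of the tasks listed on this page. Real shared tasks define their own data and submission formats.

```python
import math
from collections import Counter, defaultdict

# Toy data standing in for a task's training and test splits (made up).
TRAIN = [
    ("the match was great fun", "positive"),
    ("what a wonderful great day", "positive"),
    ("this was a terrible waste", "negative"),
    ("awful and terrible service", "negative"),
]
TEST = [("t1", "a great wonderful match"), ("t2", "terrible awful day")]


def train_naive_bayes(examples):
    """Count words per label; returns the statistics needed for prediction."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocab


def predict(model, text):
    """Return the label with the highest log-probability (add-one smoothing)."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        score = math.log(label_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.split():
            if word in vocab:  # ignore words never seen in training
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def make_submission(model, test_items):
    """Format predictions as 'id<TAB>label' lines (hypothetical format)."""
    return "\n".join(
        f"{item_id}\t{predict(model, text)}" for item_id, text in test_items
    )


model = train_naive_bayes(TRAIN)
submission = make_submission(model, TEST)
print(submission)
```

A bag-of-words Naive Bayes like this is only a starting point; the interesting part of a project is usually what you try after the baseline.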
A few examples:
- Subjectivity, Fact-Checking, Claim Extraction & Normalization, and (Web) Retrieval (https://checkthat.gitlab.io/clef2025/)
- Multilingual Text Detoxification: Given a toxic piece of text, rewrite it in a non-toxic way while preserving as much of the main content as possible. We did manage to get some pretty nice results on last year’s version of this task! (https://pan.webis.de/clef25/pan25-web/text-detoxification.html)
- Generative AI Detection: Given a generated and a human-written source document, identify the passages of reused text between them. (https://pan.webis.de/clef25/pan25-web/generated-content-analysis.html)
- Power Identification in Parliamentary Debates: Given a parliamentary speech in one of several languages, identify whether the speaker’s party is currently governing or in opposition. (https://touche.webis.de/clef25/touche25-web/ideology-and-power-identification-in-parliamentary-debates.html)
- Job Title-Based Skill Prediction: Given a job title, retrieve relevant skills from a list of candidates. (https://talentclef.github.io/talentclef/docs/)
- Automatic Humor Analysis (https://www.joker-project.com/clef-2025/tasks)
Or any other CLEF 2025 tasks (https://clef2025.clef-initiative.eu/index.php?page=Pages/labs.html)!
Project MIMEDIS
The project’s aim is to "study the impact of media discourse on attitudes towards migration, migrants and migration policy in Slovakia". As such, there are many classification tasks that can be explored in that regard.
You can find more about the project at https://cogsci.fmph.uniba.sk/MIMEDIS/index.html
Your own idea!
Feel free to come up with an idea on your own -- if you are working on something NLP-related for your thesis, that would be a good candidate. But in general, I’d be happy to talk about any NLP-related idea you may have.
Alternatively, feel free to check out the sites below, find an NLP task that interests you, and see if you can make an interesting project out of it!