Natural Language Processing: Projects
The 2019 edition at the Faculty of Mathematics, Physics and Informatics of Comenius University
This site lists various ideas that could be explored as possible projects as part of the NLP course. The list is far from exhaustive; it reflects the interests of the (current) course instructor(s).
All of the ideas assume that the final project will contain some "novelty bits", whether in the idea itself, its execution, its practical usability or the underlying dataset.
Creating a new dataset for any of the NLP tasks listed below (or any other, really) is a huge plus.
Language Modeling
Can we build a state-of-the-art language model for Slovak using some of the recent tricks (https://aclweb.org/anthology/P18-1031), which could then be reused in all the other tasks?
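Whatever architecture the project ends up using, it helps to have a simple baseline to compare against. The sketch below is not the method from the linked paper; it is a count-based word bigram model with add-one (Laplace) smoothing, scored by perplexity (lower is better). The toy Slovak corpus and the function names are made up for illustration.

```python
import math
from collections import Counter


def train_bigram_lm(sentences):
    """Count contexts and bigrams over tokenized sentences (hypothetical helper)."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        unigrams.update(padded[:-1])  # every token that is followed by another one
        bigrams.update(zip(padded[:-1], padded[1:]))
    vocab = {w for tokens in sentences for w in tokens} | {"</s>"}
    return unigrams, bigrams, vocab


def bigram_prob(prev, word, unigrams, bigrams, vocab):
    """P(word | prev) with add-one smoothing over the vocabulary."""
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + len(vocab))


def perplexity(tokens, unigrams, bigrams, vocab):
    """exp of the average negative log-probability per predicted token."""
    padded = ["<s>"] + tokens + ["</s>"]
    log_prob = sum(
        math.log(bigram_prob(p, w, unigrams, bigrams, vocab))
        for p, w in zip(padded[:-1], padded[1:])
    )
    return math.exp(-log_prob / (len(padded) - 1))


# Usage on a toy corpus (illustrative only).
corpus = [s.split() for s in ["mám rád jablká", "mám rád hrušky", "ty máš rád jablká"]]
model = train_bigram_lm(corpus)
print(perplexity("mám rád jablká".split(), *model))
print(perplexity("jablká rád mám".split(), *model))
```

Add-one smoothing is crude; a stronger count-based baseline would use Kneser-Ney smoothing, and a neural model for Slovak should beat both on held-out text.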
Text Classification
Pick any interesting text dataset whose texts can be classified into various categories, and try a couple of methods on it.
- Language identification
- Sentiment analysis of Anketa comments
- Detection of inappropriate comments in Anketa
- Adapting label-wise attention to non-Twitter data
- Applications of active learning in the context of classification
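For the language identification item, a common baseline is a multinomial naive Bayes classifier over character n-grams. The sketch below assumes character trigrams, add-one smoothing and a tiny hypothetical training set; the class name is made up, and a real project (say, Slovak vs. Czech) would need far more text.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Lowercased character n-grams, with a space marking word boundaries at both ends."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NaiveBayesLangId:
    """Hypothetical multinomial naive Bayes language identifier."""

    def __init__(self, n=3):
        self.n = n
        self.counts = {}   # language -> Counter of n-grams
        self.totals = {}   # language -> total number of n-grams seen
        self.priors = {}   # language -> log prior
        self.vocab = set()

    def fit(self, texts, labels):
        docs_per_lang = Counter(labels)
        for text, lang in zip(texts, labels):
            grams = char_ngrams(text, self.n)
            self.counts.setdefault(lang, Counter()).update(grams)
            self.vocab.update(grams)
        for lang, counter in self.counts.items():
            self.totals[lang] = sum(counter.values())
            self.priors[lang] = math.log(docs_per_lang[lang] / len(labels))
        return self

    def predict(self, text):
        grams = char_ngrams(text, self.n)
        v = len(self.vocab) + 1  # +1 reserves probability mass for unseen n-grams

        def log_score(lang):
            c, total = self.counts[lang], self.totals[lang]
            return self.priors[lang] + sum(math.log((c[g] + 1) / (total + v)) for g in grams)

        return max(self.counts, key=log_score)


# Usage with a toy training set (illustrative only).
model = NaiveBayesLangId().fit(
    ["dobrý deň, ako sa máš", "mám sa dobre, ďakujem", "kde je stanica", "ako sa voláš",
     "good morning, how are you", "i am fine, thank you", "where is the station", "what is your name"],
    ["sk"] * 4 + ["en"] * 4,
)
print(model.predict("ako sa máte"))
```

Character n-grams are a natural fit here because they need no tokenizer and pick up diacritics and typical letter sequences of each language.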
Summarization
Can we build a Slovak summarization dataset that would let us apply extractive summarization to, say, news articles?
Can we use the TextRank algorithm to perform extractive summarization, or generate a list of keywords for a given body of text?
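A minimal sketch of TextRank for extractive summarization, assuming a naive regex sentence splitter: sentences are graph nodes, edges are weighted by the word-overlap similarity from the original TextRank paper (shared words divided by the sum of the log sentence lengths), and scores come from weighted PageRank with damping d = 0.85, computed by a fixed number of power iterations. The function names are hypothetical; the keyword variant would instead build a graph of words that co-occur within a small window.

```python
import math
import re


def split_sentences(text):
    # Naive splitter (an assumption): break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def similarity(a, b):
    # TextRank similarity: number of shared words normalized by log sentence lengths.
    ta, tb = re.findall(r"\w+", a.lower()), re.findall(r"\w+", b.lower())
    if not ta or not tb:
        return 0.0
    overlap = len(set(ta) & set(tb))
    denom = math.log(len(ta)) + math.log(len(tb))
    return overlap / denom if denom > 0 else 0.0


def textrank(sentences, d=0.85, iterations=50):
    n = len(sentences)
    w = [[similarity(sentences[i], sentences[j]) if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    out_weight = [sum(row) for row in w]  # total edge weight leaving each node
    scores = [1.0] * n
    for _ in range(iterations):
        # Weighted PageRank update: WS(i) = (1 - d) + d * sum_j w_ji / out(j) * WS(j)
        scores = [
            (1 - d) + d * sum(w[j][i] / out_weight[j] * scores[j]
                              for j in range(n) if out_weight[j] > 0)
            for i in range(n)
        ]
    return scores


def summarize(text, k=2):
    sentences = split_sentences(text)
    scores = textrank(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]  # restore original sentence order


# Usage: an unrelated sentence gets no edges and therefore the minimum score 1 - d.
text = "The cat sat on the mat. The cat ate the fish. Dogs chase the cat. Stocks fell sharply today."
print(summarize(text, k=3))
```

Python's `\w` is Unicode-aware, so Slovak words with diacritics are tokenized correctly; for real articles, lemmatization and stop-word removal would likely make the overlap similarity far more meaningful.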
Portmanteau Generation
Given words A and B, can we create their portmanteau C?
A | B | Portmanteau |
---|---|---|
beef | buffalo | beefalo |
sheep | people | sheeple |
breakfast | lunch | brunch |
frozen | yogurt | froyo |
parachute | trooper | paratrooper |
emotion | icon | emoticon |
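A naive, purely orthographic baseline for the first task: join a prefix of A and a suffix of B at a letter they share. The function name is hypothetical, and the sketch only generates candidates without ranking them. It recovers beefalo, sheeple and emoticon from the table above, but it cannot produce brunch, because breakfast and lunch share no letter.

```python
def blend_candidates(a, b):
    """Hypothetical helper: blend a prefix of `a` with a suffix of `b` at a shared letter."""
    candidates = set()
    for i, ch in enumerate(a):
        for j, other in enumerate(b):
            if ch == other:
                # Keep `a` up to and including the shared letter, then `b` after it.
                candidates.add(a[: i + 1] + b[j + 1:])
    # A "blend" identical to one of the inputs is not a portmanteau.
    candidates.discard(a)
    candidates.discard(b)
    return sorted(candidates)


print(blend_candidates("sheep", "people"))
```

Ranking the candidates (for example by pronounceability, or by how much of each source word remains recognizable) is where the actual project work would be.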
Or, better yet, given words A and B, can we create a portmanteau C that does not directly feature A or B but is still related to them? For instance:
A | B | Portmanteau |
---|---|---|
angry | Mozart | scaria (scary / aria) |
Here is a quick demo of such an approach. It requires a fairly complex setup and external tools, such as the CMU Pronouncing Dictionary, which makes it pretty difficult to
Your own idea!
Check out the sites below, find an NLP task you like and see if you can make an interesting project out of it!