Natural Language Processing

  • Teacher(s)
    Bas Donkers, Meike Morren
  • Research field
    Data Science
  • Dates
    Period 5 - May 06, 2024 to Jul 05, 2024
  • Course type
  • Program year
  • Credits

Course description

Natural Language Processing (NLP) comprises statistical and machine learning tools for automatically analysing text data and deriving useful insights from it. Vast amounts of information are stored as text, and hence NLP has become one of the essential technologies of the big data age. In this course, core concepts and techniques from the field will be studied, with a focus on methods that are popular in business applications. These include n-gram models, word vectors, sentiment analysis, word embeddings and topic modelling.

This course offers students a theoretically informed understanding of NLP. It aims to broaden their knowledge of the methods involved and to provide hands-on experience with the steps of an NLP project. We focus on three aspects:

a) to develop a deeper understanding of the main methods in NLP (n-grams, the lexicon approach, word embeddings and other advanced machine learning methods);
b) to gain experience in scraping and cleaning data yourself;
c) to apply this knowledge and experience in a group assignment that gives you the opportunity to show your creativity.

By the end of this course, you will be able to analyse and evaluate NLP approaches. Moreover, you will have applied this knowledge and these skills in a real-life setting, translating theory into practice.

Topics covered:

  1. Information theory, regular expressions and scraping (tokenization, stemming, lemmatization, parsing)
  2. Word vectors and dimension reduction based on bag of words (n-grams)
  3. Sentiment analysis (lexicon-based vs model-based)
  4. Word embeddings (Word2Vec, GloVe, BERT)
  5. Topic models (LDA)
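
To give a flavour of topics 1 to 3, the following is a minimal sketch (not course material) of regular-expression tokenization, n-gram counting and lexicon-based sentiment scoring, using only the Python standard library. The function names, the example review and the four-word TOY_LEXICON are assumptions made up for illustration; a real project would use an established sentiment lexicon and a more careful tokenizer.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and extract word tokens with a regular expression."""
    # Matches runs of letters, optionally followed by an apostrophe part ("don't").
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def ngrams(tokens, n):
    """Return all n-grams (as tuples) of a token sequence, in order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Hypothetical toy lexicon: +1 for positive words, -1 for negative words.
TOY_LEXICON = {"great": 1, "good": 1, "poor": -1, "terrible": -1}


def lexicon_sentiment(tokens, lexicon=TOY_LEXICON):
    """Sum the lexicon scores of all tokens; unknown words count as 0."""
    return sum(lexicon.get(token, 0) for token in tokens)


# Made-up example review, used only to exercise the functions above.
review = "The food was great, but the service was terrible. Great view!"

tokens = tokenize(review)
unigram_counts = Counter(tokens)            # bag-of-words representation
bigram_counts = Counter(ngrams(tokens, 2))  # bag of bigrams
score = lexicon_sentiment(tokens)

print(tokens)
print(unigram_counts.most_common(3))
print(bigram_counts.most_common(3))
print("sentiment score:", score)
```

The counts in unigram_counts and bigram_counts are the raw material for the bag-of-words vectors of topic 2, while the summed score illustrates the lexicon-based side of topic 3; a model-based alternative would instead learn word weights from labelled reviews.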

Course literature

Jurafsky, D., & Martin, J. H. (2022). Speech and language processing (Vol. 3). London: Pearson.