Course

Natural Language Processing


  • Teacher(s)
    Bas Donkers, Meike Morren
  • Research field
    Data Science
  • Dates
    Period 5 - May 02, 2022 to Jul 15, 2022
  • Course type
    Core
  • Program year
    First
  • Credits
    4

Course description

Natural Language Processing (NLP) comprises statistical and machine learning tools for automatically analyzing text data to derive useful insights from it. Vast amounts of information are stored in this form, and hence NLP has become one of the essential technologies of the big data age. In this course, core concepts and techniques from the area will be studied, with a focus on methods that are popular in business applications. These include n-gram models, word vectors, sentiment analysis, word embeddings and topic modelling.

This course offers students a theoretically informed understanding of NLP. It aims to broaden your knowledge of the methods involved in NLP and to give you hands-on experience with the steps that need to be taken in an NLP project. We focus on three aspects:

a) to create a deeper understanding of the main methods in NLP (n-grams, the lexicon approach, word embeddings and other advanced machine learning methods);
b) to gain experience in scraping and cleaning data yourself;
c) to apply this knowledge and experience in a group assignment that gives you the opportunity to show your creativity.
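
To give a flavour of aspects a) and b), the following is a minimal sketch of cleaning a scraped review and scoring it with the lexicon approach. It is an illustration only, not course material: the `LEXICON` dictionary, the helper names `clean` and `lexicon_score`, and the example review are all hypothetical, and a real project would use a published sentiment lexicon and a proper HTML parser.

```python
import re

# Hypothetical toy lexicon (assumption): word -> polarity.
# Real projects would load a published sentiment lexicon instead.
LEXICON = {"good": 1, "great": 1, "love": 1, "bad": -1, "poor": -1, "terrible": -1}


def clean(text):
    """Strip HTML-like tags, collapse whitespace, and lowercase the text."""
    text = re.sub(r"<[^>]+>", " ", text)      # replace tags with a space
    return re.sub(r"\s+", " ", text).strip().lower()


def lexicon_score(text):
    """Sum the polarity of every lexicon word found in the cleaned text."""
    tokens = re.findall(r"[a-z']+", clean(text))
    return sum(LEXICON.get(token, 0) for token in tokens)


# Hypothetical scraped snippet, used only for illustration.
review = "<p>I love the screen, great value.</p>"
print(clean(review))
print(lexicon_score(review))
```

A score above zero suggests positive sentiment and a score below zero negative sentiment. Note that this simple sum ignores negation ("not good") and word order, which is one reason model-based approaches are also studied.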

By the end of this course, you will be able to analyse and evaluate NLP approaches. Moreover, you will apply this knowledge and these skills in a real-life setting, enabling you to translate theoretical knowledge into practice.

Topics covered:

  1. Information theory, regular expressions and scraping (tokenization, stemming, lemmatization, parsing).
  2. Word vectors and dimension reduction based on bag of words (n-grams)
  3. Sentiment analysis (lexicon-based vs model-based)
  4. Word embeddings (Word2Vec, GloVe, BERT)
  5. Topic models (LDA)
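
As an illustration of topics 1 and 2, the sketch below tokenizes a text with a regular expression and builds bag-of-words counts over n-grams. It is a minimal example under stated assumptions, not the method taught in the course: the function names `tokenize`, `ngrams` and `bag_of_ngrams` and the sample sentence are hypothetical, and the tokenizer simply keeps lowercase runs of letters and digits.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters and digits as tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_ngrams(text, n=1):
    """Count how often each n-gram occurs in the text (order is discarded)."""
    return Counter(ngrams(tokenize(text), n))


# Hypothetical example document, used only for illustration.
doc = "The battery is good. The screen is good too."
unigrams = bag_of_ngrams(doc, 1)
bigrams = bag_of_ngrams(doc, 2)

print(unigrams.most_common(3))
print(bigrams[("is", "good")])
```

Because punctuation is dropped before the n-grams are formed, bigrams in this sketch can span sentence boundaries. Splitting into sentences first would avoid that, at the cost of an extra preprocessing step.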

Course literature

    The following mandatory readings (presented in alphabetical order) are considered essential for your learning experience. These articles are also part of the exam material. Changes to the reading list will be communicated on Canvas. Papers marked with ** must be discussed in FeedbackFruits.
    Selected papers, per week: Week 2
    - Hu, M., & Liu, B. (2004, July). Mining opinion features in customer reviews. In AAAI (Vol. 4, No. 4, pp. 755-760).**
    - Pang, B., Lee, L., & Vaithyanathan, S. (2002, July). Thumbs up?: sentiment classification using machine learning techniques. In Proceedings of the ACL-02 conference on Empirical methods in natural language processing-Volume 10 (pp. 79-86). Association for Computational Linguistics.**
    Week 3
    - Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent dirichlet allocation. Journal of machine learning research, 3(Jan), 993-1022.**
    - Blei, D. M., & Lafferty, J. D. (2007). A correlated topic model of science. Annals of applied statistics, 1(1), 17-35.
    - Büschken, J., & Allenby, G. M. (2016). Sentence-based text analysis for customer reviews. Marketing Science, 35(6), 953-975.**
    Week 4
    - Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013). Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems (pp. 3111-3119).**
    - Pennington, J., Socher, R., & Manning, C. (2014). Glove: Global vectors for word representation. In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP) (pp. 1532-1543).**
    - Rong, X. (2014). Word2vec parameter learning explained. arXiv preprint arXiv:1411.2738.
    - Shi, T., & Liu, Z. (2014). Linking GloVe with word2vec. arXiv preprint arXiv:1411.5595.
    Week 5
    - Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., & Polosukhin, I. (2017). Attention is all you need. In Advances in neural information processing systems (pp. 5998-6008).**
    - Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2019). Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.**
    Week 6
    - Shen, D., Wang, G., Wang, W., Min, M. R., Su, Q., Zhang, Y., ... & Carin, L. (2018). Baseline needs more love: On simple word-embedding-based models and associated pooling mechanisms. arXiv preprint arXiv:1805.09843. **
    Books:
    - Jurafsky, D., & Martin, J. H. (2014). Speech and language processing (Vol. 3). London: Pearson.
    - Manning, C. D., & Schütze, H. (1999). Foundations of statistical natural language processing. MIT press.