
Reinforcement Learning (will be taught in 2022-2023)

  • Teacher(s)
    Gui Liberali
  • Research field
  • Dates
    Period 3 - Jan 03, 2022 to Feb 25, 2022
  • Course type
  • Program year
  • Credits

Course description

External participants are invited to register for this course; separate registration links are provided for (PhD) students and for other participants. More information on course registration and course fees is available online.

The Reinforcement Learning course focuses on using machine learning methods to model and solve problems relevant to management science, in particular those involving machines that autonomously make decisions on behalf of the modeler, as in online settings.

The course is based mainly on reinforcement learning (when we model states and transitions) and multi-armed bandits (when states are not modelled). We will focus on the design, solution, and implementation of learning methods for sequential decision-making under uncertainty. Sequential decision problems involve a trade-off between exploitation (acting on the information already collected) and exploration (gathering more information). These problems arise in many important domains, including online advertising, clinical trials, website optimization, marketing campaigns, and revenue management.
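As an informal illustration of the exploration-exploitation trade-off described above, the following minimal sketch runs an epsilon-greedy policy on a simulated Bernoulli bandit. It is not taken from the course materials: the function name `epsilon_greedy`, the arm means, the horizon, the value of `epsilon`, and the seed are all assumptions chosen for the example.

```python
import random


def epsilon_greedy(true_means, horizon=5000, epsilon=0.1, seed=0):
    """Run epsilon-greedy on a simulated Bernoulli bandit.

    Returns the number of pulls per arm and the total reward collected.
    All parameters are illustrative assumptions, not course settings.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    pulls = [0] * n_arms        # how often each arm has been played
    estimates = [0.0] * n_arms  # running mean reward of each arm
    total_reward = 0
    for _ in range(horizon):
        if rng.random() < epsilon:
            # Exploration: gather more information from a random arm.
            arm = rng.randrange(n_arms)
        else:
            # Exploitation: act on the information already collected.
            arm = max(range(n_arms), key=lambda a: estimates[a])
        reward = 1 if rng.random() < true_means[arm] else 0
        pulls[arm] += 1
        # Incremental update of the arm's mean reward estimate.
        estimates[arm] += (reward - estimates[arm]) / pulls[arm]
        total_reward += reward
    return pulls, total_reward


# Hypothetical example: three arms, e.g. three ad variants with
# unknown click-through rates.
pulls, total_reward = epsilon_greedy([0.2, 0.5, 0.8])
print(pulls, total_reward)
```

With a fixed `epsilon`, a constant fraction of rounds is spent exploring even after the best arm is clear, which is one motivation for the more refined methods listed in the course content.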

Course content:

1. Reinforcement Learning and multi-armed bandits: introduction and foundations

2. Stochastic Bandits and Regret Analysis

3. Optimality and lower bounds. The UCB algorithm

4. Thompson Sampling

5. Contextual Bandits, Adversarial Bandits and Bayesian bandits (e.g., Gittins and DAI)

6. MDPs and model-based, state-based RL algorithms

7. Reinforcement Learning and Deep Learning
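To give a flavour of topics 2 to 4, the sketch below implements textbook versions of UCB1 and Beta-Bernoulli Thompson Sampling on a simulated Bernoulli bandit and reports their pseudo-regret. It is a sketch under stated assumptions, not the course's own implementation: the function names, arm means, horizon, and seed are hypothetical, and the exploration bonus uses the standard UCB1 form sqrt(2 ln t / n).

```python
import math
import random


def ucb1(true_means, horizon, rng):
    """UCB1: play the arm with the highest mean estimate plus bonus."""
    n_arms = len(true_means)
    pulls = [0] * n_arms
    sums = [0.0] * n_arms
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # play every arm once to initialise
        else:
            arm = max(
                range(n_arms),
                key=lambda a: sums[a] / pulls[a]
                + math.sqrt(2 * math.log(t) / pulls[a]),
            )
        reward = 1 if rng.random() < true_means[arm] else 0
        pulls[arm] += 1
        sums[arm] += reward
    return pulls


def thompson_sampling(true_means, horizon, rng):
    """Beta-Bernoulli Thompson Sampling with uniform Beta(1, 1) priors."""
    n_arms = len(true_means)
    successes = [0] * n_arms
    failures = [0] * n_arms
    for _ in range(horizon):
        # Draw one sample from each arm's posterior and play the largest.
        samples = [
            rng.betavariate(successes[a] + 1, failures[a] + 1)
            for a in range(n_arms)
        ]
        arm = max(range(n_arms), key=lambda a: samples[a])
        if rng.random() < true_means[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return [s + f for s, f in zip(successes, failures)]


def pseudo_regret(true_means, pulls):
    """Expected reward lost relative to always playing the best arm."""
    best = max(true_means)
    return sum((best - m) * n for m, n in zip(true_means, pulls))


means = [0.2, 0.5, 0.8]  # hypothetical arm means
for name, policy in [("UCB1", ucb1), ("Thompson Sampling", thompson_sampling)]:
    pulls = policy(means, 5000, random.Random(0))
    print(name, pulls, round(pseudo_regret(means, pulls), 1))
```

Both policies should concentrate their pulls on the best arm as the horizon grows; the pseudo-regret makes the cost of the remaining exploration explicit.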


Prerequisites

Business Foundations, Programming Basics, Mathematics, Statistics, Decision Theory for Business, Econometrics.

Course literature

The following mandatory readings (presented in alphabetical order) are considered essential for your learning experience. These readings are also part of the exam material. Changes to the reading list will be communicated on Canvas.


  • Bandit Algorithms, by Tor Lattimore and Csaba Szepesvári, 2021.
  • Multi-armed Bandit Allocation Indices, by John Gittins, Kevin Glazebrook, and Richard Weber, 2011.
  • Optimal Learning, by Warren B. Powell and Ilya O. Ryzhov, Wiley, 2012.
  • Reinforcement Learning, by Richard Sutton and Andrew Barto, 2018.

Selected papers, including:

  • Hauser, J.; Liberali, G. and Urban, G. 2014. Website morphing 2.0: Switching costs, partial exposure, random exit, and when to morph. Management Science 60 (6): 1594–1616
  • Hauser, J.; Urban, G.; Liberali, G. and Braun, M. 2009. Website morphing. Marketing Science 28 (2): 202–223
  • Liberali, G. and Ferecatu, A. 2022. Morphing for Consumer Dynamics: Bandits meet HMM. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3495518
  • Schwartz, E.; Bradlow, E. and Fader, P.S. 2017. Customer acquisition via display advertising using multi-armed bandit experiments. Marketing Science 36 (4)
  • Scott, S.L. 2010. A modern Bayesian look at the multi-armed bandit. Applied Stochastic Models in Business and Industry 26 (6): 639–658.
  • Slivkins, A. 2019. Introduction to Multi-Armed Bandits. Foundations and Trends in Machine Learning 12 (1-2): 1–286.