Reinforcement Learning (to be taught in 2022-2023)
Dates: Period 3 - Jan 03, 2022 to Feb 25, 2022
The Reinforcement Learning course focuses on using machine learning methods to model and solve problems relevant to management science – in particular, problems involving machines that autonomously make decisions on behalf of the modeler, as in online settings.
The course is based mainly on reinforcement learning (when we model states and transitions) and multi-armed bandits (when states are not modelled). We will focus on the design, solution, and implementation of learning methods for sequential decision-making under uncertainty. Sequential decision problems involve a trade-off between exploitation (acting on the information already collected) and exploration (gathering more information). These problems arise in many important domains, ranging from online advertising and clinical trials to website optimization, marketing campaigns, and revenue management.
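To make the exploration-exploitation trade-off concrete, here is a minimal sketch (not course material, just an illustration) of an epsilon-greedy policy on a Bernoulli bandit: with probability epsilon the learner explores a random arm, otherwise it exploits the arm with the best empirical mean. The arm means and parameter values below are illustrative assumptions.

```python
import random

def epsilon_greedy_bandit(true_means, n_rounds=10_000, epsilon=0.1, seed=0):
    """Epsilon-greedy on a Bernoulli bandit with given arm means
    (unknown to the learner). Returns pull counts and total reward."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms    # number of pulls per arm
    values = [0.0] * n_arms  # empirical mean reward per arm
    total_reward = 0
    for _ in range(n_rounds):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: random arm
        else:
            arm = max(range(n_arms), key=lambda a: values[a])  # exploit
        reward = 1 if rng.random() < true_means[arm] else 0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # running mean
        total_reward += reward
    return counts, total_reward

counts, total = epsilon_greedy_bandit([0.3, 0.5, 0.7])
# After enough rounds, the best arm (index 2) attracts most of the pulls.
```

The incremental-mean update avoids storing per-arm reward histories; the fixed epsilon keeps a constant fraction of rounds devoted to exploration.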
1. Reinforcement Learning and multi-armed bandits: introduction and foundations
2. Stochastic Bandits and Regret Analysis
3. Optimality and lower bounds. The UCB algorithm
4. Thompson Sampling
5. Contextual Bandits, Adversarial Bandits and Bayesian bandits (e.g., Gittins and DAI)
6. MDPs and model-based, state-based RL algorithms
7. Reinforcement Learning and Deep Learning
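As a preview of topic 3, the UCB algorithm can be sketched in a few lines: play the arm maximizing the empirical mean plus an optimism bonus sqrt(2 ln t / n_a), which shrinks as an arm is pulled more often. This is a minimal illustrative sketch of UCB1, with assumed arm means, not an assignment or reference implementation.

```python
import math
import random

def ucb1(true_means, n_rounds=10_000, seed=0):
    """UCB1 on a Bernoulli bandit: optimism in the face of uncertainty."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms
    values = [0.0] * n_arms
    for t in range(1, n_rounds + 1):
        if t <= n_arms:
            arm = t - 1  # initialization: pull each arm once
        else:
            # upper confidence bound: empirical mean + exploration bonus
            arm = max(range(n_arms),
                      key=lambda a: values[a]
                      + math.sqrt(2 * math.log(t) / counts[a]))
        reward = 1 if rng.random() < true_means[arm] else 0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]
    return counts

counts = ucb1([0.3, 0.5, 0.7])
# Suboptimal arms are pulled only logarithmically often, so the
# best arm (index 2) dominates the pull counts.
```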
The following mandatory readings (presented in alphabetical order) are considered essential for your learning experience. These articles are also part of the exam material. Changes to the reading list will be communicated on CANVAS.
- Bandit Algorithms, by Tor Lattimore and Csaba Szepesvari, 2021.
- Multi-armed Bandit Allocation Indices, by John Gittins, Kevin Glazebrook, and Richard Weber, 2011.
- Optimal Learning, by Warren B. Powell and Ilya O. Ryzhov, Wiley, 2012.
- Reinforcement Learning, by Richard Sutton and Andrew Barto, 2018.
Selected papers, including:
- Hauser, J.; Liberali, G. and Urban, G. 2014. Website morphing 2.0: Switching costs, partial exposure, random exit, and when to morph. Management Science 60 (6): 1594–1616
- Hauser, J.; Urban, G.; Liberali, G. and Braun, M. 2009. Website Morphing. Marketing Science 28 (2): 202-223.
- Liberali, G. and Ferecatu, A. 2022. Morphing for Consumer Dynamics: Bandits meet HMM. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3495518
- Schwartz, E.; Bradlow, E. and Fader, P.S. 2017. Consumer acquisition via display advertising using multi-armed bandit experiments. Marketing Science, 36(4)
- Scott, S.L. 2010. A modern Bayesian look at the multi-armed bandit. Applied Stochastic Models in Business and Industry 26 (6): 639-658.
- Slivkins, A. 2019. Introduction to Multi-Armed Bandits. Foundations and Trends in Machine Learning, 12 (1-2): 1-286.