Course

Reinforcement Learning


  • Teacher(s)
    Gui Liberali
  • Research field
    -
  • Dates
    Period 3 - Jan 08, 2024 to Mar 01, 2024
  • Course type
    Field
  • Program year
    Second
  • Credits
    3

Course description

External participants are invited to register for this course. (PhD) students register here, others register here. More information on course registration and course fees can be found here.

This Reinforcement Learning course focuses on using machine learning methods to model and solve problems relevant to management science – in particular, problems involving machines that autonomously make decisions on behalf of the modeler, as in online settings.

The course is based mainly on reinforcement learning (when we model states and transitions) and multi-armed bandits (when states are not modelled). We will focus on the design, solution, and implementation of learning methods for sequential decision-making under uncertainty. Sequential decision problems involve a trade-off between exploitation (acting on the information already collected) and exploration (gathering more information). These problems arise in many important domains, including online advertising, clinical trials, website optimization, marketing campaigns, and revenue management.
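The exploration–exploitation trade-off described above can be illustrated with an epsilon-greedy agent on a Bernoulli bandit. This is a minimal sketch for intuition, not course material; the function name, parameters, and arm means are illustrative.

```python
import random

def epsilon_greedy_bandit(true_means, epsilon=0.1, horizon=1000, seed=0):
    """Run an epsilon-greedy agent on a Bernoulli bandit.

    With probability epsilon the agent explores (pulls a random arm);
    otherwise it exploits the arm with the highest empirical mean.
    Returns per-arm pull counts and empirical mean estimates.
    """
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k
    estimates = [0.0] * k
    for _ in range(horizon):
        if rng.random() < epsilon:
            arm = rng.randrange(k)                            # explore
        else:
            arm = max(range(k), key=lambda a: estimates[a])   # exploit
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        # incremental update of the empirical mean of the pulled arm
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return counts, estimates
```

Over a long horizon, exploitation concentrates pulls on the arm with the highest estimated mean, while the epsilon fraction of exploratory pulls keeps every arm's estimate from going stale.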

Course content:

1. Reinforcement Learning and multi-armed bandits: introduction and foundations

2. Stochastic Bandits and Regret Analysis. Optimality and lower bounds

3. Adversarial methods. Offline evaluation

4. Thompson Sampling and the UCB algorithm

5. Contextual Bandits, Adversarial Bandits and Bayesian bandits (e.g., Gittins and DAI)

6. MDPs and model-based, state-based tabular methods: Dynamic Programming, Monte Carlo, and Temporal Difference learning
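As a concrete illustration of the UCB topic listed above, the sketch below runs the classic UCB1 rule on a Bernoulli bandit: each round, pull the arm maximizing the empirical mean plus an optimism bonus that shrinks as the arm is sampled more often. This is a hedged, self-contained example with illustrative names and parameters, not the course's implementation.

```python
import math
import random

def ucb1(true_means, horizon=2000, seed=1):
    """Minimal UCB1 on a Bernoulli bandit.

    Pulls the arm maximizing  mean[a] + sqrt(2 * ln(t) / n[a]),
    after an initialization phase that pulls each arm once.
    Returns per-arm pull counts.
    """
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k
    means = [0.0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # initialization: pull each arm once
        else:
            arm = max(
                range(k),
                key=lambda a: means[a] + math.sqrt(2 * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]
    return counts
```

Because the bonus term decays like sqrt(ln t / n), suboptimal arms are pulled only logarithmically often, which is the source of UCB1's logarithmic regret bound discussed in the regret-analysis sessions.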


Prerequisites

Business Foundations, Programming Basics, Mathematics, Statistics, Decision Theory for Business, Econometrics.

Course literature

The following list of mandatory readings (presented in alphabetical order) is considered essential for your learning experience. Changes to the reading list will be communicated on CANVAS.

Books:

  • Bandit Algorithms, by Tor Lattimore and Csaba Szepesvari, 2021.
  • Multi-armed bandit allocation indices. John Gittins, Kevin Glazebrook, and Richard Weber, 2011.
  • Reinforcement Learning, by Richard Sutton and Andrew Barto, 2018.

Selected papers, including:

  • Aramayo, N., Schiappacasse, M., and Goic, M. (2022). A Multiarmed Bandit Approach for House Ads Recommendations. Marketing Science. doi.org/10.1287/mksc.2022.1378
  • Hauser, J., Liberali, G., and Urban, G. (2014). Website morphing 2.0: Switching costs, partial exposure, random exit, and when to morph. Management Science 60 (6): 1594–1616.
  • Li, L., Chu, W., Langford, J., and Wang, X. (2011). Unbiased offline evaluation of contextual-bandit-based news article recommendation algorithms. Proc. 4th ACM Internat. Conf. Web Search Data Mining (ACM, New York), 297–306.
  • Liberali, G. and Ferecatu, A. (2022). Morphing for Consumer Dynamics: Bandits Meet HMM. Marketing Science, 41(4): 769–794.
  • Russo, D., Van Roy, B., Kazerouni, A., Osband, I., and Wen, Z. (2018). A Tutorial on Thompson Sampling. Foundations and Trends in Machine Learning, 11(1): 1–96.
  • Slivkins, A. (2019). Introduction to Multi-Armed Bandits. Foundations and Trends in Machine Learning, 12(1–2): 1–286.