The University of Southampton

COMP6247 Reinforcement and Online Learning

Module Overview

Aims and Objectives

Module Aims

To gain an in-depth theoretical and practical understanding of reinforcement and online learning, and their applications.

Learning Outcomes

Knowledge and Understanding

Having successfully completed this module, you will be able to demonstrate knowledge and understanding of:

  • Underlying mathematical and algorithmic principles of reinforcement and online learning
  • The key factors that have made reinforcement and online learning successful in various applications
Subject Specific Intellectual and Research Skills

Having successfully completed this module, you will be able to:

  • Critically appraise recent scientific literature in reinforcement and online learning
  • Critically appraise the merits and shortcomings of model architectures on specific problems
Subject Specific Practical Skills

Having successfully completed this module, you will be able to:

  • Apply existing reinforcement and online learning models to real applications
  • Work confidently with reinforcement and online learning algorithms, implementing them and evaluating their performance and applicability in different application domains


Syllabus

Classical Reinforcement Learning
  • TD learning
  • Q learning
  • State Space Models
  • Example: TD-Gammon

Online Learning
  • Regret minimisation
  • Stochastic vs. adversarial settings
  • Full-information, semi-bandit, and bandit feedback

Monte Carlo Tree Search (MCTS)

Applications
  • AlphaZero: combining MCTS, reinforcement learning and deep learning
  • Hyper-parameter search in deep learning with bandit theory
  • Playing no-limit poker with counterfactual regret minimisation
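The Q learning topic in the syllabus can be illustrated with a minimal Python sketch of tabular Q-learning with epsilon-greedy exploration. It is not drawn from the module materials: the toy chain environment, the helper names `step`, `q_learning` and `greedy_policy`, and all parameter values are hypothetical choices made purely for illustration.

```python
import random

def step(state, action, n_states):
    """Hypothetical deterministic chain: action 0 moves left, action 1 moves right.
    Entering the rightmost state ends the episode with reward 1; other moves give 0."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == n_states - 1
    return next_state, (1.0 if done else 0.0), done

def q_learning(n_states=6, episodes=2000, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration (illustrative parameters)."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]  # q[s][a], initialised to zero
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Explore with probability epsilon, or when both actions look equally good.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, action, n_states)
            # Off-policy TD target: bootstrap from the best action in the next state.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q

def greedy_policy(q):
    """Greedy action for each non-terminal state (ties resolved towards 'right')."""
    return [0 if left > right else 1 for left, right in q[:-1]]

q = q_learning()
print(greedy_policy(q))  # prints [1, 1, 1, 1, 1]
```

The update bootstraps from the maximum over next-state actions rather than from the action the exploring agent actually takes next, which is what makes Q-learning off-policy. In this toy chain, the learned "move right" values approach powers of gamma (for example about 0.9^4 in the start state), so the greedy policy moves right everywhere.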

Learning and Teaching

Teaching and learning methods

Lectures and labs

Type / Hours
Specialist Laboratory / 20
Wider reading or practice / 46
Completion of assessment task / 60
Total study time / 150

Resources & Reading list

Csaba Szepesvári (2010). Algorithms for Reinforcement Learning.

Richard Sutton and Andrew Barto (2017). Reinforcement Learning: An Introduction.



Assessment

Method / Percentage contribution
Final project / 40%
In-class task / 20%
Lab work / 40%


Method / Percentage contribution
Assignment / 100%


Method / Percentage contribution
Assignment / 100%

Repeat Information

Repeat type: Internal & External

Linked modules

Pre-requisite: COMP3206 or COMP3223 or COMP6229 or COMP6245

