The University of Southampton

COMP6247 Reinforcement and Online Learning

Module Overview

Aims and Objectives

Learning Outcomes

Knowledge and Understanding

Having successfully completed this module, you will be able to demonstrate knowledge and understanding of:

  • Underlying mathematical and algorithmic principles of reinforcement and online learning
  • The key factors that have made reinforcement and on-line learning successful for various applications

Subject Specific Intellectual and Research Skills

Having successfully completed this module you will be able to:

  • Critically appraise recent scientific literature in reinforcement and online learning
  • Critically appraise the merits and shortcomings of model architectures on specific problems

Subject Specific Practical Skills

Having successfully completed this module you will be able to:

  • Apply existing reinforcement and on-line learning models to real applications
  • Implement and work with reinforcement and on-line learning algorithms, and evaluate their performance and applicability in different application domains


Classical Reinforcement

  • TD learning
  • Q learning
  • State Space Models
  • Example: TD-Gammon

On-line Learning

  • Regret minimisation
  • Stochastic vs. adversarial
  • Full information, semi-bandit, and bandit feedback

Monte Carlo Tree Search (MCTS)

Applications

  • AlphaZero: combining MCTS, reinforcement learning and deep learning
  • Hyper-parameter search in deep learning with bandit theory
  • Playing no-limit poker with counterfactual regret minimisation

Learning and Teaching

Teaching and learning methods

Lectures and labs

Specialist Laboratory            20
Wider reading or practice        46
Completion of assessment task    60
Total study time                 150

Resources & Reading list

Csaba Szepesvari (2010). Algorithms for Reinforcement Learning. 

Richard Sutton and Andrew Barto (2017). Reinforcement Learning: An Introduction.



Method                   Percentage contribution
Continuous Assessment    100%


Method      Percentage contribution
Set Task    100%


Method      Percentage contribution
Set Task    100%

Repeat Information

Repeat type: Internal & External

Linked modules

Pre-requisite: COMP3206 or COMP3223 or COMP6229 or COMP6245
