The University of Southampton

STAT6120 Data Mining

Module Overview

Data analysis is changing. New sources of data in a wide range of formats contain valuable information, but extracting this information is often challenging using traditional tools. This module introduces modern techniques for mining such data and demonstrates how they may be put into action. Methods for handling structured and unstructured data are discussed, including techniques for the analysis of textual data.

Aims and Objectives

Learning Outcomes

Knowledge and Understanding

Having successfully completed this module, you will be able to demonstrate knowledge and understanding of:

  • storing, searching and analysing structured and unstructured data in a variety of formats.

Subject Specific Intellectual and Research Skills

Having successfully completed this module you will be able to:

  • select and apply methods for the analysis of textual data in Python, and interpret your results.

Transferable and Generic Skills

Having successfully completed this module you will be able to:

  • write technical reports that present results in a clear and reproducible manner.

Subject Specific Practical Skills

Having successfully completed this module you will be able to:

  • use appropriate techniques to obtain data from the web in an ethical manner;
  • manage and extract information from unstructured data.


This module will cover:

  • Data modalities
  • Analysing structured and unstructured datasets
  • Web scraping and web crawling
  • Feature extraction
  • Indexing and information extraction
  • Analysing unstructured text data; topic modelling

Learning and Teaching

Teaching and learning methods

A variety of methods will be used, including lectures and computer workshops in Python, combined in a 5-day course designed for students on release from the workplace. Students are also expected to read more widely than the lecture material as part of their individual study, and to critically appraise different approaches.

Independent study: 73 hours
Total study time: 100 hours

Resources & Reading list

Segaran, Toby (2007). Programming Collective Intelligence.

Han, Jiawei; Kamber, Micheline; Pei, Jian (2012). Data Mining. 


Assessment Strategy

100% Coursework


Method                  Percentage contribution
Project (3000 words)    100%


Method     Percentage contribution
Project    100%

Repeat Information

Repeat type: Internal & External
