The University of Southampton

STAT6120 Data Mining

Module Overview

Data analysis is changing. New sources of data in a wide range of formats contain valuable information, but extracting this information is often challenging using traditional tools. This module introduces modern techniques for mining such data and demonstrates how they may be put into action. Methods for handling structured and unstructured data are discussed, including techniques for the analysis of textual data.

Aims and Objectives

Learning Outcomes

Knowledge and Understanding

Having successfully completed this module, you will be able to demonstrate knowledge and understanding of:

  • storing, searching and analysing structured and unstructured data in a variety of formats.

Subject Specific Intellectual and Research Skills

Having successfully completed this module you will be able to:

  • select and apply methods for the analysis of textual data in Python, and interpret your results.

Transferable and Generic Skills

Having successfully completed this module you will be able to:

  • write technical reports that present results in a clear and reproducible manner.

Subject Specific Practical Skills

Having successfully completed this module you will be able to:

  • use appropriate techniques to obtain data from the web in an ethical manner;
  • manage and extract information from unstructured data.


This module will cover:

  • Data modalities
  • Analysing structured and unstructured datasets
  • Web scraping and web crawling
  • Feature extraction
  • Indexing and information extraction
  • Analysing unstructured text data; topic modelling

Learning and Teaching

Teaching and learning methods

Depending on feasibility, teaching may be delivered face-to-face intensively over a week, or online through a mixture of synchronous and asynchronous methods, which may include lectures, discussion boards, workshop activities, exercises, and videos. A range of resources will also be provided for further self-directed study.

Independent Study: 73
Total study time: 100

Resources & Reading list

Segaran, Toby (2007). Programming Collective Intelligence.

Han, Jiawei; Kamber, Micheline; Pei, Jian (2012). Data Mining.


Assessment Strategy

100% Coursework


Method                     Percentage contribution
Assignment (3000 words)    100%


Method        Percentage contribution
Assignment    100%

Repeat Information

Repeat type: Internal & External
