The University of Southampton

COMP6237 Data Mining

Module Overview

The challenge of data mining is to transform raw data into useful information and actionable knowledge. Data mining is the computational process of discovering patterns in data sets, using methods at the intersection of artificial intelligence, machine learning, statistics, and data management. This course will introduce key concepts in data mining, information extraction and information indexing, including specific algorithms and techniques for feature extraction, clustering, outlier detection, topic modelling and prediction on complex unstructured data sets. By taking this course you will gain a broad view of the general issues surrounding unstructured and semi-structured data and the application of algorithms to such data. At a practical level you will have the chance to explore an assortment of data mining techniques, which you will apply to problems involving real-world data.

Aims and Objectives

Learning Outcomes

Knowledge and Understanding

Having successfully completed this module, you will be able to demonstrate knowledge and understanding of:

  • Key concepts, tools and approaches for data mining on complex unstructured data sets (including multimedia mining, Twitter analysis, etc.)
  • Natural language processing techniques for extracting features from text
  • The theory behind modern data indexing systems
  • Techniques for modelling and extracting features from non-textual data
  • State-of-the-art data-mining techniques including topic modelling approaches such as LDA, clustering techniques and applications of matrix factorisations
  • Theoretical concepts and the motivations behind different data-mining approaches

Subject Specific Intellectual and Research Skills

Having successfully completed this module you will be able to:

  • Conceptually understand the role of data-mining, together with the mathematical techniques this requires

Subject Specific Practical Skills

Having successfully completed this module you will be able to:

  • Solve real-world data-mining, data-indexing and information extraction tasks


Key concepts:

  • The importance of data-mining
  • Real-world applications of data-mining (cyber-security, financial forecasting, trend prediction, etc.)
  • What is unstructured data
      • Modalities of data
  • Underlying techniques
      • Inverted indexes
      • Matrix factorisation
      • Dimensionality reduction

Modelling data:

  • Understanding Text
      • Bags of Words
      • TF-IDF
  • Dealing with non-textual data
      • Feature extraction techniques
      • Bags of features
      • Encoding and embedding

Modern data indexing at scale:

  • Information retrieval models
  • Ranking models

Unimodal data mining:

  • Topic modelling (techniques such as LSA, pLSA, LDA, NNMF)
  • Clustering (Hierarchical agglomerative, Spectral)
  • Multi-dimensional scaling
  • Mining graphs and networks (hubs and authorities [PageRank/HITS], spectral methods, etc.)
  • Finding outliers

Multimodal data mining:

  • Finding independent features (e.g. ICA, NNMF)
  • Finding correlations and making predictions (CL-LSI, classifiers, etc.)
  • Collaborative filtering and recommender systems

Learning and Teaching

Type                                  Hours
Wider reading or practice             46
Preparation for scheduled sessions    12
Follow-up work                        12
Completion of assessment task         20
Total study time                      150

Resources & Reading list

Toby Segaran (2007). Programming Collective Intelligence: Building Smart Web 2.0 Applications. 



Method                   Percentage contribution
Continuous Assessment    30%
Final Assessment         70%


Method      Percentage contribution
Set Task    100%


Method      Percentage contribution
Set Task    100%

Repeat Information

Repeat type: Internal & External

Linked modules

Prerequisite: COMP3206 or COMP3222 or COMP3223 or COMP6229 or COMP6245 or COMP6246
