The University of Southampton

COMP6237 Data Mining

Module Overview

The challenge of data mining is to transform raw data into useful information and actionable knowledge. Data mining is the computational process of discovering patterns in data sets, using methods at the intersection of artificial intelligence, machine learning, statistics, and data management. This course will introduce key concepts in data mining, information extraction and information indexing, including specific algorithms and techniques for feature extraction, clustering, outlier detection, topic modelling and prediction on complex unstructured data sets. By taking this course you will gain a broad view of the general issues surrounding unstructured and semi-structured data and the application of algorithms to such data. At a practical level you will have the chance to explore an assortment of data mining techniques, which you will apply to problems involving real-world data.

Aims and Objectives

Module Aims

To explore the role of data mining in solving real-world problems

Learning Outcomes

Knowledge and Understanding

Having successfully completed this module, you will be able to demonstrate knowledge and understanding of:

  • Key concepts, tools and approaches for data mining on complex unstructured data sets (including multimedia mining, Twitter analysis, etc)
  • Natural language processing techniques for extracting features from text
  • The theory behind modern data indexing systems
  • Techniques for modelling and extracting features from non-textual data
  • State-of-the-art data-mining techniques including topic modelling approaches such as LDA, clustering techniques and applications of matrix factorisations
  • Theoretical concepts and the motivations behind different data-mining approaches

Subject Specific Practical Skills

Having successfully completed this module you will be able to:

  • Solve real-world data-mining, data-indexing and information extraction tasks

Subject Specific Intellectual and Research Skills

Having successfully completed this module you will be able to:

  • Conceptually understand the role of data-mining, together with the mathematical techniques this requires


Key concepts:
- The importance of data-mining
- Real-world applications of data-mining (cyber-security, financial forecasting, trend prediction, etc.)
- What is unstructured data
-- Modalities of data
- Underlying techniques
-- Inverted indexes
-- Matrix factorisation
-- Probabilistic graphical models
-- Dimensionality reduction

Modelling data:
- Understanding Text
-- Bags of Words
-- TF-IDF
-- Natural language processing
--- POS Tagging
--- Entity extraction
- Dealing with non-textual data
-- Feature extraction techniques
-- Bags of features

Modern data indexing at scale:
- Information retrieval models
- Efficient indexing (one-pass versus two-pass; updatable indexes)
- Index compression
- Ranking models

Unimodal data mining:
- Dimensionality reduction
- Topic modelling (techniques such as LSA, pLSA, LDA, NNMF)
- Clustering (Hierarchical agglomerative, Spectral)
- Multi-dimensional scaling
- Mining graphs and networks (hubs and authorities [PageRank/HITS], spectral methods, etc.)

Multimodal data mining:
- Finding independent features (ICA, NNMF)
- Finding correlations and making predictions (CL-LSI, classifiers, etc.)
- Collaborative filtering and recommender systems

Learning and Teaching

Follow-up work                         12
Preparation for scheduled sessions     12
Wider reading or practice              46
Completion of assessment task          30
Total study time                       150

Resources & Reading list

Toby Segaran (2007). Programming Collective Intelligence: Building Smart Web 2.0 Applications. 



Method                   Percentage contribution
Exam (2 hours)           50%
Group Coursework         30%
Practical assessment     20%


Method                   Percentage contribution
Exam (2 hours)           100%

Repeat Information

Repeat type: Internal & External
