Data Mining

Module overview

The challenge of data mining is to transform raw data into useful information and actionable knowledge. Data mining is the computational process of discovering patterns in data sets, using methods at the intersection of artificial intelligence, machine learning, statistics, and data management. This module introduces key concepts in data mining, information extraction and information indexing, including specific algorithms and techniques for feature extraction, clustering, outlier detection, topic modelling and prediction on complex unstructured data sets. It gives you a broad view of the general issues surrounding unstructured and semi-structured data and the application of algorithms to such data. At a practical level, you will explore an assortment of data mining techniques and apply them to problems involving real-world data.

Linked modules

Prerequisite: COMP3206 or COMP3222 or COMP3223 or COMP6229 or COMP6245 or COMP6246

Aims and Objectives

Learning Outcomes

Knowledge and Understanding

Having successfully completed this module, you will be able to demonstrate knowledge and understanding of:

Theoretical concepts and the motivations behind different data-mining approaches
State-of-the-art data-mining techniques including topic modelling approaches such as LDA, clustering techniques and applications of matrix factorisations
The theory behind modern data indexing systems
Key concepts, tools and approaches for data mining on complex unstructured data sets (including multimedia mining, Twitter analysis, etc.)
Natural language processing techniques for extracting features from text
Techniques for modelling and extracting features from non-textual data

Subject Specific Intellectual and Research Skills

Having successfully completed this module, you will be able to:

Conceptually understand the role of data mining, together with the mathematical techniques it requires

Subject Specific Practical Skills

Having successfully completed this module, you will be able to:

Solve real-world data-mining, data-indexing and information extraction tasks

Learning and Teaching

Study time
Type	Hours
Preparation for scheduled sessions	12
Lecture	24
Wider reading or practice	46
Follow-up work	12
Revision	20
Tutorial	16
Completion of assessment task	20
Total study time	150

Resources & Reading list

Textbooks

Toby Segaran (2007). Programming Collective Intelligence: Building Smart Web 2.0 Applications. O'Reilly.

Assessment

Summative

This is how we’ll formally assess what you have learned in this module.

Breakdown
Method	Percentage contribution
Final Assessment	70%
Continuous Assessment	30%

Referral

This is how we’ll assess you if you don’t meet the criteria to pass this module.

Breakdown
Method	Percentage contribution
Set Task	100%

Repeat

An internal repeat is where you take all of your modules again, including any you passed. An external repeat is where you only re-take the modules you failed.

Breakdown
Method	Percentage contribution
Set Task	100%

Repeat Information

Repeat type: Internal & External