The University of Southampton

COMP6235 Foundations of Data Science

Module Overview

Welcome to Foundations of Data Science! 'Data scientist' has been described as the sexiest job of the 21st century, and demand for highly skilled practitioners is rising quickly as more data becomes available for study. As the amount of data increases, so too does the need for people who can extract meaningful insights from it. This course introduces a range of topics and concepts related to the data science process. It covers the technical pipeline from data collection to processing, analysis and visualisation. You will gain knowledge of topics such as statistics, data crawling, data visualisation, advanced databases and cloud computing, along with a toolkit for working with data (including R, D3, Google Refine and Hadoop). The course includes a mix of lectures, tutorials, hands-on exercises and invited talks from expert data science practitioners. Coursework will give you experience using the theory and techniques delivered in the lectures, while the group project will give you the chance to apply your knowledge of the data science process and toolkit by developing a data science application.

Aims and Objectives

Learning Outcomes

Knowledge and Understanding

Having successfully completed this module, you will be able to demonstrate knowledge and understanding of:

  • Key concepts in data science, including tools, approaches, and application scenarios
  • Topics in data collection, sampling, quality assessment and repair
  • Topics in statistical analysis and machine learning
  • Topics in data processing at scale
  • State-of-the-art tools to build data-science applications for different types of data, including text and CSV data

Subject Specific Intellectual and Research Skills

Having successfully completed this module you will be able to:

  • Understand and apply the fundamental concepts and techniques in data science

Subject Specific Practical Skills

Having successfully completed this module you will be able to:

  • Solve real-world data-science problems and build applications in this space


The course introduces students to the data scientist's toolkit and its underlying core concepts. It covers the full technical pipeline from data collection (sampling methods, crawling) to processing, basic statistical analysis and visualisation. The module also includes advanced topics in high-performance computing, including non-relational databases and MapReduce. Students taking this course will acquire a basic toolkit for working with data (CSV, R, MongoDB). To support these learning objectives, the coursework includes exercises and a group project in which students use existing open data sets to build their own application.

The course will cover the following concepts:

  • Fundamentals and core terminology
  • Technology pipeline and methods
  • Application scenarios and state of the art
  • Data collection (sampling, crawling)
  • Data analytics (statistical modelling, basic concepts, experiment design, pitfalls, R)
  • Data interpretation and use (visualisation techniques, pitfalls, D3)
  • High-performance computing (parallel databases, MapReduce, Hadoop, NoSQL)
  • Cloud computing (principles, architectures, existing technologies)

Learning and Teaching

Teaching and learning methods

Lectures and tutorials, as well as coursework (group project, exercises).

Type                                 Hours
Follow-up work                       6
Wider reading or practice            31
Preparation for scheduled sessions   6
Completion of assessment task        71
Total study time                     150



Method                  Percentage contribution
Continuous Assessment   100%


Method     Percentage contribution
Set Task   100%


Method     Percentage contribution
Set Task   100%

Repeat Information

Repeat type: Internal & External
