The University of Southampton

BIOL6055 Computational methods for biological data analysis

Module Overview

Large-scale approaches at the molecular, cellular, organismal and ecological level are revolutionizing biology by enabling systems-level questions to be addressed. Whilst large-scale techniques such as genomics and proteomics are immensely powerful, they bring new challenges in data handling, processing, analysis and interpretation. This module aims to equip students with the necessary practical skills to fully utilize data from large-scale approaches. The module will make use of the Unix/Linux environment and the R language and guide students through data management, sorting, extraction and analysis. The student will be able to utilize the skills learned to ask biological question(s) of large-scale datasets.

Aims and Objectives

Learning Outcomes


Having successfully completed this module you will be able to:

  • Navigate and organise files and data on the command line within a UNIX/LINUX environment
  • Perform data filtering and manipulation using UNIX/LINUX tools and R
  • Perform analysis tasks using bash and R scripts
  • Write basic pipelines for automating and repeating analyses
  • Apply these skills to answer biological questions using complex datasets


This module aims to give students an introduction to medium- and large-scale biological data processing, manipulation and analysis using the command line and scripting languages. Emphasis will be on scalable, repeatable and automated analysis workflows. The course will begin with an introduction to the challenges of large-scale data in modern biological sciences. The module will then include:

  • An introduction to the UNIX/LINUX operating system: files, directories and navigation.
  • Command line and R approaches to file exploration, manipulation, data extraction and analysis.
  • Writing scripts and pipelines for automated and repeatable analysis tasks.
  • Producing basic summary statistics and plots for data exploration and presentation of analysis results using R.

Learning and Teaching

Teaching and learning methods

Study time allocation: contact time is 14 hours, made up of 2 x 1-hour lectures plus 4 x 3-hour computer practicals/workshops. Private study is 61 hours, giving a total study time of 75 hours.

Formal lectures will introduce the fields of computational biology, bioinformatics and systems biology at the beginning of the module. These will be followed by four practical sessions, each three hours long. Each practical session will begin with a formal introduction to the theory, followed by a hands-on session structured around a worksheet and/or online tutorials covering the core skills and tasks that put these skills into practice. Following each workshop, a further worksheet task will be set and assessed, each contributing 25% towards the final mark for the module.

Completion of assessment task: 10 hours
Wider reading or practice: 43 hours
Preparation for scheduled sessions: 8 hours
Practical classes and workshops: 12 hours
Total study time: 75 hours

Resources & Reading list

Keith Bradnam & Ian Korf (2012). UNIX and Perl to the Rescue!: A Field Guide for the Life Sciences (and Other Data-rich Pursuits).

Beginning R: the statistical programming language. 



Method                  Percentage contribution
Exam                    20%
Scripting assignment    40%
Scripting assignment    40%


Method                  Percentage contribution
Coursework              100%