The University of Southampton

BIOL6055 Computational methods for biological data analysis

Module Overview

Large-scale approaches at the molecular, cellular, organismal and ecological level are revolutionizing biology by enabling systems-level questions to be addressed. Whilst large-scale techniques such as genomics and proteomics are immensely powerful, they bring new challenges in data handling, processing, analysis and interpretation. This module aims to equip students with the necessary practical skills to fully utilize data from large-scale approaches. The module will make use of the Unix/Linux environment and the R language and guide students through data management, sorting, extraction and analysis. The student will be able to utilize the skills learned to ask biological question(s) of large-scale datasets.

Aims and Objectives

Module Aims

  • To introduce students to the UNIX/LINUX environment and the R language, and to demonstrate command line and scripting tools for data manipulation.
  • To describe different file formats.
  • To illustrate and explore computational methods for accomplishing data analysis tasks, basic statistics and graphing of results.
  • To demonstrate the value of data analysis pipelines that can automate analysis from raw data to publication-ready graphical output.

Learning Outcomes

Having successfully completed this module you will be able to:

  • Navigate and organise files and data on the command line within a UNIX/LINUX environment.
  • Perform data filtering/manipulation using UNIX/LINUX tools and R.
  • Perform analysis tasks using bash and R scripts.
  • Write basic pipelines for automating and repeating analyses.
  • Apply these skills to answer biological questions using complex datasets.


This module aims to give students an introduction to medium- and large-scale biological data processing, manipulation and analysis using the command line and scripting languages. Emphasis will be on scalable, repeatable and automated analysis workflows. The course will begin with an introduction to the challenges of large-scale data in modern biological sciences. The module will then include:

  • An introduction to the UNIX/LINUX operating system: files, directories and navigation.
  • Command line and R approaches to file exploration, manipulation, data extraction and analysis.
  • Writing scripts and pipelines for automated and repeatable analysis tasks.
  • Producing basic summary statistics and plots for data exploration and presentation of analysis results using R.

Special Features

This module will make extensive use of VDU facilities.

Learning and Teaching

Teaching and learning methods

Study time allocation

Contact time: 14 hours (2 x 1 hour lectures plus 4 x 3 hour computer practicals/workshops).
Private study hours: 61 hours.
Total study time: 75 hours.

Formal lectures will introduce the fields of computational biology, bioinformatics and systems biology at the beginning of the module. These will be followed by four practical sessions of 3 hours each. Each practical session will begin with a formal introduction to the theory, followed by a hands-on session structured around a worksheet and/or online tutorials that cover the core skills and include tasks that put these skills into practice. Following each workshop, a further worksheet task will be set and assessed, each contributing 25% towards the final mark for the module.

Lectures                             2
Practical classes and workshops     12
Wider reading or practice           43
Completion of assessment task       10
Preparation for scheduled sessions   8
Total study time                    75

Resources & Reading list

Keith Bradnam & Ian Korf (2012). UNIX and Perl to the Rescue!: A Field Guide for the Life Sciences (and Other Data-rich Pursuits).

Beginning R: the statistical programming language. 



Method        Percentage contribution
Coursework    100%

