The University of Southampton

MEDI6230 Advanced Bioinformatics

Module Overview

The module will cover the advanced principles of informatics and bioinformatics applied to clinical genomics, including the analysis of whole transcriptomes and tumour normal pairs. Sequencing transcriptomes and tumour normal pairs is quickly becoming the method of choice for determining differential gene expression, alternative splicing and somatic mutation profiles. The ability to analyse and interpret these data, together with knowledge of their intrinsic properties, is essential for correct experimental design and appropriate analysis. Successful students will be equipped to implement and interpret genomic informatic analyses.

This advanced informatics module builds on the knowledge and skills gained in MEDI6215 (Bioinformatics, Interpretation and Data Quality Assurance in Genome Analysis) and MEDI6131 (Omics techniques and their application to Genomic Medicine). Students who have not attended MEDI6215 and MEDI6131 must satisfy the module leaders that they possess the basic skills necessary to benefit from this module.

If too few students are interested in this optional module, it may not be offered. If an optional module will not be run, we will advise you as soon as possible and help you choose an alternative module.

Aims and Objectives

Learning Outcomes

Knowledge and Understanding

Having successfully completed this module, you will be able to:

  • Utilise the Linux environment to navigate, organise and manipulate large NGS datasets.
  • Create basic scripts and pipelines for the analysis of omic data.
  • Analyse omic data, demonstrating an understanding of the bioinformatics processes of identification of somatic driver mutations, alignment of RNAseq data to the reference human genome, identification of known and novel transcripts, quantification of gene expression and detection of differential expression.
  • Interpret results from RNAseq and tumour normal pairs for application in both diagnostic and research settings.
  • Discuss the strengths and limitations of using NGS data to identify somatic mutations and detect differential gene expression.
  • Compare and contrast analytical methodologies for multiple purposes.


This module will couple theoretical sessions with practical assignments using Linux command line tools to align DNA and RNA sequence data to the reference genome, identify and annotate sequence variation including somatic mutations, interrogate databases, perform differential expression analyses using RNAseq data, and critically assess and interpret the findings from NGS analysis. The Linux command line is one of the most important and frequently used tools for analysing NGS data. It enables visualisation and manipulation of large data files, provides access to all program options, and can be used to create bioinformatic pipelines and run programs that do not have web interfaces.

Indicative Content

  • Introduction to the Linux command line and important commands.
  • Combining commands and redirecting them, and writing basic scripts to document and replicate analyses.
  • Using command line tools for data pre-processing, manipulation of VCF files and customised assessments of sequence coverage.
  • Principles of sequencing tumour normal pairs to identify somatic mutations.
  • Somatic variant discovery, familiarity with the statistical significance of somatic mutations (somatic P-value), annotation using multiple databases including ClinVar and COSMIC, and estimation of somatic driver mutations using in-silico tools.
  • Principles of RNA sequencing to determine gene expression profiles.
  • Understanding split-read mapping and aligning RNAseq data to the reference genome.
  • How to identify known and novel transcripts, quantify expression and perform differential expression analyses at various levels (genes, exons and transcripts) using appropriate software.
  • Introduction to pathway analysis, using basic tools for network analysis, network visualisation and modelling biological processes.
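
To make the somatic P-value listed in the indicative content concrete, the sketch below computes one common form of it: a one-tailed Fisher's exact test on read counts from a tumour normal pair, asking whether variant-supporting reads are enriched in the tumour. This is an illustration under stated assumptions, not the module's own method. The module description does not name a variant caller or a test, although somatic callers such as VarScan 2 report a value of this kind. The function name `somatic_p_value` and its parameters are hypothetical.

```python
from math import comb


def somatic_p_value(normal_ref: int, normal_var: int,
                    tumour_ref: int, tumour_var: int) -> float:
    """One-tailed Fisher's exact test on a 2x2 table of read counts.

    Assumption for illustration: the somatic P-value is the probability of
    seeing at least `tumour_var` variant reads in the tumour if variant reads
    were distributed between tumour and normal purely by chance.
    """
    total_var = normal_var + tumour_var          # variant reads in both samples
    tumour_depth = tumour_ref + tumour_var       # all reads in the tumour
    total = normal_ref + normal_var + tumour_depth

    # Hypergeometric upper tail: sum P(X = k) for k >= observed tumour_var.
    upper = min(total_var, tumour_depth)
    tail = sum(
        comb(total_var, k) * comb(total - total_var, tumour_depth - k)
        for k in range(tumour_var, upper + 1)
    )
    return tail / comb(total, tumour_depth)


if __name__ == "__main__":
    # Hypothetical counts: no variant reads in the normal, 5 of 10 in the tumour.
    print(somatic_p_value(normal_ref=10, normal_var=0, tumour_ref=5, tumour_var=5))
```

A small P-value suggests the variant is tumour-specific, whereas similar variant fractions in both samples give a large value, as expected for a germline variant. Real callers apply further filters (base quality, strand bias, minimum depth) that this sketch omits.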

Learning and Teaching

Teaching and learning methods

Lectures will introduce relevant topics and theory and give examples of significant initiatives and achievements. Practicals will be structured around a worksheet that addresses the core skills and tasks and will provide students with an opportunity to apply the information from lectures to real sequence data. Students will be assessed on two assignments that cover the learning outcomes.

Independent Study: 122 hours
Total study time: 150 hours

Resources & Reading list

Mandoiu, I (2016). Computational Methods for Next Generation Sequencing Data Analysis. 

Learn the Linux Command Line: The basics.

Sed oneliners.

VI Tutorial.

More on Unix.

AWK commands for NGS.

Perl oneliners.

Up and Running with Bash scripting.

Rodríguez-Ezpeleta, N (2012). Bioinformatics for High Throughput Sequencing. 

Unix introduction.

Reference manual for UNIX introduction.

Awk oneliners.

Lesk, A (2014). Introduction to Bioinformatics. 

Bash tutorial.


Assessment Strategy

The pass mark for the module and all assessed components is 50%. If you do not achieve the pass mark on this module by achieving 50% or more in all components, you may still pass by compensation. To do this, you must achieve a qualifying mark of 40% on each assessed component. Each of the component marks is then combined, using the appropriate weighting, to give an overall mark for the module. If this overall mark is greater than or equal to 50% you will have passed the module. If your overall mark is less than 50% when the weighting has been applied to the components, you will have failed the module. If you have not achieved 40% or more on all components, you cannot use compensation and have failed the module. If you have failed the module, you will have the opportunity to submit work at the next referral (re-sit) opportunity using the method outlined below. You must achieve the pass mark in all referred components. On passing your referrals, your final module mark will be capped at 50%.


Method                             Percentage contribution
Data Analysis (2000 words)         50%
Written assignment (1500 words)    50%
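
The pass-by-compensation rule in the Assessment Strategy can be sketched as a short calculation. The example below assumes the two components and 50% weightings shown in the table above. The function name `module_outcome` and the example marks are hypothetical, and the sketch covers only the first attempt, not the capping of referred marks.

```python
from typing import Sequence, Tuple

PASS_MARK = 50.0        # pass mark for the module and each component
QUALIFYING_MARK = 40.0  # minimum component mark needed for compensation


def module_outcome(marks: Sequence[float],
                   weights: Sequence[float]) -> Tuple[float, bool]:
    """Return (overall weighted mark, passed) for one attempt at the module."""
    if len(marks) != len(weights) or not marks:
        raise ValueError("marks and weights must be non-empty and equal length")

    # Weighted average of the component marks.
    overall = sum(m * w for m, w in zip(marks, weights)) / sum(weights)

    # Compensation is only available if every component reaches 40%.
    all_qualify = all(m >= QUALIFYING_MARK for m in marks)

    # Passing every component at 50% implies overall >= 50, so this single
    # condition covers both a straight pass and a pass by compensation.
    passed = all_qualify and overall >= PASS_MARK
    return overall, passed


if __name__ == "__main__":
    weights = [50, 50]  # Data Analysis, Written assignment
    print(module_outcome([45, 60], weights))  # passes by compensation
    print(module_outcome([38, 80], weights))  # fails: one component below 40
    print(module_outcome([45, 50], weights))  # fails: overall below 50
```

In the second example the overall mark is well above 50%, but the module is still failed because one component falls below the 40% qualifying mark.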


Method                             Percentage contribution
Written assignment (2500 words)    100%

Repeat Information

Repeat type: Internal & External
