The University of Southampton

MEDI6230 Advanced Bioinformatics

Module Overview

The module covers the advanced principles of informatics and bioinformatics applied to clinical genomics, including the analysis of whole transcriptomes and tumour-normal pairs. Sequencing transcriptomes and tumour-normal pairs is quickly becoming the method of choice for profiling differential gene expression, alternative splicing and somatic mutations. The ability to analyse and interpret these data, together with knowledge of their intrinsic properties, is essential for correct experimental design and appropriate analyses. Successful students will be equipped to implement and interpret genomic informatic analyses. This advanced informatics module develops the knowledge and skills gained in MEDI6215 (Bioinformatics, Interpretation and Data Quality Assurance in Genome Analysis) and MEDI6131 (Omics techniques and their application to Genomic Medicine). Students who have not attended MEDI6215 and MEDI6131 must satisfy the module leaders that they possess the basic skills necessary to benefit from this module.

Aims and Objectives

Module Aims

The aims of this module are to build on the knowledge and experience gained from the Bioinformatics module, introduce the Linux command line environment, familiarise students with whole-transcriptome sequencing (RNAseq) for differential gene expression and with sequencing of tumour-normal pairs to identify somatic mutations, introduce pathway analysis, and provide hands-on experience of the most popular command line tools for analysing these types of Next Generation Sequencing (NGS) data. Upon completing this module, students will be in a strong position to base their MSc research project on data from the ‘100,000 Genomes Project’.

Learning Outcomes

Knowledge and Understanding

Having successfully completed this module, you will be able to:

  • Utilise the Linux environment to navigate, organise and manipulate large NGS datasets.
  • Create basic scripts and pipelines for the analysis of omic data.
  • Analyse omic data, demonstrating an understanding of the bioinformatics processes of identification of somatic driver mutations, alignment of RNAseq data to the reference human genome, identification of known and novel transcripts, quantification of gene expression and detection of differential expression.
  • Interpret results from RNAseq and tumour-normal pair analyses for application in both diagnostic and research settings.
  • Discuss the strengths and limitations of using NGS data to identify somatic mutations and detect differential gene expression.
  • Compare and contrast analytical methodologies for multiple purposes.


This module will couple theoretical sessions with practical assignments using Linux command line tools to align DNA and RNA sequence data to the reference genome, identify and annotate sequence variation including somatic mutations, interrogate databases, perform differential expression analyses using RNAseq data, and critically assess and interpret the findings from NGS analysis. The Linux command line is one of the most important and frequently used tools for analysing NGS data. It enables visualisation and manipulation of large data files, provides access to all program options, and can be used to create bioinformatic pipelines and run programs that do not have web interfaces.

Indicative Content

  • Introduction to the Linux command line and important commands. Combining commands, redirecting their output, and writing basic scripts to document and replicate analyses.
  • Using command line tools for data pre-processing, manipulation of VCF files and customised assessments of sequence coverage.
  • Principles of sequencing tumour-normal pairs to identify somatic mutations.
  • Somatic variant discovery, familiarity with the statistical significance of somatic mutations (somatic P-value), annotation using multiple databases including ClinVar and COSMIC, and estimation of somatic driver mutations using in-silico tools.
  • Principles of RNA sequencing to determine gene expression profiles.
  • Understanding split-read mapping and aligning RNAseq data to the reference genome.
  • How to identify known and novel transcripts, quantify expression and perform differential expression analyses at various levels (genes, exons and transcripts) using appropriate software.
  • Introduction to pathway analysis, using basic tools for network analysis, network visualisation and modelling biological processes.
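As a flavour of the VCF manipulation mentioned in the indicative content, the following is a minimal sketch only, not part of the module materials. The three variant records, the function name `passing_variants` and the quality threshold of 30 are all invented for illustration; real analyses would normally use dedicated tools rather than hand-written parsing.

```python
# Hypothetical example: the records and the QUAL threshold are invented.
VCF_TEXT = """\
##fileformat=VCFv4.2
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO
chr1\t12345\t.\tA\tG\t50\tPASS\t.
chr1\t22222\t.\tC\tT\t10\tLowQual\t.
chr2\t33333\t.\tG\tA\t99\tPASS\t.
"""


def passing_variants(vcf_text, min_qual=30.0):
    """Return (chrom, pos, ref, alt) for PASS records with QUAL >= min_qual."""
    kept = []
    for line in vcf_text.splitlines():
        # Skip blank lines, meta-information (##) and the column header (#CHROM).
        if not line or line.startswith("#"):
            continue
        # The first seven tab-separated VCF columns are fixed by the format.
        chrom, pos, _id, ref, alt, qual, flt = line.split("\t")[:7]
        # A QUAL of "." means the value is missing, so such records are dropped.
        if flt == "PASS" and qual != "." and float(qual) >= min_qual:
            kept.append((chrom, int(pos), ref, alt))
    return kept


if __name__ == "__main__":
    for variant in passing_variants(VCF_TEXT):
        print(*variant, sep="\t")
```

Keeping the filter in a small function makes the threshold explicit and the step easy to rerun, which is the same motivation the module gives for scripting analyses.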

Special Features

The module will be taught in the Faculty of Medicine, University of Southampton, which is based in University Hospital Southampton. Module leads and staff form an international faculty at the forefront of their respective academic disciplines and professions. Adult learning methods will be used throughout, with an emphasis on interactive learning, practical demonstration and the interpretation of clinical scenarios to reinforce learning. Extensive e-learning facilities will be available to foster independent study.

Learning and Teaching

Teaching and learning methods

Lectures will introduce relevant topics and theory and give examples of significant initiatives and achievements. Practicals will be structured around a worksheet that addresses the core skills and tasks and will provide students with an opportunity to apply the information from lectures to real sequence data. Students will be assessed on two assignments that cover the learning outcomes.

Independent Study  122
Total study time  150

Resources & Reading list

AWK commands for NGS.

Awk oneliners.

Rodríguez-Ezpeleta, N (2012). Bioinformatics for High Throughput Sequencing. 

Mandoiu, I (2016). Computational Methods for Next Generation Sequencing Data Analysis. 

Perl oneliners.

Reference manual for UNIX introduction.

Unix introduction.

Learn the Linux Command Line: The basics.

Sed oneliners.

Bash tutorial.

Up and Running with Bash scripting.

VI Tutorial.

More on Unix.

Lesk, A (2014). Introduction to Bioinformatics. 



Method  Percentage contribution
Data Analysis  (2000 words) 50%
Written assignment  (1500 words) 50%


Method  Percentage contribution
Written assignment  (2500 words) 100%

Repeat Information

Repeat type: Internal & External
