The University of Southampton

MEDI6235 Advanced Genomic Informatics

Module Overview

Aims and Objectives

Learning Outcomes

Having successfully completed this module you will be able to:

  • Create basic scripts and pipelines for the automated analysis of NGS datasets in the Linux environment
  • Apply appropriate tools for quality control, sequence alignment, variant calling, annotation and variant filtration to identify potentially pathogenic variants, including somatic driver mutations, in the Linux environment
  • Apply appropriate tools for splice-aware cDNA sequence alignment, quantification of various aspects of transcription (for example gene, exon, transcript abundance) and differential gene expression (DGE) analysis in the Linux environment
  • Develop strategies to prioritise candidate genes from DGE for further study (e.g. enrichment analysis, co-expression analysis)
  • Design and apply appropriate machine learning approaches to analyse complex and high dimensional biological datasets (NGS, proteomics and clinical datasets)


  • Introduction to the Linux command line and important commands. Combining commands and redirecting their output, and writing basic scripts to document and replicate analyses
  • Using command line tools for data pre-processing, manipulation of VCF files and customised assessments of sequence coverage
  • Principles of sequencing tumour-normal pairs to identify somatic mutations
  • Somatic variant discovery, familiarity with the statistical significance of somatic mutations (somatic P-value), annotation using multiple databases including ClinVar and COSMIC, and estimation of somatic driver mutations using in-silico tools
  • Principles of RNA sequencing to determine gene expression profiles
  • Understanding split-read mapping and aligning RNAseq data to the reference genome
  • How to identify known and novel transcripts, quantify expression and perform differential expression analyses at various levels (genes, exons and transcripts) using appropriate software
  • Understanding the strengths and limitations of popular machine learning methods in the generation of new knowledge from complex and high dimensional biological datasets
  • Introduction to pathway analysis, using basic tools for network analysis, network visualisation and modelling biological processes

Learning and Teaching

Teaching and learning methods

The module will comprise two blocks of intensive on-site teaching, each followed by approximately two weeks of independent study. A variety of learning and teaching methods will be adopted to promote a wide range of skills and meet the differing learning styles of the group. The on-site teaching will include seminars, practical demonstrations, discussions and exercises surrounding interpretation of data and clinical scenarios, and specialist lectures given by a range of academic and health care professionals. This will ensure a breadth and depth of perspective, giving a good balance between background theories and principles and practical experience. Off-site independent learning will take place on the virtual learning environment hosted by the UoS.

Independent Study 122
Total study time 150


Assessment Strategy

The assessment for the module provides you with the opportunity to demonstrate achievement of the learning outcomes. In addition to the summative assessments, during the course of the module there will be opportunities to obtain feedback in the form of unassessed, formative activities.


Workshop activities


Method Percentage contribution
Data analysis project 50%
Short answer questions 50%


Method Percentage contribution
Written assignment 100%

Repeat Information

Repeat type: Internal & External

Linked modules


To study this module, you will need to have studied the following module(s):

MEDI6237 Genomic Technologies and Basic Informatics