BIOL6055 Computational methods for biological data analysis
Large-scale approaches at the molecular, cellular, organismal and ecological level are revolutionizing biology by enabling systems-level questions to be addressed. Whilst large-scale techniques such as genomics and proteomics are immensely powerful, they bring new challenges in data handling, processing, analysis and interpretation. This module aims to equip students with the necessary practical skills to fully utilize data from large-scale approaches. The module will make use of the Unix/Linux environment and the R language and guide students through data management, sorting, extraction and analysis. The student will be able to utilize the skills learned to ask biological question(s) of large-scale datasets.
Aims and Objectives
- To introduce students to the UNIX/LINUX environment and the R language, and to demonstrate command line and scripting tools for data manipulation
- To describe different file formats
- To illustrate and explore computational methods for accomplishing data analysis tasks, basic statistics and graphing of results
- To demonstrate the value of data analysis pipelines that can automate analysis from raw data to publication-ready graphical output
Having successfully completed this module you will be able to:
- Navigate and organise files and data on the command line within a UNIX/LINUX environment
- Perform data filtering and manipulation using UNIX/LINUX tools and R
- Perform analysis tasks using bash and R scripts
- Write basic pipelines for automating and repeating analyses
- Answer biological questions using complex datasets
This module aims to give students an introduction to medium- and large-scale biological data processing, manipulation and analysis using the command line and scripting languages. Emphasis will be on scalable, repeatable and automated analysis workflows. The course will begin with an introduction to the challenges of large-scale data in modern biological sciences. The module will then include:
- An introduction to the UNIX/LINUX operating system: files, directories and navigation.
- Command line and R approaches to file exploration, manipulation, data extraction and analysis.
- Writing scripts and pipelines for automated and repeatable analysis tasks.
- Producing basic summary statistics and plots for data exploration and presentation of analysis results using R.
This module will make extensive use of VDU facilities.
Learning and Teaching
Teaching and learning methods
Formal lectures will introduce the fields of computational biology, bioinformatics and systems biology at the beginning of the module. These will be followed by 4 practical sessions, each lasting 3 hours. Each practical session will begin with a formal introduction to the theory, followed by a hands-on session structured around a worksheet and/or online tutorials covering the core skills, with tasks that put these skills into practice. Following each workshop, a further worksheet task will be set and assessed, each contributing 25% towards the final mark for the module.
Study time allocation
Contact time: 14 hours (2 x 1 hour lectures plus 4 x 3 hour computer practicals/workshops). Private study: 61 hours. Total study time: 75 hours.
|Type||Hours|
|Lectures||2|
|Practical classes and workshops||12|
|Wider reading or practice||43|
|Completion of assessment task||10|
|Preparation for scheduled sessions||8|
|Total study time||75|
Resources & Reading list
Keith Bradnam & Ian Korf (2012). UNIX and Perl to the Rescue!: A Field Guide for the Life Sciences (and Other Data-rich Pursuits).
Mark Gardener (2012). Beginning R: The Statistical Programming Language.