The University of Southampton

Research project: Essex: Cheminformatics, Bioinformatics and Education

Currently Active: 

Our work in bioinformatics has focused on the 4G Basic Technology project, which aims to develop methods for high-throughput genome sequencing. The approach is based on rapidly reading short DNA sequences and then assembling the full sequence from these fragments.

To assess the feasibility of this approach, we have determined that viral DNA may be reassembled from short reads of approximately 18-25 nucleotides in length, whereas bacterial and mammalian genomes will require substantially longer fragments (Nucleic Acids Research 33, 2005, e171). We have also developed methods for the optimal design of DNA libraries for resequencing highly variable genomes, such as those of viruses, and novel visualization methods for examining the structure of repeated DNA in genomes. All of this work required the development of novel underlying algorithms and theory. In particular, new suffix array methods were needed to perform the resequencing studies of the larger genomes, particularly the human genome. The work on library design also led to a remarkably simple, yet highly effective, statistical physics model that enables the temperature at which the DNA duplex melts to be calculated (Nature Physics 2, 2006, 55-59).
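
One way to picture the read-length question above is to ask what fraction of fixed-length reads occur exactly once in a genome, since a read that appears at several positions cannot be placed unambiguously. The following is a minimal sketch of that idea and not the method used in the cited paper: the function name `unique_read_fraction` and the toy sequence are hypothetical, only a single strand is considered, and a hash-based counter stands in for the suffix array methods that a genome of realistic size would call for.

```python
from collections import Counter


def unique_read_fraction(genome: str, read_length: int) -> float:
    """Return the fraction of read start positions whose read occurs once.

    Illustrative assumption: reads are error-free substrings of a single
    strand, sampled at every possible start position.
    """
    n_reads = len(genome) - read_length + 1
    if read_length <= 0 or n_reads <= 0:
        raise ValueError("read_length must be between 1 and len(genome)")

    # Count every overlapping substring of the requested length.
    counts = Counter(genome[i:i + read_length] for i in range(n_reads))

    # A read that occurs exactly once corresponds to one start position.
    unique_positions = sum(1 for c in counts.values() if c == 1)
    return unique_positions / n_reads


if __name__ == "__main__":
    toy_genome = "ACGTACGTTT"  # hypothetical sequence, for illustration only
    for k in (2, 4, 6):
        print(k, unique_read_fraction(toy_genome, k))
```

On real data the fraction would be expected to rise with read length until repeats longer than the read dominate, which is the kind of trend that separates small viral genomes from larger, repeat-rich ones.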


Our work in cheminformatics consists of a number of interconnected strands. We have been involved in two database projects. In BioSimGrid, we have developed a software environment for the storage, querying, sharing and analysis of molecular simulation data (Org. Biomol. Chem. 2, 2004, 3219-3221), using the Grid to add robustness to the data storage element. Under the comb-e-chem e-science project, we have applied modern methods of describing data to move away from the conventional notion of a relational database for describing chemical information (J. Chem. Inf. Mod. 46, 2006, 939-952). Alongside the development of database technologies, we are also applying modern statistical analysis techniques to improve the robustness of our models, thereby extracting more information from the available chemical data (J. Chem. Inf. Mod. 45, 2005, 1791-1803). We have also invested considerable effort in exploiting distributed computing resources (i.e. desktop computers) more effectively, since they provide a potentially very large untapped computational resource. This includes their use in parallel tempering simulations to enhance the sampling in protein molecular dynamics simulations (Phil. Trans. R. Soc. Lond. A 363, 2005, 2017-2035).
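
For readers unfamiliar with parallel tempering, the core step is an occasional exchange of configurations between replicas run at neighbouring temperatures, accepted with the standard Metropolis criterion min(1, exp[(β_i − β_j)(E_i − E_j)]), where β = 1/(k_B T). The sketch below illustrates only that textbook criterion; it is not the project's implementation, the function names are hypothetical, and energies are assumed to be in kcal/mol with temperatures in kelvin.

```python
import math
import random

# Boltzmann constant in kcal/(mol K); the unit choice is an assumption.
K_B = 0.0019872041


def swap_probability(energy_i: float, energy_j: float,
                     temp_i: float, temp_j: float) -> float:
    """Metropolis acceptance probability for exchanging two replicas."""
    beta_i = 1.0 / (K_B * temp_i)
    beta_j = 1.0 / (K_B * temp_j)
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    # Accept unconditionally when the exchange is favourable.
    return 1.0 if delta >= 0.0 else math.exp(delta)


def attempt_swap(energy_i: float, energy_j: float,
                 temp_i: float, temp_j: float,
                 rng: random.Random) -> bool:
    """Draw a uniform random number and decide whether to exchange."""
    return rng.random() < swap_probability(energy_i, energy_j, temp_i, temp_j)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical energies for replicas at 300 K and 320 K.
    print(swap_probability(-105.0, -100.0, 300.0, 320.0))
    print(attempt_swap(-105.0, -100.0, 300.0, 320.0, rng))
```

Because each replica runs independently between exchange attempts and only energies need to be communicated, the scheme suits loosely coupled machines such as desktop computers.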

As part of the agenda to engage the public in science, and in particular to focus on science in schools, we have developed a web-based environment that allows schoolchildren to design small molecules and dock them against an important malarial enzyme. This e-malaria project exploits our work in distributed computing and databases to deliver a practical tool that illustrates an important element of the modern drug discovery process (J. Chem. Inf. Mod. 46, 2006, 960-970).

Related research groups

Computational Systems Chemistry