The University of Southampton

Research project: Attard: Theoretical Studies Of Biological Processes: Studies Of The Origins And Implications Of Language-Like Features In Junk DNA

The application of statistical methods to the nucleotide sequences of DNA has revealed that non-coding DNA exhibits strong language-like features, whereas in coding DNA these features are strongly attenuated.
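The page does not say which statistics were applied. One common test for long-range correlations, assumed here purely for illustration, is the purine/pyrimidine "DNA walk": each purine (A, G) moves a walker up one step and each pyrimidine (C, T) moves it down. The root-mean-square fluctuation F(l) of the walker's displacement over windows of length l is then fitted to F(l) ~ l^alpha. An uncorrelated sequence gives alpha close to 0.5, while alpha clearly above 0.5 indicates long-range correlations. The function names and window sizes in this sketch are hypothetical.

```python
import math
import random

# Assumed encoding: purines (A, G) step up, pyrimidines (C, T) step down.
STEP = {"A": 1, "G": 1, "C": -1, "T": -1}


def dna_walk(seq):
    """Return the cumulative walk y(0..N); characters other than A/C/G/T are skipped."""
    y = [0]
    for base in seq.upper():
        if base in STEP:
            y.append(y[-1] + STEP[base])
    return y


def fluctuation(y, l):
    """F(l): standard deviation of the displacement y(i + l) - y(i) over all start points i."""
    diffs = [y[i + l] - y[i] for i in range(len(y) - l)]
    mean = sum(diffs) / len(diffs)
    return math.sqrt(sum((d - mean) ** 2 for d in diffs) / len(diffs))


def scaling_exponent(seq, window_sizes=(4, 8, 16, 32, 64, 128, 256)):
    """Least-squares slope alpha of log F(l) against log l."""
    y = dna_walk(seq)
    xs = [math.log(l) for l in window_sizes]
    ys = [math.log(fluctuation(y, l)) for l in window_sizes]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (v - my) for x, v in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


if __name__ == "__main__":
    rng = random.Random(1)
    uncorrelated = "".join(rng.choice("ACGT") for _ in range(20000))
    blocky = ("AG" * 500 + "CT" * 500) * 10  # long purine and pyrimidine blocks
    print(f"uncorrelated: alpha = {scaling_exponent(uncorrelated):.2f}")  # near 0.5
    print(f"blocky:       alpha = {scaling_exponent(blocky):.2f}")        # near 1
```

The fit uses overlapping windows, which makes better use of short sequences; real analyses often also detrend the walk to reduce the effect of local compositional bias.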

The genomes of most organisms contain large quantities of non-coding DNA (in humans, for example, some 95-97% of the genetic material does not appear to code for proteins). Because this DNA is transcriptionally silent and appears to have no function, it is often referred to as junk DNA. The observation of language-like features, such as long-range sequence correlations, in junk DNA has been interpreted as evidence of a higher-order function. In particular, it has been suggested that junk DNA plays a central role in controlling the pattern and frequency of gene expression. However, the statistical tests used to identify these language-like features do not necessarily show that there is a hidden language in junk DNA; they merely reflect the presence of long-range sequence correlations in the genome.

Our work aims to develop a biochemically realistic model that accounts for the possible origin of these sequence correlations. The model is based on the target sequence duplications that occur when class II transposable elements (TEs), such as P, hobo, Tc1, Mu, or mariner elements, insert at a particular target site. We have implemented a computer simulation that models the repeated insertion and imprecise excision of TEs with different target site preferences. The results show that, starting from a DNA sequence known to code for proteins, the simulation can create a longer, non-coding sequence of junk DNA that exhibits strong language-like features. The implication of this finding is that there is no language in junk DNA. However, the existence of a long-range structure may have been exploited to control gene expression.

Our current work is aimed at refining the model and at investigating ways in which these higher-order sequence correlations might be exploited to regulate gene expression.
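The insertion/excision mechanism can be illustrated with a toy simulation; this is not the authors' model. It assumes a single made-up element sequence, a TA target motif (the dinucleotide targeted by Tc1/mariner-like elements), a 3-base footprint left by each imprecise excision, and a fixed insertion probability. All names and parameter values are hypothetical. Each insertion duplicates the target site on both sides of the element, and each excision removes the element but leaves the duplicated site plus the footprint, so every completed insert/excise cycle lengthens the host sequence by five bases.

```python
import random
import re

# Hypothetical parameters: the page names real class II TEs but gives no sequences.
TARGET = "TA"                 # assumed target-site motif (Tc1/mariner-like)
ELEMENT = "cagtgtacgtacactg"  # made-up element; lowercase marks TE-derived bases
FOOTPRINT = 3                 # assumed element bases left behind on excision


def insert(seq, rng):
    """Insert ELEMENT at a random host target site, duplicating the site."""
    # The lookahead finds overlapping sites; lowercase element bases never match.
    sites = [m.start() for m in re.finditer(f"(?={TARGET})", seq)]
    if not sites:
        return seq
    i = rng.choice(sites)
    k = len(TARGET)
    # seq[i:i+k] == TARGET, so the element ends up flanked by two copies of it.
    return seq[: i + k] + ELEMENT + seq[i:]


def excise(seq, rng):
    """Remove a random element copy imprecisely, leaving the duplicated site and a footprint."""
    copies = list(re.finditer(r"[a-z]+", seq))
    if not copies:
        return seq
    m = rng.choice(copies)
    footprint = m.group()[:FOOTPRINT].upper()  # the footprint becomes host DNA
    return seq[: m.start()] + footprint + seq[m.end():]


def simulate(seq, steps, p_insert=0.6, seed=0):
    """Apply random insertions and excisions, then excise any copies still present."""
    rng = random.Random(seed)
    for _ in range(steps):
        seq = insert(seq, rng) if rng.random() < p_insert else excise(seq, rng)
    while re.search(r"[a-z]", seq):
        seq = excise(seq, rng)
    return seq


if __name__ == "__main__":
    coding = "ATGGCTAGCTAAGTACCGTATGA"  # short illustrative start sequence
    junk = simulate(coding, steps=2000)
    print(len(coding), "->", len(junk))
```

Lowercase letters mark element bases only while a copy is in place, which keeps target-site searches confined to host DNA and stops adjacent copies from merging. A correlation statistic can then be compared between the starting and final sequences.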
