Dr Richard J. Edwards
Distributed software packages
These tools are freely available for local installation under a GNU General Public License. Please see the Manuals and ReadMe for more details.
To contact the author, e-mail: firstname.lastname@example.org
When publishing analyses performed with this software, please cite the individual papers listed for the relevant program/module. If no paper is listed, please cite this website.
SeqSuite Bioinformatics Software
In addition to the programs of the SLiMSuite package for Short, Linear Motif discovery, I have been involved in developing a number of other sequence analysis utilities. As with SLiMSuite, these are all freely available under a GNU General Public License.
This software is currently undergoing reorganisation and reannotation. Much of this software and/or the accompanying documentation is still under active development - please contact me with any questions regarding the software listed. Please also see the external SeqSuite Homepage for more news, links and relevant publications.
In addition to the main programs listed, a number of accessory applications, including the RJE_PyDocs module used to create these webpages, are available as an expanded RJESuite package. These generally have less documentation and development than the main software but may be of use to some people.
Short linear motifs (SLiMs) in proteins are functional microdomains of fundamental importance in many biological systems. SLiMs typically consist of a 3 to 10 amino acid stretch of the primary protein sequence, of which as few as two sites may be important for activity, making identification of novel SLiMs extremely difficult. In particular, it can be very difficult to distinguish a randomly recurring "motif" from a truly over-represented one. Incorporating ambiguous amino acid positions and/or variable-length wildcard spacers between defined residues further complicates the matter.
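As a rough illustration of the kind of pattern involved, a SLiM with an ambiguous position and a variable-length wildcard spacer can be written as a regular expression and searched against protein sequences. This is a minimal sketch and not SLiMSuite code: the motif, the toy sequences and the `find_occurrences` helper are all hypothetical.

```python
import re

def find_occurrences(pattern, sequences):
    """Return (name, 1-based start, matched text) for every motif match."""
    # Wrapping the pattern in a lookahead reports overlapping matches too.
    regex = re.compile("(?=(%s))" % pattern)
    hits = []
    for name, seq in sequences.items():
        for match in regex.finditer(seq):
            hits.append((name, match.start() + 1, match.group(1)))
    return hits

# Hypothetical motif: K or R, a spacer of 0-2 wildcard residues, then L.
motif = "[KR].{0,2}L"
# Hypothetical toy sequences.
seqs = {"protA": "MSKAALTT", "protB": "MRLGGG", "protC": "MAAAGGG"}
hits = find_occurrences(motif, seqs)
for name, start, text in hits:
    print(name, start, text)
```

With only two defined positions, a pattern like this will match many unrelated sequences by chance, which is why a statistical assessment of over-representation is needed.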
The SLiMSuite collection contains a number of open-source bioinformatics tools to analyse these important protein features. Please note that SLiMSuite is also available as part of the larger SeqSuite and RJESuite packages.
SLiMFinder is an integrated SLiM discovery program building on the principles of the SLiMDisc software for accounting for evolutionary relationships [Davey NE, Shields DC & Edwards RJ (2006): Nucleic Acids Res. 34(12):3546-54]. SLiMFinder comprises two algorithms:
SLiMBuild identifies convergently evolved, short motifs in a dataset. Motifs with fixed amino acid positions are identified and then combined to incorporate amino acid ambiguity and variable-length wildcard spacers. Unlike programs such as TEIRESIAS, which return all shared patterns, SLiMBuild accelerates the process and reduces returned motifs by explicitly screening out motifs that do not occur in enough unrelated proteins. For this, SLiMBuild uses the "Unrelated Proteins" (UP) algorithm of SLiMDisc in which BLAST is used to identify pairwise relationships. Proteins are then clustered according to these relationships into "Unrelated Protein Clusters" (UPCs), which are defined such that no protein in a UPC has a BLAST-detectable relationship with a protein in another UPC. If desired, SLiMBuild can be used as a replacement for TEIRESIAS in other software (teiresias=T slimchance=F).
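The UPC definition above amounts to grouping proteins so that any two proteins with a detectable relationship, direct or indirect, end up in the same cluster. A minimal sketch of that idea using a union-find structure follows. It assumes the pairwise relationships have already been obtained (for example from BLAST); the function names and toy data are hypothetical and this is not the SLiMDisc or SLiMBuild implementation.

```python
def cluster_upcs(proteins, related_pairs):
    """Group proteins so that no related pair is split across clusters."""
    parent = {p: p for p in proteins}

    def find(p):
        # Follow parent links to the cluster root, shortening the path.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for a, b in related_pairs:
        parent[find(a)] = find(b)  # merge the two clusters

    clusters = {}
    for p in proteins:
        clusters.setdefault(find(p), []).append(p)
    return sorted(sorted(members) for members in clusters.values())

def upc_support(upcs, proteins_with_motif):
    """Count the clusters containing at least one protein with the motif."""
    return sum(1 for upc in upcs if any(p in proteins_with_motif for p in upc))

proteins = ["P1", "P2", "P3", "P4", "P5"]
pairwise_hits = [("P1", "P2"), ("P2", "P3")]  # hypothetical relationships
upcs = cluster_upcs(proteins, pairwise_hits)
support = upc_support(upcs, {"P1", "P2", "P4"})
```

Here P1 and P3 share a cluster through P2, so a motif found in P1, P2 and P4 is supported by two unrelated clusters rather than three proteins.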
SLiMChance estimates the probability of these motifs arising by chance, correcting for the size and composition of the dataset, and assigns a significance value to each motif. Motif occurrence probabilities are calculated independently for each UPC, adjusted for the size of a UPC using the Minimum Spanning Tree algorithm from SLiMDisc. These individual occurrence probabilities are then converted into the total probability of seeing the observed motifs the observed number of (unrelated) times. These probabilities assume that the motif is known before the search. In reality, only over-represented motifs from the dataset are looked at, so these probabilities are adjusted for the size of motif-space searched to give a significance value. This is an estimate of the probability of seeing that motif, or another one like it. These values are calculated separately for each length of motif. Where pre-known motifs are also of interest, these can be given with the slimcheck=MOTIFS option and will be added to the output. SLiMFinder version 4.0 introduced a more precise (but more computationally intensive) statistical model, which can be switched on using sigprime=T. Likewise, the more precise (but more computationally intensive) correction to the mean UPC probability heuristic can be switched on using sigv=T. (Note that the other SLiMChance options may not work with either of these options.) The allsig=T option will output all four scores. In this case, SigPrimeV will be used for ranking etc. unless probscore=X is used.
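The two-step logic described above (a probability for a pre-specified motif, then a correction for the size of motif-space) can be sketched as follows. This is a simplified illustration under stated assumptions, not the SLiMChance implementation: it assumes a binomial model using the mean per-UPC occurrence probability, and it assumes the motif-space correction takes the form 1 - (1 - p)^B for B motifs searched. The function names and all numbers are hypothetical.

```python
from math import comb

def prob_k_or_more(k, upc_probs):
    """Binomial tail: P(motif occurs in >= k UPCs), using the mean UPC probability."""
    n = len(upc_probs)
    p = sum(upc_probs) / n  # assumed "mean UPC probability" heuristic
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def significance(p_motif, motif_space):
    """Assumed correction: P(at least one of motif_space motifs does this well)."""
    return 1.0 - (1.0 - p_motif) ** motif_space

# Hypothetical per-UPC occurrence probabilities for one motif.
upc_probs = [0.01, 0.02, 0.03, 0.02, 0.02]
p = prob_k_or_more(3, upc_probs)   # motif observed in 3 of 5 UPCs
sig = significance(p, 1000)        # hypothetical motif-space size
print(p, sig)
```

The corrected value is always at least as large as the uncorrected probability, reflecting that the motif was selected from the data rather than specified in advance.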
Where significant motifs are returned, SLiMFinder will group them into Motif "Clouds", which consist of physically overlapping motifs (2+ non-wildcard positions are the same in the same sequence). This provides an easy indication of which motifs may actually be variants of a larger SLiM and should therefore be considered together.
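The overlap rule for clouds can be sketched as below. This is an illustration only: it assumes each motif is represented by the set of (sequence, residue position) pairs covered by its non-wildcard positions, and that overlapping motifs are merged transitively (single linkage), which the description above does not specify. The function names and toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations

def overlaps(pos_a, pos_b, min_shared=2):
    """True if two motifs share min_shared+ defined positions in one sequence."""
    shared = Counter(seq for seq, _ in pos_a & pos_b)
    return any(n >= min_shared for n in shared.values())

def build_clouds(motif_positions):
    """Group motifs whose occurrences overlap (assumed single linkage)."""
    cloud_of = {m: {m} for m in motif_positions}
    for a, b in combinations(motif_positions, 2):
        if cloud_of[a] is not cloud_of[b] and overlaps(
            motif_positions[a], motif_positions[b]
        ):
            merged = cloud_of[a] | cloud_of[b]
            for m in merged:
                cloud_of[m] = merged
    unique = {frozenset(c) for c in cloud_of.values()}
    return sorted(sorted(c) for c in unique)

# Hypothetical motifs with the (sequence, position) of each defined residue.
motifs = {
    "KL.T": {("protA", 3), ("protA", 4), ("protA", 6)},
    "L.T": {("protA", 4), ("protA", 6)},
    "PP.Y": {("protB", 10), ("protB", 11), ("protB", 13)},
}
clouds = build_clouds(motifs)
```

In this toy case KL.T and L.T share two defined positions in protA and fall into one cloud, while PP.Y forms a cloud of its own.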
Additional Motif Occurrence Statistics, such as motif conservation, are handled by the rje_slimlist module. Please see the documentation for this module for a full list of commandline options. These options are currently under development for SLiMFinder and are not fully supported. See the SLiMFinder Manual for further details. Note that the OccFilter *does* affect the motifs returned by SLiMBuild and thus the TEIRESIAS output (as does min. IC and min. Support) but the overall Motif StatFilter *only* affects SLiMFinder output following SLiMChance calculations.
QSLiMFinder (Query SLiMFinder) is a variant of SLiMFinder for explicitly returning motifs present in a given query sequence specified by query=X. More details can be found in the SLiMFinder Manuals or by contacting the author.
The rje_slimcore module forms the basis for SLiMFinder & SLiMSearch and can also be used in its own right for additional special functions: The "MotifSeq" option will output fasta files for a list of X:Y, where X is a motif pattern and Y is the output file.
The "Randomise" function will take a set of input datasets (as in Batch Mode) and generate a set of new datasets by shuffling the UPCs among datasets. Note that, at this stage, this is quite crude and may result in the final datasets having fewer UPCs due to common sequences and/or relationships between UPCs in different datasets.
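The shuffling step can be sketched as pooling the UPCs from all datasets and dealing them back out at random. This is a minimal sketch that assumes each new dataset keeps the UPC count of the original; the `randomise_datasets` function and the toy data are hypothetical and not taken from rje_slimcore.

```python
import random

def randomise_datasets(datasets, seed=None):
    """Pool all UPCs, shuffle, and deal them back into same-sized datasets."""
    rng = random.Random(seed)
    pool = [upc for upcs in datasets.values() for upc in upcs]
    rng.shuffle(pool)
    shuffled, start = {}, 0
    for name, upcs in datasets.items():
        shuffled[name] = pool[start:start + len(upcs)]
        start += len(upcs)
    return shuffled

# Hypothetical input: dataset name -> list of UPCs (each a list of proteins).
datasets = {
    "set1": [["P1", "P2"], ["P3"]],
    "set2": [["P4"], ["P5", "P6"]],
}
new_sets = randomise_datasets(datasets, seed=42)
```

The sketch does not re-cluster the shuffled datasets, so it does not show the caveat noted above: UPCs drawn from different source datasets may share sequences or be related to each other, and would merge into fewer UPCs if clustering were repeated.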
SLiMSearch is a tool for finding pre-defined SLiMs (Short Linear Motifs) in a protein sequence database. SLiMSearch can make use of corrections for evolutionary relationships and a variation of the SLiMChance algorithm from SLiMFinder to assess motifs for statistical over- and under-representation. SLiMSearch is a replacement for PRESTO and uses many of the same underlying modules.
Benefits of SLiMSearch that make it more useful than a lot of existing tools include:
The main output of SLiMSearch is a delimited file of motif/peptide occurrences, but the motifaln=T and proteinaln=T options also allow output of alignments of motifs and their occurrences.
CompariMotif is a piece of software with a single objective: to take two lists of protein motifs and compare them to each other, identifying which motifs have some degree of overlap, and identifying the relationships between those motifs. It can be used to compare a list of motifs with themselves, their reversed selves, or a list of previously published motifs, e.g. ELM (http://elm.eu.org/). CompariMotif outputs a table of all pairs of matching motifs, along with their degree of similarity (information content) and their relationship to each other.
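The idea of finding overlap between two motifs can be sketched by sliding one motif along the other and counting aligned defined positions that are compatible. This is a much-simplified illustration and not the CompariMotif algorithm: it handles only fixed-length motifs, counts matching positions rather than computing information content, and uses a hypothetical `min_shared` threshold.

```python
import re

def parse_motif(motif):
    """Split a fixed-length motif into positions; '.' is a wildcard (None)."""
    tokens = re.findall(r"\[[A-Z]+\]|[A-Z]|\.", motif)
    return [None if t == "." else set(t.strip("[]")) for t in tokens]

def best_overlap(motif_a, motif_b, min_shared=2):
    """Return (offset, shared defined positions) for the best alignment, or None."""
    a, b = parse_motif(motif_a), parse_motif(motif_b)
    best = None
    for offset in range(-(len(b) - 1), len(a)):
        shared, compatible = 0, True
        for i, pos_b in enumerate(b):
            j = i + offset
            if not 0 <= j < len(a):
                continue  # position of b hangs off the end of a
            pos_a = a[j]
            if pos_a is None or pos_b is None:
                continue  # wildcards match anything but add no support
            if pos_a & pos_b:
                shared += 1
            else:
                compatible = False  # defined positions that cannot both match
                break
        if compatible and shared >= min_shared and (best is None or shared > best[1]):
            best = (offset, shared)
    return best

match = best_overlap("[KR]L.T", "RL.TP")   # hypothetical motifs
no_match = best_overlap("KL", "PP.Y")
```

In the first comparison the motifs align at offset 0 with three compatible defined positions; the second pair has no alignment with enough shared positions, so no match is reported.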
The best match is used to define the relationship between the two motifs. These relationships are described using the following keywords:
Match type keywords identify the type of relationship seen:
Match length keywords identify the length relationships of the two motifs:
RJESuite is the new name for all software previously available as a single package, under the umbrella of the
Details of the main software are given as part of the SLiMSuite and SeqSuite packages. This page gives an overview of some of the accessory applications. The level of documentation and help for these modules is highly variable. However, if you think you may find one of the modules useful and want the documentation improved, or if you have any other questions regarding the software listed, then please contact me.
Primary Accessory Applications
Secondary Accessory Applications
Please see the manual(s) and ReadMe distributed with the program for details on program options etc. If all else fails, please contact me.
When publishing analyses performed with these accessory applications, please cite the External SeqSuite website: https://sites.google.com/site/seqsuite/.