S4Pred
S4PRED predicts the secondary structure of a single protein sequence in the absence of homology information, achieving a Q3 score of 75.3% on the standard CB513 test set. Although single-sequence methods do not perform as well as their homology-based counterparts, they are not constrained by the requirement for evolutionary information. More accurate single-sequence approaches have the potential to improve structural modelling across the vast majority of sequence space, especially in areas of great scientific interest such as viral proteins, the “dark proteome”, and de novo protein design. Academic users can download the model here.
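The Q3 score quoted above is the percentage of residues whose three-state secondary structure (helix H, strand E, coil C) is predicted correctly. The short Python sketch below computes it for a single chain. The function name `q3_score` and the toy strings are hypothetical. It assumes the observed states have already been reduced to three classes; the scheme for reducing DSSP's eight states is not specified here.

```python
def q3_score(predicted: str, observed: str) -> float:
    """Percentage of residues whose three-state label (H, E, C) matches.

    Hypothetical helper for illustration; assumes both strings are already
    in the three-state alphabet and are aligned residue by residue.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must have equal length")
    if not observed:
        raise ValueError("sequences must be non-empty")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


# Hypothetical toy example (not real S4PRED output): 13 of 15 residues match.
pred = "CCHHHHHCCEEEECC"
obs = "CCHHHHCCCEEEEEC"
print(f"Q3 = {q3_score(pred, obs):.1f}%")  # Q3 = 86.7%
```

Benchmark figures such as the CB513 result are computed over a whole test set. This sketch does not settle whether residues are pooled across chains or scores are averaged per chain.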
DMPfold2
DMPfold2 is an ultrafast end-to-end deep learning method that predicts tertiary structure using only a multiple sequence alignment (MSA) as input. The architecture uses just three recurrent networks and a stack of residual convolutional layers, which makes the predictor very fast to run and easy to install and use. Our approach learns a representation of the sequences in an MSA directly, starting from a one-hot encoding of the sequences. When supplemented with an approximate precision matrix, this learned representation can be used to produce structural models whose accuracy is comparable to or greater than that of our original DMPfold method, while requiring less than a second to produce a typical model.
DMPfold
DMPfold uses deep learning to predict inter-atomic distance bounds, the main-chain hydrogen bond network, and torsion angles, which it then uses to build models iteratively. On a test set of CASP12 domains, DMPfold produces more accurate models than two popular methods, and it works just as well for transmembrane proteins. It produced confident models for 25% of all Pfam domains without known structures and models for 16% of human proteome UniProt entries without structures. In some cases it generates accurate models with fewer than 100 sequences. Using DMPfold, we have modelled all but ten of the proteins without templates in the JCVI-syn3.0 minimal genome. The paper is available here. A broader discussion of the use of deep learning in structural prediction is available here.
PASS: Profile Augmentation of Single Sequences
Profile Augmentation of Single Sequences (PASS) is a simple but powerful framework for accurately modelling single orphan protein sequences in the absence of homology information. S4PRED uses PASS to achieve an unprecedented Q3 score of 75.3% on the standard CB513 test set. PASS provides a blueprint for developing a new generation of predictive methods, advancing our ability to model individual protein sequences.
PSIPRED: Predict Secondary Structure
PSIPRED is a simple and accurate secondary structure prediction method. It incorporates two feed-forward neural networks, which analyse the output obtained from PSI-BLAST (Position-Specific Iterated BLAST). Evaluated with a very stringent cross-validation method, PSIPRED 3.2 achieves an average Q3 score of 81.6%. Predictions produced by PSIPRED were also submitted to the CASP4 evaluation and assessed during the CASP4 meeting, which took place in December 2000 at Asilomar. PSIPRED 2.0 achieved an average Q3 score of 80.6% across all 40 submitted target domains with no obvious sequence similarity to structures present in the PDB. This ranked PSIPRED top out of 20 evaluated methods (an earlier version of PSIPRED was also ranked top in CASP3, held in 1998). It is important to realise, however, that because of the small sample sizes, the CASP results are not statistically significant, although they do give a rough guide to the current "state of the art". For a more reliable evaluation, the EVA web site at Columbia University provides continuous evaluation. Note that, at the time of writing, the EVA site is no longer being updated.
Downloads: The PSIPRED V3.2 software can be downloaded from HERE. Please read the license terms given in the README file if you wish to incorporate PSIPRED in another program or web server. Older releases of PSIPRED can be downloaded HERE.
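The sketch below makes the two-network design more concrete. It shows one common way to turn a per-residue profile, such as a PSI-BLAST position-specific scoring matrix, into fixed-length inputs for a feed-forward network: a sliding window centred on each residue, padded beyond the chain termini. Several details are assumptions made for illustration rather than details taken from the PSIPRED source: the window size of 15, the extra terminus flag, and the name `window_features`. The networks themselves are not shown.

```python
from typing import List

Profile = List[List[float]]  # one row of profile scores per residue


def window_features(profile: Profile, window: int = 15) -> List[List[float]]:
    """Build one flat input vector per residue from a centred sliding window.

    Hypothetical helper. Positions beyond either chain terminus are
    zero-filled and marked with an indicator value of 1.0, so every vector
    has length window * (columns + 1).
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    if not profile:
        return []
    half = window // 2
    width = len(profile[0])
    features = []
    for centre in range(len(profile)):
        vec: List[float] = []
        for pos in range(centre - half, centre + half + 1):
            if 0 <= pos < len(profile):
                vec.extend(profile[pos])
                vec.append(0.0)  # position lies inside the chain
            else:
                vec.extend([0.0] * width)
                vec.append(1.0)  # padding beyond a terminus
        features.append(vec)
    return features


# Hypothetical 4-residue profile with 20 columns (one per amino acid).
toy_profile = [[0.1] * 20 for _ in range(4)]
feats = window_features(toy_profile)
print(len(feats), len(feats[0]))  # 4 315
```

Applying the same function to the first network's three-state outputs (width 3) would give the kind of windowed input that a second network could consume. This is a sketch of the general idea, not PSIPRED's actual code.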
DISOPRED3: Protein intrinsic disorder prediction
DISOPRED3 is the latest release of our successful machine-learning-based approach to detecting intrinsically disordered regions. The method was originally trained on evolutionarily conserved sequence features of disordered regions, identified as missing residues in high-resolution X-ray structures. DISOPRED2 mainly used SVMs to address the marked class imbalance between ordered and disordered amino acids, as well as the different sequence patterns associated with terminal and internal disordered regions. DISOPRED3 extends the previous architecture with two independent predictors of intrinsic disorder, a neural network and a nearest neighbor classifier. These were trained to identify long intrinsically disordered regions using data from the PDB and DisProt databases. Their intermediate results are integrated by an additional neural network. DISOPRED3 was blindly tested and compared during the ninth and tenth rounds of the worldwide CASP experiments, where it achieved high specificity (about 99%) and consequently high precision (about 75%). Indeed, the official assessment teams ranked DISOPRED3 at or near the top across a number of tests and evaluation measures. To provide insights into the biological roles of proteins, DISOPRED3 also predicts protein binding sites within disordered regions using an SVM. The SVM examines patterns of evolutionary sequence conservation, positional information, and the amino acid composition of putative disordered regions. On a stringent test set, DISOPRED3 predictions were found to improve on existing methods, achieving approximately 20% precision and 30% recall. These results highlight the need for further work in this area.
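The link between specificity and precision comes from class imbalance. Disordered residues are far rarer than ordered ones, so even a small false-positive rate among ordered residues produces many false positives relative to true positives. The Python sketch below uses hypothetical confusion counts (about 5% of residues disordered; these are not DISOPRED3 results). It shows how precision changes when specificity drops from about 99% to 95% while recall stays fixed.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # disordered residues predicted as disordered
    fp: int  # ordered residues predicted as disordered
    tn: int  # ordered residues predicted as ordered
    fn: int  # disordered residues predicted as ordered

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def precision(self) -> float:
        return self.tp / (self.tp + self.fp)

    def recall(self) -> float:
        return self.tp / (self.tp + self.fn)


def report(c: Confusion) -> str:
    return (f"specificity={c.specificity():.3f} "
            f"precision={c.precision():.3f} recall={c.recall():.3f}")


# Hypothetical counts: 500 disordered and 9,500 ordered residues.
high_spec = Confusion(tp=300, fp=100, tn=9400, fn=200)
low_spec = Confusion(tp=300, fp=475, tn=9025, fn=200)
print(report(high_spec))  # specificity=0.989 precision=0.750 recall=0.600
print(report(low_spec))   # specificity=0.950 precision=0.387 recall=0.600
```

With the same 300 true positives, the lower-specificity predictor mislabels 475 rather than 100 ordered residues, which roughly halves its precision.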