SPOTONE

Welcome to SpotONE: hot SPOTs ON protein complexes with Extremely randomized trees via sequence-only features. This webserver takes only a protein sequence (in FASTA format) as input and returns an in-silico prediction of each amino-acid residue as a hot-spot (HS) or a null-spot (NS), i.e. a non-hot-spot. Please cite A.J. Preto and Irina S. Moreira (2020).

Submit

Upload your protein as a FASTA file in the upload box, fill in your email address and an appropriate name for the run, then click submit. The protein sequence should contain only single-letter amino acid codes. For optimal results, submit only soluble proteins, as there were no membrane proteins in the training dataset. Make sure to use a valid email address, otherwise the run will not proceed correctly. After submitting, you will be sent directly to your results page, where the prediction might take some time to finish. An email with the results link will also be sent to you.

Example File

The submitted file must be a FASTA file and therefore needs the ".fasta" extension. The file must contain an identifier and the protein sequence in single-letter amino acid code. Chains that include the generic amino acid code X will not be processed. An example FASTA file can be downloaded below.
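The input requirements above can be checked locally before uploading. The following is a minimal sketch, not the server's own validation: the function names are hypothetical, accepting only the 20 standard residue letters is an assumption that is stricter than what is stated here, and treating the extension check as case-insensitive is also an assumption.

```python
from pathlib import Path

# Assumption: only the 20 standard single-letter amino acid codes are accepted.
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def has_fasta_extension(filename):
    """Check for the '.fasta' extension (case-insensitive by assumption)."""
    return Path(filename).suffix.lower() == ".fasta"


def parse_fasta(text):
    """Parse FASTA text into a list of (identifier, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new identifier line closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data found before any '>' identifier line")
        else:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def check_records(records):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    if not records:
        problems.append("no FASTA records found")
    for header, seq in records:
        label = header or "<no identifier>"
        if not header:
            problems.append("record without an identifier")
        if not seq:
            problems.append(f"{label}: empty sequence")
        invalid = sorted(set(seq) - STANDARD_AA)
        if "X" in invalid:
            problems.append(f"{label}: contains generic residue X")
        others = [c for c in invalid if c != "X"]
        if others:
            problems.append(f"{label}: unexpected characters {''.join(others)}")
    return problems
```

For example, check_records(parse_fasta(open("my_protein.fasta").read())) returns an empty list for a file that meets these checks, where "my_protein.fasta" is a placeholder file name.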

Database

Due to the high volume of plot data, each plot is loaded only when you click the respective button. Please feel free to explore our results and hover over the plots to discover additional information.

The above plot displays the number of examples for each class available (HS and NS) in the dataset.

The above plot displays the number of amino acids in each of four quartiles spanning the length of the sequence. The amino acids are labelled by class, making it possible to analyse the abundance of each class by relative position.
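As an illustration of how residues can be grouped by relative position, here is a minimal sketch that assigns each residue to one of four equal-width quartiles of the sequence length and counts the classes in each. The exact binning used to build the plot is not stated here, so the rule below and the function names are assumptions.

```python
from collections import Counter


def position_quartile(index, length):
    """Map a 0-based residue index to a quartile (1 to 4) of the sequence length."""
    if not 0 <= index < length:
        raise ValueError("index must lie within the sequence")
    # index * 4 // length is 0..3 for any valid index, so the result is 1..4.
    return index * 4 // length + 1


def count_by_quartile(labels):
    """Count residues per (quartile, class) pair.

    `labels` holds one class label per residue, in sequence order,
    for example "HS" or "NS".
    """
    counts = Counter()
    for index, label in enumerate(labels):
        counts[(position_quartile(index, len(labels)), label)] += 1
    return counts
```

Calling count_by_quartile on the per-residue labels of one sequence gives the counts for that sequence; summing the counters over all sequences would give dataset-wide totals.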

The above plot displays the proportion of hot-spots (HS) and null-spots (NS) per amino-acid residue type.

The above plots display several amino-acid characteristics split by class (HS vs NS). The values for the amino-acid characteristics were retrieved from the Biological Magnetic Resonance Data Bank (BMRB), DOI: 10.1093/nar/gkm957.

Methods Summary

SPOTONE is a new Machine-Learning (ML) predictor able to accurately classify protein Hot-Spots (HS) via sequence-only features. The algorithm achieves an accuracy, AUROC, precision, recall and F1-score of 0.82, 0.83, 0.91, 0.82 and 0.85, respectively, on an independent test set. It is deployed within a free-to-use webserver at http://moreiralab.com/resources/spotone, which only requires the user to submit a FASTA file with one or more protein sequences.
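For readers less familiar with these metrics, the sketch below shows the standard binary definitions of accuracy, precision, recall and F1-score computed from confusion-matrix counts. It is an illustration only: the function name is hypothetical, any counts passed to it are the caller's own and not the SPOTONE test-set counts, and it is not stated here whether the reported values are per-class or averaged. AUROC is omitted because it needs predicted scores rather than counts.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute standard binary metrics from confusion-matrix counts.

    tp/fp: true/false positives, tn/fn: true/false negatives.
    Ratios with a zero denominator are reported as 0.0.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```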