## Setup
No setup is required. Simply fill in the input boxes with the necessary data and click the **Run** button.
You can find a list of examples at the bottom of the page; clicking on them will autofill the fields for you.
If the server remains idle for a period, it will enter standby mode. Running a calculation will wake the tool from standby, but note that the first run may take longer due to startup and model loading.
## Input
**Sequence**: Enter the full amino acid sequence to be analyzed in the **Sequence** text box.
Note: Wildcard characters (e.g., `-X.B`) can be included, but they currently cannot be visualised.
**Substitutions**: Specify the substitutions you wish to test in the **Substitutions** box. The tool supports four running modes, selected according to your input:
- **Single Substitution**: Input one or more substitutions (e.g. `R218K R218W`) to score specific changes.
- **Residue Position**: Provide residue positions to evaluate all possible substitutions at those sites.
- **Same-Length Sequence**: Provide a sequence of the same length as the input sequence; each amino acid that differs is scored one by one as a separate substitution.
- **Different Inputs**: For any other input format, a deep mutational scan of the full sequence will be performed.
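
The mode selection described above can be pictured as a simple classification of the text entered in the **Substitutions** box. The following is only a minimal sketch under stated assumptions: the function name `guess_mode`, the regular expressions, and the whitespace-separated token format are hypothetical and are not taken from the tool's source code, whose parser may behave differently.

```python
import re

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Hypothetical patterns: a substitution looks like "R218K", a position like "218".
SUBSTITUTION_RE = re.compile(rf"[{AMINO_ACIDS}]\d+[{AMINO_ACIDS}]")
POSITION_RE = re.compile(r"\d+")


def guess_mode(sequence: str, substitutions: str) -> str:
    """Guess which running mode a Substitutions entry would select (illustrative only)."""
    tokens = substitutions.split()
    if tokens and all(SUBSTITUTION_RE.fullmatch(t) for t in tokens):
        return "single substitution"
    if tokens and all(POSITION_RE.fullmatch(t) for t in tokens):
        return "residue position"
    if len(tokens) == 1 and len(tokens[0]) == len(sequence):
        return "same-length sequence"
    # Anything else falls through to a full deep mutational scan.
    return "deep mutational scan"


print(guess_mode("MKTAYIAK", "K2R A4W"))
print(guess_mode("MKTAYIAK", "2 4"))
print(guess_mode("MKTAYIAK", "MRTAYIAK"))
print(guess_mode("MKTAYIAK", ""))
```
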
**Model Selection**: Choose an ESM model for calculations from those available on the Hugging Face Model Hub.
The model `esm2_t33_650M_UR50D` offers an optimal balance between cost and accuracy [*](https://doi.org/10.1126/science.ade2574).
**Accuracy Option**: The **Use higher accuracy** option applies a masked-marginals scoring strategy, which considers sequence context during inference.
While this method is slower, it improves accuracy. If you experience long runtimes, unchecking this option can significantly speed up calculations at the cost of some accuracy.
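
To give an idea of what a masked-marginals score represents, the sketch below uses the common definition of this strategy: the position of interest is masked, the model predicts a probability for each amino acid at that site, and the score is the log-probability of the mutant residue minus that of the wild-type residue. Whether the tool computes its scores in exactly this way is an assumption. The function name is hypothetical, and the probabilities are invented toy values, not model output.

```python
import math


def masked_marginal_score(log_probs_at_site: dict, wild_type: str, mutant: str) -> float:
    """Log-odds of mutant vs. wild type at a masked position (assumed definition)."""
    return log_probs_at_site[mutant] - log_probs_at_site[wild_type]


# Toy probabilities for a masked site whose wild-type residue is R.
# These numbers are made up for illustration only.
toy_probs = {"R": 0.60, "K": 0.30, "W": 0.001}
toy_log_probs = {aa: math.log(p) for aa, p in toy_probs.items()}

print(round(masked_marginal_score(toy_log_probs, "R", "K"), 2))  # -0.69
print(round(masked_marginal_score(toy_log_probs, "R", "W"), 2))  # -6.4
```

Under this definition, a residue the model considers about as likely as the wild type scores close to zero, while a residue it considers very unlikely in that context receives a strongly negative score.
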
**Deep Mutational Scan Recommendations**: When performing a deep mutational scan, it is advisable to use smaller models (8M, 35M, or 150M parameters) because runtimes can become very long, especially with longer sequences or during peak server usage times.
For example, scanning a 300-residue sequence with larger models may take over 30 minutes.
Generally, accuracy is more affected by the scoring strategy than by model size; therefore, reduce the model size first when optimizing for runtime.
The computational cost of the scoring strategy scales with the number of substitutions tested, while model cost scales with wild-type sequence length.
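
To see how quickly the number of tested substitutions grows in a deep mutational scan, the sketch below enumerates every single substitution of a sequence. It assumes the scan covers the 19 alternative standard amino acids at each position and uses 1-based positions as in `R218K`; the helper name `all_single_substitutions` is hypothetical and not part of the tool.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def all_single_substitutions(sequence):
    """Yield every single substitution, e.g. 'M1A', using 1-based positions."""
    for position, wild_type in enumerate(sequence, start=1):
        for mutant in AMINO_ACIDS:
            if mutant != wild_type:
                yield f"{wild_type}{position}{mutant}"


# Placeholder 300-residue sequence, used only to count substitutions.
scan = list(all_single_substitutions("M" * 300))
print(len(scan))  # 5700 = 300 positions x 19 alternative residues
```
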
**Concurrent Substitutions**: To calculate the effect of multiple concurrent substitutions, you must manually change the input sequence and rerun the calculation. Accuracy is not guaranteed, as this use case is as yet untested.
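
Editing the sequence by hand is error-prone, so a small helper can prepare the modified input sequence before pasting it into the **Sequence** box. This is a minimal sketch, not part of the tool: the function name `apply_substitutions` is hypothetical, and it assumes substitutions are written as in `R218K` with 1-based positions.

```python
import re


def apply_substitutions(sequence: str, substitutions: str) -> str:
    """Return the sequence with all listed substitutions (e.g. 'K2R A4W') applied."""
    residues = list(sequence)
    for token in substitutions.split():
        match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", token)
        if match is None:
            raise ValueError(f"Unrecognised substitution: {token!r}")
        wild_type, mutant = match.group(1), match.group(3)
        position = int(match.group(2))
        if not 1 <= position <= len(residues):
            raise ValueError(f"Position out of range in {token!r}")
        # Guard against off-by-one errors: the stated wild type must match.
        if residues[position - 1] != wild_type:
            raise ValueError(
                f"{token!r} expects {wild_type} at position {position}, "
                f"found {residues[position - 1]}"
            )
        residues[position - 1] = mutant
    return "".join(residues)


print(apply_substitutions("MKTAYIAK", "K2R A4W"))  # MRTWYIAK
```
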
## Output
Results are displayed in a color-coded table, except for deep mutational scans, which produce a heatmap.
In the table:
- Beneficial substitutions are highlighted in green with positive values.
- Detrimental substitutions appear in red with negative values.
As a rule of thumb, score differences of *4* or more are considered significant. For instance:
- A substitution scoring *-6* is likely detrimental to protein functionality.
- A score of *+2* is generally regarded as neutral.
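
The rule of thumb above can be written as a small helper for post-processing scores. This is only an illustrative sketch: the function name and the label strings are hypothetical, and treating a score of exactly 4 as significant follows the "4 or more" wording.

```python
def interpret_score(score: float, threshold: float = 4.0) -> str:
    """Label a score using the rule-of-thumb threshold (illustrative only)."""
    if score <= -threshold:
        return "likely detrimental"
    if score >= threshold:
        return "likely beneficial"
    return "neutral"


print(interpret_score(-6))  # likely detrimental
print(interpret_score(2))   # neutral
```
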
The **Download raw data** button lets you download the output in CSV format.
**If you use this tool in your research, please cite**:
Totaro MG, Vide U, Zausinger R, Winkler A, Oberdorfer G. ESM-scan—A tool to guide amino acid substitutions. *Protein Science.* 2024; 33(12):e5221. [doi.org/10.1002/pro.5221](https://doi.org/10.1002/pro.5221)