Fine-Tuning ESM-1b for Phosphosite Prediction
This repository provides a fine-tuned version of the ESM-1b model, trained to classify phosphosites using unlabeled phosphosites from PhosphoSitePlus (i.e., sites for which the phosphorylating kinase is unknown). The model is designed for binary classification, distinguishing phosphosites from non-phosphorylated peptide sequences (Musite, a Tool for Global Prediction of General and Kinase-specific Phosphorylation Sites).
Developed by:
Zeynep Işık (MSc, Sabanci University)
Dataset & Labeling Strategy
The dataset was constructed using phosphosite information from PhosphoSitePlus, with the following assumptions:
- Positive Samples: Known phosphorylated residues from PhosphoSitePlus.
- Negative Samples: 15-residue sequences drawn from the same proteins, each centered on a residue of the same type as a known phosphorylation site but not itself reported as phosphorylated in PhosphoSitePlus. Note: The absence of a phosphorylation report does not prove that a residue is never phosphorylated; such residues are simply treated as negatives in this study.
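The negative-sampling rule above can be sketched roughly as follows. This is a minimal illustration, not the author's pipeline: the function name `candidate_negatives`, the restriction to S/T/Y residues, and the choice to skip residues too close to a terminus (rather than pad them) are all assumptions made for the example.

```python
from typing import List, Set, Tuple

WINDOW = 15                          # window length stated in the text
HALF = WINDOW // 2                   # 7 residues on each side of the centre
PHOSPHO_RESIDUES = {"S", "T", "Y"}   # assumption: candidate residue types


def candidate_negatives(
    protein_seq: str, positive_positions: Set[int]
) -> List[Tuple[int, str]]:
    """Return (0-based position, 15-mer) pairs centred on S/T/Y residues
    that are not listed in positive_positions.

    Residues closer than HALF positions to either terminus are skipped
    (assumption: no padding is applied).
    """
    windows = []
    for i in range(HALF, len(protein_seq) - HALF):
        if protein_seq[i] in PHOSPHO_RESIDUES and i not in positive_positions:
            windows.append((i, protein_seq[i - HALF : i + HALF + 1]))
    return windows


# Hypothetical protein with no reported phosphosites
print(candidate_negatives("MKTLLLTLVVVTIVCLDLGYTGV", set()))
# → [(11, 'LLTLVVVTIVCLDLG')]
```

If position 11 were a reported phosphosite, passing `{11}` as `positive_positions` would leave no candidate windows for this sequence.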
Dataset Statistics
- Positive Samples: 366,028
- Negative Samples: 364,121
- Training Samples: 511,104
- Validation Samples: 109,522
- Testing Samples: 109,523
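As a quick sanity check, the counts above can be verified with a few lines of arithmetic. The split ratios are not stated in the card; the roughly 70/15/15 proportions below are inferred from the reported numbers only.

```python
# Counts copied from the statistics above
positives, negatives = 366_028, 364_121
train, val, test = 511_104, 109_522, 109_523

total = positives + negatives

# The three splits should account for every sample exactly once
assert train + val + test == total

fractions = {
    "train": train / total,
    "validation": val / total,
    "test": test / total,
    "positive class": positives / total,
}
for name, frac in fractions.items():
    print(f"{name}: {frac:.1%}")
# train: 70.0%
# validation: 15.0%
# test: 15.0%
# positive class: 50.1%
```

The near-even class balance means accuracy is a reasonable headline metric here.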
Test Performance
- Accuracy: 0.94
- F1-Score: 0.94
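For reference, the two metrics can be computed from binary labels as sketched below. The card does not say which decision threshold or library was used, so this only illustrates the standard definitions; `accuracy_and_f1` is a hypothetical helper name.

```python
from typing import Sequence, Tuple


def accuracy_and_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Compute accuracy and F1 for binary labels (1 = phosphosite, 0 = negative)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return accuracy, f1


# Toy example: one of the two true phosphosites is missed
acc, f1 = accuracy_and_f1([1, 1, 0, 0], [1, 0, 0, 0])
print(f"accuracy={acc:.2f}, f1={f1:.2f}")
# → accuracy=0.75, f1=0.67
```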
Usage
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the model and tokenizer
model_name = "isikz/phosphorylation_binaryclassification_esm1b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()  # disable dropout for inference

# Example sequence
sequence = "MKTLLLTLVVVTIVCLDLGYTGV"

# Tokenize input
inputs = tokenizer(sequence, return_tensors="pt")

# Get prediction
with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits
    # Assumes the classification head outputs a single logit;
    # .item() raises an error if logits holds more than one value
    prediction = torch.sigmoid(logits).item()

print(f"Phosphorylation Probability: {prediction:.4f}")
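The card does not state whether the classification head emits one logit or two. A small helper such as the following, written in plain Python so it has no extra dependencies, covers both cases. The name `positive_probability` and the assumption that index 1 is the phosphosite class in a two-logit head are illustrative, not taken from the model's configuration.

```python
import math
from typing import Sequence


def positive_probability(logits: Sequence[float], positive_index: int = 1) -> float:
    """Convert raw logits for one sequence into a phosphosite probability.

    One logit  -> sigmoid.
    Two or more -> softmax, returning the entry at positive_index
                   (assumption: index 1 is the positive class).
    """
    if len(logits) == 1:
        x = logits[0]
        # Numerically stable sigmoid for both signs of x
        if x >= 0:
            return 1.0 / (1.0 + math.exp(-x))
        z = math.exp(x)
        return z / (1.0 + z)

    # Softmax with the maximum subtracted for numerical stability
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    return exps[positive_index] / sum(exps)


print(positive_probability([0.0]))       # → 0.5
print(positive_probability([0.0, 0.0]))  # → 0.5
```

With the usage example above, the input would be `logits.squeeze(0).tolist()`, which yields a flat list for a single input sequence.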
Base Model
- facebook/esm1b_t33_650M_UR50S