Fine-Tuning ESM-1b for Phosphosite Prediction

This repository provides a fine-tuned version of the ESM-1b model, trained to classify phosphosites using unlabeled phosphosites (i.e., phosphosites for which the phosphorylating kinases are unknown) from PhosphoSitePlus. The model performs binary classification, distinguishing phosphosites from non-phosphorylated peptide sequences (see Musite, a Tool for Global Prediction of General and Kinase-specific Phosphorylation Sites).

Developed by:

Zeynep Işık (MSc, Sabanci University)

Dataset & Labeling Strategy

The dataset was constructed using phosphosite information from PhosphoSitePlus, with the following assumptions:

  • Positive Samples: Known phosphorylated residues from PhosphoSitePlus.
  • Negative Samples: 15-residue sequences selected from the same proteins, in which the central residue matches the amino acid of a known phosphorylation site but is not itself reported as phosphorylated in PhosphoSitePlus. Note: The absence of a phosphorylation report does not prove that a residue is never phosphorylated; such residues are simply treated as negatives in this study.
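
The labeling strategy above can be sketched roughly as follows. This is a minimal illustration, not the preprocessing code used for this model. The names `candidate_windows` and `label_windows` are hypothetical. Restricting candidates to S, T, and Y, using 15-residue windows for positives as well as negatives, and skipping residues too close to a protein terminus are all assumptions that the description does not confirm.

```python
from typing import Iterator, List, Set, Tuple

WINDOW = 15              # window length stated for the negative samples
HALF = WINDOW // 2       # 7 flanking residues on each side of the center
CANDIDATES = set("STY")  # assumption: only Ser/Thr/Tyr are candidate sites


def candidate_windows(protein: str) -> Iterator[Tuple[int, str]]:
    """Yield (0-based position, 15-mer) for each candidate residue.

    Residues without 7 full flanking residues on both sides are skipped
    (an assumption; padding would be an alternative).
    """
    for i, residue in enumerate(protein):
        if residue in CANDIDATES and HALF <= i < len(protein) - HALF:
            yield i, protein[i - HALF : i + HALF + 1]


def label_windows(protein: str, reported_sites: Set[int]) -> Tuple[List[str], List[str]]:
    """Split windows into positives (reported sites) and assumed negatives."""
    positives: List[str] = []
    negatives: List[str] = []
    for pos, window in candidate_windows(protein):
        (positives if pos in reported_sites else negatives).append(window)
    return positives, negatives
```

Here `reported_sites` stands for the set of 0-based positions that PhosphoSitePlus reports as phosphorylated for the given protein; every other candidate position is treated as a negative, in line with the note above.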

Dataset Statistics

  • Positive Samples: 366,028
  • Negative Samples: 364,121
  • Training Samples: 511,104
  • Validation Samples: 109,522
  • Testing Samples: 109,523
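
As a quick consistency check, the counts above can be added up. The snippet below only restates the listed numbers. The approximate 70/15/15 split and the near-balanced classes are inferred from those counts and are not stated explicitly in this card.

```python
# Counts copied from the Dataset Statistics list
positives, negatives = 366_028, 364_121
train, val, test = 511_104, 109_522, 109_523

total = positives + negatives
assert total == train + val + test  # both sums equal 730,149

# Share of each split and of the positive class, rounded to 3 decimals
fractions = {
    name: round(count / total, 3)
    for name, count in {"train": train, "validation": val, "test": test}.items()
}
positive_share = round(positives / total, 3)

print(fractions)       # {'train': 0.7, 'validation': 0.15, 'test': 0.15}
print(positive_share)  # 0.501
```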

Test Performance

  • Accuracy: 0.94
  • F1-Score: 0.94
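
For reference, the two metrics can be computed from binary labels as sketched below. The function name `accuracy_and_f1` is hypothetical, the example labels are toy values rather than model outputs, and this card does not state which decision threshold or averaging mode produced the reported scores. The sketch assumes F1 is taken over the positive (phosphosite) class.

```python
from typing import Sequence, Tuple


def accuracy_and_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (accuracy, F1 of class 1) for binary 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

    accuracy = (tp + tn) / len(pairs)
    # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when there are no positives at all
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


# Toy example, not data from this model
acc, f1 = accuracy_and_f1([1, 1, 0, 0], [1, 0, 0, 0])
print(f"accuracy={acc:.2f}, f1={f1:.2f}")  # accuracy=0.75, f1=0.67
```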

Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the model and tokenizer
model_name = "isikz/phosphorylation_binaryclassification_esm1b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Example sequence
sequence = "MKTLLLTLVVVTIVCLDLGYTGV"

# Tokenize input
inputs = tokenizer(sequence, return_tensors="pt")

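# Get prediction (gradients are not needed for inference)
with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits
    # .item() assumes the classification head emits a single logit
    prediction = torch.sigmoid(logits).item()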

print(f"Phosphorylation Probability: {prediction:.4f}")
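
The call to `.item()` in the snippet above only works when the model returns exactly one logit per sequence. Whether this checkpoint uses a one-logit or a two-logit head is not stated here, so the following is a hedged sketch of a helper that handles both cases. The name `phospho_probability` is hypothetical, and treating index 1 as the phosphosite class in the two-logit case is an assumption that should be checked against the model's `config.id2label`.

```python
import math
from typing import Sequence


def phospho_probability(logits: Sequence[float]) -> float:
    """Convert the logits of one sequence into a phosphosite probability."""
    if len(logits) == 1:
        # Single-logit head: apply the sigmoid
        return 1.0 / (1.0 + math.exp(-logits[0]))
    if len(logits) == 2:
        # Two-logit head: softmax, assuming index 1 is the positive class
        shift = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(x - shift) for x in logits]
        return exps[1] / sum(exps)
    raise ValueError(f"expected 1 or 2 logits, got {len(logits)}")
```

With the variables from the usage snippet, it could be called as `phospho_probability(outputs.logits[0].tolist())`.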