---
tags:
- protein
- small-molecule
- dti
- ibm
- mammal
- pytorch
- transformers
library_name: biomed
license: apache-2.0
base_model:
- ibm/biomed.omics.bl.sm.ma-ted-400m
---

Accurate prediction of drug-target binding affinity is essential in the early stages of drug discovery.
This model is an example of fine-tuning `ibm/biomed.omics.bl.sm.ma-ted-400m` for this task.
Binding affinity is predicted as pKd, the negative base-10 logarithm of the dissociation constant (Kd), which reflects the strength of the interaction between a small molecule (drug) and a protein (target).
The expected inputs for the model are the amino acid sequence of the target and the SMILES representation of the drug.
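
To make the pKd scale concrete, the sketch below converts a dissociation constant to pKd and back. It assumes Kd is given in nanomolar (the unit is an assumption here, not something stated in this card), and the helper names `kd_nm_to_pkd` and `pkd_to_kd_nm` are hypothetical; they are not part of the MAMMAL package.

```python
import math


def kd_nm_to_pkd(kd_nm: float) -> float:
    """Convert a dissociation constant in nM to pKd = -log10(Kd in mol/L)."""
    if kd_nm <= 0:
        raise ValueError("Kd must be positive")
    return -math.log10(kd_nm * 1e-9)


def pkd_to_kd_nm(pkd: float) -> float:
    """Invert the conversion: recover Kd in nM from a pKd value."""
    return 10.0 ** (-pkd) * 1e9


# Stronger binders have a smaller Kd and therefore a larger pKd.
for kd_nm in (1.0, 100.0, 10_000.0):
    print(f"Kd = {kd_nm:>8.1f} nM -> pKd = {kd_nm_to_pkd(kd_nm):.2f}")
```

On this scale, a one-unit increase in pKd corresponds to a tenfold decrease in Kd.
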

The benchmark used for fine-tuning is defined at: `https://tdcommons.ai/multi_pred_tasks/dti/`
We also harmonize the values using `data.harmonize_affinities(mode='max_affinity')` and transform them to log scale.
By default, we use the Drug+Target cold split, as provided by tdcommons.
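
The preprocessing described above can be illustrated without downloading the benchmark. The following is a minimal, self-contained sketch, not the tdcommons implementation. It assumes that `max_affinity` harmonization keeps the strongest measurement (lowest Kd) for each repeated drug-target pair, and that a Drug+Target cold split puts in the test set only pairs whose drug and target are both held out of training. The function names, the toy records, and the 50% hold-out fraction are hypothetical.

```python
import random

# Toy records: (drug SMILES, target sequence, Kd in nM). All values are made up.
records = [
    ("CCO", "MKT", 50.0),
    ("CCO", "MKT", 20.0),  # repeated measurement of the same pair
    ("CCN", "MKT", 300.0),
    ("CCO", "GAV", 5.0),
    ("CCC", "GAV", 1000.0),
    ("CCN", "LLS", 75.0),
    ("CCC", "LLS", 8.0),
]


def harmonize_max_affinity(rows):
    """Keep one value per (drug, target) pair: the lowest Kd, i.e. the strongest binding."""
    best = {}
    for drug, target, kd in rows:
        key = (drug, target)
        best[key] = min(kd, best.get(key, kd))
    return [(d, t, kd) for (d, t), kd in best.items()]


def cold_split_drug_target(rows, test_frac=0.5, seed=0):
    """Hold out a fraction of drugs and of targets; pairs with only one held-out side are dropped."""
    rng = random.Random(seed)
    drugs = sorted({d for d, _, _ in rows})
    targets = sorted({t for _, t, _ in rows})
    test_drugs = set(rng.sample(drugs, max(1, int(len(drugs) * test_frac))))
    test_targets = set(rng.sample(targets, max(1, int(len(targets) * test_frac))))
    train = [r for r in rows if r[0] not in test_drugs and r[1] not in test_targets]
    test = [r for r in rows if r[0] in test_drugs and r[1] in test_targets]
    return train, test


harmonized = harmonize_max_affinity(records)
train, test = cold_split_drug_target(harmonized)
print(f"{len(records)} raw rows -> {len(harmonized)} unique pairs")
print(f"train pairs: {len(train)}, test pairs: {len(test)}")
```

In this sketch no drug and no target appears on both sides of the split, so the test set measures generalization to unseen drugs and unseen targets at once. The cost is that mixed pairs are discarded.
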


## Model Summary

- **Developers:** IBM Research
- **GitHub Repository:** https://github.com/BiomedSciAI/biomed-multi-alignment
- **Paper:** TBD
- **Release Date:** Oct 28th, 2024
- **License:** [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)

## Usage

Using `ibm/biomed.omics.bl.sm.ma-ted-400m` requires installing [https://github.com/BiomedSciAI/biomed-multi-alignment](https://github.com/BiomedSciAI/biomed-multi-alignment).

```
pip install git+https://github.com/BiomedSciAI/biomed-multi-alignment.git#egg=mammal[examples]
```

A simple example for a task already supported by `ibm/biomed.omics.bl.sm.ma-ted-400m`:
```python
from fuse.data.tokenizers.modular_tokenizer.op import ModularTokenizerOp

from mammal.examples.dti_bindingdb_kd.task import DtiBindingdbKdTask
from mammal.model import Mammal

# input
target_seq = "NLMKRCTRGFRKLGKCTTLEEEKCKTLYPRGQCTCSDSKMNTHSCDCKSC"
drug_seq = "CC(=O)NCCC1=CNc2c1cc(OC)cc2"

# Load Model
model = Mammal.from_pretrained("ibm/biomed.omics.bl.sm.ma-ted-400m.dti_bindingdb_pkd")
model.eval()

# Load Tokenizer
tokenizer_op = ModularTokenizerOp.from_pretrained("ibm/biomed.omics.bl.sm.ma-ted-400m.dti_bindingdb_pkd")

# convert to MAMMAL style
sample_dict = {"target_seq": target_seq, "drug_seq": drug_seq}
sample_dict = DtiBindingdbKdTask.data_preprocessing(
    sample_dict=sample_dict,
    tokenizer_op=tokenizer_op,
    target_sequence_key="target_seq",
    drug_sequence_key="drug_seq",
    norm_y_mean=None,
    norm_y_std=None,
    device=model.device,
)

# forward pass - encoder_only mode which supports scalar predictions
batch_dict = model.forward_encoder_only([sample_dict])

# Post-process the model's output
batch_dict = DtiBindingdbKdTask.process_model_output(
    batch_dict,
    scalars_preds_processed_key="model.out.dti_bindingdb_kd",
    norm_y_mean=5.79384684128215,
    norm_y_std=1.33808027428196,
)
ans = {
    "model.out.dti_bindingdb_kd": float(batch_dict["model.out.dti_bindingdb_kd"][0])
}

# Print prediction
print(f"{ans=}")
```
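
The `norm_y_mean` and `norm_y_std` arguments passed to `process_model_output` suggest that the model predicts a standardized value that is rescaled to the pKd scale during post-processing. The sketch below illustrates that arithmetic under this assumption. It is not the library's implementation, and the helpers `denormalize` and `normalize` and the example raw value are hypothetical.

```python
# Normalization statistics copied from the usage example above.
NORM_Y_MEAN = 5.79384684128215
NORM_Y_STD = 1.33808027428196


def denormalize(raw_pred: float, mean: float = NORM_Y_MEAN, std: float = NORM_Y_STD) -> float:
    """Map a standardized (z-score) prediction back to the pKd scale."""
    return raw_pred * std + mean


def normalize(pkd: float, mean: float = NORM_Y_MEAN, std: float = NORM_Y_STD) -> float:
    """Inverse mapping: standardize a pKd value with the same statistics."""
    return (pkd - mean) / std


raw_pred = 0.5  # hypothetical raw output, half a standard deviation above the mean
pkd = denormalize(raw_pred)
print(f"raw = {raw_pred:+.2f} -> pKd = {pkd:.3f}")
```

Under this reading, a raw output of 0 corresponds to a pKd of about 5.79, the mean used for normalization.
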

For more advanced usage, see our detailed example at: `https://github.com/BiomedSciAI/biomed-multi-alignment`


## Citation

If you found our work useful, please consider giving the repo a star and citing our paper:
```
@article{TBD,
  title={TBD},
  author={IBM Research Team},
  journal={arXiv preprint arXiv:TBD},
  year={2024}
}
```