---
language:
- en
metrics:
- accuracy
- AUC ROC
- precision
- recall
tags:
- biology
- chemistry
- therapeutic science
- drug design
- drug development
- therapeutics
library_name: tdc
license: bsd-2-clause
---

## Dataset description

The cytochrome P450 (CYP) genes are involved in the formation and breakdown (metabolism) of various molecules and chemicals within cells. CYP3A4 is an important enzyme in the body, found mainly in the liver and the intestine. It oxidizes small foreign organic molecules (xenobiotics), such as toxins or drugs, so that they can be removed from the body.

## Task description

Binary classification. Given a drug's SMILES string, predict whether the drug inhibits CYP3A4.

## Dataset statistics

Total: 12,328 drugs

## Prerequisites

Install the following packages:

```
pip install PyTDC
pip install DeepPurpose
pip install git+https://github.com/bp-kelley/descriptastorus
pip install dgl torch torchvision
```

You can also refer to the Colab notebook [here](https://colab.research.google.com/drive/1CL92SOCBS-eYDL99w8tjSNIG_ySXzMrG?usp=sharing).

## Dataset split

Random split into 70% training, 10% validation, and 20% test sets.

To load the dataset in TDC, run:

```python
from tdc.single_pred import ADME

data = ADME(name='CYP3A4_Veith')
```
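
The 70/10/20 random split described above can be illustrated with a plain-Python shuffle-and-slice. This is a simplified sketch, not TDC's implementation: the helper `random_split`, the seed, and the truncating rounding rule are assumptions, so split sizes and memberships may differ from what TDC's `get_split()` returns.

```python
import random


def random_split(items, frac=(0.7, 0.1, 0.2), seed=42):
    """Shuffle items and slice them into train/valid/test lists.

    Hypothetical helper for illustration; TDC's get_split() may round
    and shuffle differently.
    """
    if abs(sum(frac) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # deterministic for a given seed
    n = len(shuffled)
    n_train = int(frac[0] * n)  # truncate toward zero
    n_valid = int(frac[1] * n)
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]  # remainder goes to the test set
    return train, valid, test


# 12,328 placeholder row indices standing in for the dataset's drugs
train, valid, test = random_split(range(12_328))
print(len(train), len(valid), len(test))  # 8629 1232 2467
```

Sending the remainder to the test set guarantees that every row lands in exactly one split even when the fractions do not divide the dataset size evenly.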

## Model description

Each molecule is encoded as a Morgan chemical fingerprint and passed to an MLP decoder. The model was tuned over 100 runs using the Ax platform.
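
To show how an MLP decoder turns a fingerprint into an inhibition prediction, here is a minimal numpy sketch. In the real pipeline the Morgan fingerprint is computed from the SMILES string by the chemistry toolkit behind DeepPurpose; this sketch starts from a precomputed bit vector instead. The fingerprint length (1,024 bits), the single 64-unit ReLU hidden layer, the sigmoid output, and the random weights are all assumptions for illustration, not the tuned architecture of this model.

```python
import numpy as np

# Hypothetical sizes: a 1024-bit fingerprint and one hidden layer of 64 units.
N_BITS, HIDDEN = 1024, 64

rng = np.random.default_rng(0)
# Random placeholder weights; these are NOT the tuned weights of this model.
W1 = rng.normal(scale=0.05, size=(N_BITS, HIDDEN))
b1 = np.zeros(HIDDEN)
W2 = rng.normal(scale=0.05, size=(HIDDEN, 1))
b2 = np.zeros(1)


def mlp_predict(fingerprints: np.ndarray) -> np.ndarray:
    """Map binary fingerprints of shape (n, N_BITS) to probabilities of shape (n,)."""
    hidden = np.maximum(0.0, fingerprints @ W1 + b1)  # ReLU hidden layer
    logits = hidden @ W2 + b2                         # one logit per molecule
    return 1.0 / (1.0 + np.exp(-logits[:, 0]))        # sigmoid -> P(inhibitor)


# Two fake fingerprints with a few bits set, standing in for real Morgan bits.
fps = np.zeros((2, N_BITS))
fps[0, [1, 42, 512]] = 1.0
fps[1, [7, 300, 900, 1000]] = 1.0

probs = mlp_predict(fps)
labels = (probs >= 0.5).astype(int)  # 1 = predicted CYP3A4 inhibitor
print(probs, labels)
```

Because the sigmoid output is a probability, thresholding it (here at 0.5, an arbitrary choice) yields the binary inhibitor/non-inhibitor label the task asks for.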

```python
from tdc import tdc_hf_interface

tdc_hf = tdc_hf_interface("CYP3A4_Veith-Morgan")
# Load the DeepPurpose model from this repo
dp_model = tdc_hf.load_deeppurpose('./data')
tdc_hf.predict_deeppurpose(dp_model, ['YOUR SMILES STRING'])
```
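
The card's metadata lists accuracy, AUC ROC, precision, and recall. Assuming `predict_deeppurpose` returns one score in [0, 1] per input SMILES (verify this against the actual return value), these metrics can be computed with a plain-Python sketch like the one below. The helper name `binary_metrics`, the 0.5 threshold, and the toy labels and scores are illustrative, not results from this model.

```python
def binary_metrics(y_true, scores, threshold=0.5):
    """Accuracy, precision, recall and ROC AUC for binary labels (1 = inhibitor)."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(y_true, preds) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(y_true, preds) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(y_true, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, preds) if y == 1 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0

    # ROC AUC = probability that a random positive outscores a random negative
    # (ties count as 0.5). Quadratic in the number of samples, fine for a sketch.
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    auc = wins / (len(pos) * len(neg))

    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "roc_auc": auc}


# Toy example: 1 = inhibitor, 0 = non-inhibitor
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.2, 0.4, 0.8, 0.6, 0.1]
print(binary_metrics(y_true, scores))
```

Accuracy, precision, and recall depend on the chosen threshold, while ROC AUC uses only the ranking of the scores, which is why it is often reported alongside the threshold-based metrics.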

## References

* Dataset entry in Therapeutics Data Commons: https://tdcommons.ai/single_pred_tasks/adme/#cyp-p450-3a4-inhibition-veith-et-al
* Veith, Henrike, et al. "Comprehensive characterization of cytochrome P450 isozyme selectivity across chemical libraries." *Nature Biotechnology* 27(11) (2009): 1050-1055.