---
license: gpl-2.0
---

# Model Card for FupBERT

A descriptor-free approach to predicting fraction unbound in human plasma.

## Model Details

### Model Description

Chemical-specific parameters are either measured _in vitro_ or estimated using quantitative
structure–activity relationship (QSAR) models. Existing QSAR work relies on extracting a
set of descriptors or fingerprints, selecting a subset of them, and training a machine learning model. In this work,
we used a state-of-the-art natural language processing model, Bidirectional Encoder Representations from Transformers
(BERT), which allowed us to circumvent the calculation of these chemical descriptors. In this approach,
simplified molecular-input line-entry system (SMILES) strings were embedded in a high-dimensional space using a
two-stage training approach. The model was first pre-trained on a masked SMILES token task and then fine-tuned on
a QSAR prediction task. The pre-training task learned meaningful high-dimensional embeddings from the relationships
between the chemical tokens in SMILES strings derived from the "in-stock" portion of the ZINC 15 dataset, a
large dataset of commercially available chemicals. The fine-tuning task then perturbed the pre-trained embeddings
to facilitate prediction of a specific QSAR endpoint of interest. The power of this model stems from the ability
to reuse the pre-trained model for multiple fine-tuning tasks, which reduces the computational burden of developing
separate models for different endpoints. We used our framework to develop a predictive model for fraction unbound
in human plasma (fup). This approach is flexible, requires minimal domain expertise, and can be generalized to
other parameters of interest for rapid and accurate estimation of absorption, distribution, metabolism, excretion, and toxicity (ADMET).
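To make the masked SMILES token task described above more concrete, the following is a minimal sketch of how a SMILES string might be split into tokens and randomly masked for pre-training. This card does not specify FupBERT's tokenizer, vocabulary, or masking rate, so the regular expression, the 15% masking probability, the `[MASK]` token, and the function names `tokenize_smiles` and `mask_tokens` are all illustrative assumptions borrowed from common practice, not the authors' implementation.

```python
import random
import re

# Assumed regex-based SMILES tokenizer; the tokenizer actually used for
# FupBERT may differ. Bracket atoms and two-letter halogens are matched
# before single-character tokens so they are kept whole.
SMILES_TOKEN_PATTERN = re.compile(
    r"(\[[^\]]+\]|Br|Cl|@@|%\d{2}|[BCNOPSFIbcnops]|[=#$/\\+\-().:~@?>*]|\d)"
)


def tokenize_smiles(smiles):
    """Split a SMILES string into chemically meaningful tokens."""
    tokens = SMILES_TOKEN_PATTERN.findall(smiles)
    # Guard against silently dropping characters the pattern does not cover.
    if "".join(tokens) != smiles:
        raise ValueError(f"Could not tokenize SMILES: {smiles!r}")
    return tokens


def mask_tokens(tokens, mask_prob=0.15, mask_token="[MASK]", seed=None):
    """Return (masked_tokens, labels) for a masked-token objective.

    labels[i] holds the original token at masked positions and None
    elsewhere, so a loss would only be computed where a token was hidden.
    The 15% default is an assumption, not a value taken from this card.
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(mask_token)
            labels.append(tok)
        else:
            masked.append(tok)
            labels.append(None)
    return masked, labels


# Example: tokenize aspirin and hide a random subset of its tokens.
tokens = tokenize_smiles("CC(=O)Oc1ccccc1C(=O)O")
masked, labels = mask_tokens(tokens, seed=0)
print(tokens)
print(masked)
```

During pre-training, a model would be asked to recover the hidden tokens from their context; fine-tuning would then replace that objective with a regression target such as fup while reusing the learned embeddings.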

- **Developed by:** Michael Riedl, Sayak Mukherjee, and Mitch Gauthier
- **Model type:** BERT

### Model Sources

<!-- Provide the basic links for the model. -->

- **Paper:** Riedl, Michael, Sayak Mukherjee, and Mitch Gauthier. "Descriptor-Free Deep Learning QSAR Model for the Fraction Unbound in Human Plasma." Molecular Pharmaceutics (2023).
- **Demo:** https://huggingface.co/spaces/battelle/FupBERT_Space

## Citation

<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->

**BibTeX:**

```
@article{riedl2023descriptor,
  title={Descriptor-Free Deep Learning QSAR Model for the Fraction Unbound in Human Plasma},
  author={Riedl, Michael and Mukherjee, Sayak and Gauthier, Mitch},
  journal={Molecular Pharmaceutics},
  year={2023},
  publisher={ACS Publications}
}
```

## Model Card Contact

riedl@battelle.org