license: gpl-2.0
Model Card for FupBERT
A descriptor-free approach to predicting the fraction unbound in human plasma.
Model Details
Model Description
Chemical-specific parameters are either measured in vitro or estimated using quantitative structure–activity relationship (QSAR) models. Existing QSAR work relies on extracting a set of descriptors or fingerprints, selecting a subset of them, and training a machine learning model. In this work, we used a state-of-the-art natural language processing model, Bidirectional Encoder Representations from Transformers (BERT), which allowed us to avoid calculating these chemical descriptors.

In this approach, simplified molecular-input line-entry system (SMILES) strings were embedded in a high-dimensional space using a two-stage training approach. The model was first pre-trained on a masked SMILES token task and then fine-tuned on a QSAR prediction task. The pre-training task learned meaningful high-dimensional embeddings from the relationships between the chemical tokens in SMILES strings derived from the "in-stock" portion of the ZINC 15 dataset, a large dataset of commercially available chemicals. The fine-tuning task then adjusted the pre-trained embeddings to facilitate prediction of a specific QSAR endpoint of interest.

The power of this model stems from the ability to reuse the pre-trained model for multiple fine-tuning tasks, reducing the computational burden of developing separate models for different endpoints. We used our framework to develop a predictive model for fraction unbound in human plasma (fup). This approach is flexible, requires minimal domain expertise, and can be generalized to other parameters of interest for rapid and accurate estimation of absorption, distribution, metabolism, excretion, and toxicity (ADMET).
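To make the masked SMILES token pre-training task more concrete, the following is a minimal sketch of how a SMILES string could be split into chemical tokens and randomly masked. It is illustrative only: the regular expression, the `[MASK]` symbol, the 15% masking rate, and the helper names `tokenize_smiles` and `mask_tokens` are assumptions borrowed from common BERT-style practice, not details confirmed for FupBERT.

```python
import random
import re

# A regex often used to split SMILES into chemical tokens. This is an
# assumption for illustration; the tokenizer used by FupBERT may differ.
SMILES_TOKEN_PATTERN = re.compile(
    r"(\[[^\]]+\]|Br|Cl|[BCNOSPFI]|[bcnosp]|[()=#\-+\\/:~@?>*$.%]|\d)"
)


def tokenize_smiles(smiles):
    """Split a SMILES string into a list of chemical tokens."""
    tokens = SMILES_TOKEN_PATTERN.findall(smiles)
    # Fail loudly if any character was not covered by the pattern.
    if "".join(tokens) != smiles:
        raise ValueError(f"Could not fully tokenize SMILES: {smiles!r}")
    return tokens


def mask_tokens(tokens, mask_prob=0.15, mask_token="[MASK]", seed=None):
    """Replace each token with mask_token with probability mask_prob.

    Returns (masked_tokens, labels). labels[i] holds the original token at
    masked positions and None elsewhere, so a model would only be scored
    on the positions it had to reconstruct.
    """
    rng = random.Random(seed)
    masked = list(tokens)
    labels = [None] * len(tokens)
    for i, token in enumerate(tokens):
        if rng.random() < mask_prob:
            masked[i] = mask_token
            labels[i] = token
    return masked, labels


if __name__ == "__main__":
    aspirin = "CC(=O)Oc1ccccc1C(=O)O"
    tokens = tokenize_smiles(aspirin)
    masked, labels = mask_tokens(tokens, seed=0)
    print(tokens)
    print(masked)
```

During pre-training, a model would receive the masked sequence and be trained to recover the original tokens at the masked positions. Fine-tuning would then reuse the same encoder with a regression head for an endpoint such as fup.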
- Developed by: Michael Riedl, Sayak Mukherjee, and Mitch Gauthier
- Model type: BERT
Model Sources
- Paper: Riedl, Michael, Sayak Mukherjee, and Mitch Gauthier. "Descriptor-Free Deep Learning QSAR Model for the Fraction Unbound in Human Plasma." Molecular Pharmaceutics (2023).
- Demo: https://huggingface.co/spaces/battelle/FupBERT_Space
Citation
BibTeX:
@article{riedl2023descriptor,
title={Descriptor-Free Deep Learning QSAR Model for the Fraction Unbound in Human Plasma},
author={Riedl, Michael and Mukherjee, Sayak and Gauthier, Mitch},
journal={Molecular Pharmaceutics},
publisher={ACS Publications},
year={2023}
}