mriedl committed on Jul 3, 2023

Commit

bb7f7f1

1 Parent(s): b4dd09c

Update README.md

README.md +50 -0

@@ -1,3 +1,53 @@
 ---
 license: gpl-2.0
 ---
+# Model Card for FupBERT
+A descriptor-free approach to predicting fraction unbound in human plasma.
+## Model Details
+### Model Description
+Chemical-specific parameters are either measured *in vitro* or estimated using quantitative
+structure–activity relationship (QSAR) models. The existing body of QSAR work relies on extracting a
+set of descriptors or fingerprints, selecting a subset of them, and training a machine learning model. In this work,
+we used a state-of-the-art natural language processing model, Bidirectional Encoder Representations from Transformers
+(BERT), which allowed us to circumvent the calculation of these chemical descriptors. In this approach,
+simplified molecular-input line-entry system (SMILES) strings were embedded in a high-dimensional space using a
+two-stage training approach. The model was first pre-trained on a masked SMILES token task and then fine-tuned on
+a QSAR prediction task. The pre-training task learned meaningful high-dimensional embeddings from the relationships
+between the chemical tokens in SMILES strings derived from the "in-stock" portion of the ZINC 15 dataset, a
+large dataset of commercially available chemicals. The fine-tuning task then perturbed the pre-trained embeddings
+to facilitate prediction of a specific QSAR endpoint of interest. The power of this model stems from the ability
+to reuse the pre-trained model for multiple fine-tuning tasks, which reduces the computational burden of developing
+separate models for different endpoints. We used our framework to develop a predictive model for fraction unbound
+in human plasma (fup). This approach is flexible, requires minimal domain expertise, and can be generalized to
+other parameters of interest for rapid and accurate estimation of absorption, distribution, metabolism, excretion, and toxicity (ADMET).
+- **Developed by:** Michael Riedl, Sayak Mukherjee, and Mitch Gauthier
+- **Model type:** BERT
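The masked SMILES token task described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the card does not describe the tokenizer or the masking scheme FupBERT actually uses, so the regex tokenizer, the `tokenize` and `mask_tokens` helpers, the 15% masking rate, and the `[MASK]` token are hypothetical stand-ins. The sketch also simplifies the usual BERT recipe by always substituting the mask token.

```python
import random
import re

# Hypothetical regex tokenizer (an assumption, not FupBERT's tokenizer):
# bracket atoms, two-letter halogens, ring-closure numbers, single-letter
# atoms, digits, and bond/branch symbols.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|[A-Za-z]|\d|[=#\-+()/\\.@:~*$]")


def tokenize(smiles):
    """Split a SMILES string into chemical tokens."""
    tokens = SMILES_TOKEN.findall(smiles)
    # Refuse strings containing characters the pattern does not cover.
    if "".join(tokens) != smiles:
        raise ValueError(f"Unrecognized characters in SMILES: {smiles!r}")
    return tokens


def mask_tokens(tokens, mask_rate=0.15, mask_token="[MASK]", seed=None):
    """Hide a random subset of tokens and return (masked tokens, labels).

    `labels` maps each masked position to the original token, which is
    what a masked-token pre-training objective asks the model to recover.
    """
    rng = random.Random(seed)
    n_mask = min(len(tokens), max(1, round(len(tokens) * mask_rate)))
    positions = rng.sample(range(len(tokens)), n_mask)
    masked = list(tokens)
    labels = {}
    for pos in positions:
        labels[pos] = tokens[pos]
        masked[pos] = mask_token
    return masked, labels


aspirin = tokenize("CC(=O)Oc1ccccc1C(=O)O")
masked, labels = mask_tokens(aspirin, seed=0)
print(masked)
print(labels)
```

During pre-training, the model sees `masked` and is scored on how well it predicts the tokens stored in `labels`. Fine-tuning would then replace this objective with a regression on the QSAR endpoint.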
+### Model Sources
+<!-- Provide the basic links for the model. -->
+- **Paper:** TBA
+- **Demo:** https://huggingface.co/spaces/battelle/FupBERT_Space
+## Citation
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+## Model Card Contact
+riedl@battelle.org