Commit bb7f7f1 by mriedl · 1 Parent(s): b4dd09c

Update README.md

  1. README.md +50 -0
README.md CHANGED
@@ -1,3 +1,53 @@
  ---
  license: gpl-2.0
  ---
+
+ # Model Card for FupBERT
+
+ A descriptor-free approach to predicting fraction unbound in human plasma.
+
+ ## Model Details
+
+ ### Model Description
+
+ Chemical-specific parameters are either measured *in vitro* or estimated using quantitative
+ structure–activity relationship (QSAR) models. Existing QSAR work relies on extracting a
+ set of descriptors or fingerprints, selecting a subset of them, and training a machine learning model. In this work,
+ we used a state-of-the-art natural language processing model, Bidirectional Encoder Representations from Transformers
+ (BERT), which allowed us to avoid calculating these chemical descriptors. In this approach,
+ simplified molecular-input line-entry system (SMILES) strings were embedded in a high-dimensional space using a
+ two-stage training approach. The model was first pre-trained on a masked SMILES token task and then fine-tuned on
+ a QSAR prediction task. The pre-training task learned meaningful high-dimensional embeddings from the relationships
+ between the chemical tokens in SMILES strings derived from the "in-stock" portion of the ZINC 15 dataset, a
+ large dataset of commercially available chemicals. The fine-tuning task then perturbed the pre-trained embeddings
+ to facilitate prediction of a specific QSAR endpoint of interest. The power of this model stems from the ability
+ to reuse the pre-trained model for multiple fine-tuning tasks, reducing the computational burden of developing
+ separate models for different endpoints. We used our framework to develop a predictive model for fraction unbound
+ in human plasma (fup). This approach is flexible, requires minimal domain expertise, and can be generalized to
+ other parameters of interest for rapid and accurate estimation of absorption, distribution, metabolism, excretion, and toxicity (ADMET).
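The masked SMILES token pre-training task described above can be illustrated with a minimal sketch. Everything in it is an assumption for illustration: the regular-expression tokenizer, the `[MASK]` symbol, the 15% masking rate (borrowed from the original BERT convention), and the helper names `tokenize` and `mask_tokens` are hypothetical and are not taken from the FupBERT code. The sketch also simplifies BERT's masking scheme by always substituting the mask symbol.

```python
import random
import re
from typing import List, Optional, Tuple

# Assumption: a simple regex tokenizer covering common SMILES syntax.
# The tokenizer actually used by FupBERT is not described in this card.
SMILES_TOKEN = re.compile(
    r"\[[^\]]+\]|Br|Cl|%\d{2}|[BCNOPSFI]|[bcnops]|[=#\-+\\/().:~@*$]|\d"
)

MASK = "[MASK]"  # hypothetical mask symbol


def tokenize(smiles: str) -> List[str]:
    """Split a SMILES string into chemical tokens."""
    tokens = SMILES_TOKEN.findall(smiles)
    # Reject strings containing characters the regex does not cover.
    if "".join(tokens) != smiles:
        raise ValueError(f"Unrecognized characters in SMILES: {smiles!r}")
    return tokens


def mask_tokens(
    tokens: List[str],
    mask_prob: float = 0.15,
    rng: Optional[random.Random] = None,
) -> Tuple[List[str], List[Optional[str]]]:
    """Replace each token with MASK with probability mask_prob.

    Returns the masked sequence and per-position labels: the original
    token where masking happened, None elsewhere.
    """
    rng = rng or random.Random()
    masked: List[str] = []
    labels: List[Optional[str]] = []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK)
            labels.append(tok)
        else:
            masked.append(tok)
            labels.append(None)
    return masked, labels


if __name__ == "__main__":
    aspirin = tokenize("CC(=O)Oc1ccccc1C(=O)O")
    masked, labels = mask_tokens(aspirin, rng=random.Random(0))
    print(masked)
    print(labels)
```

During pre-training, a model would be asked to recover the entries of `labels` that are not `None` from the masked sequence. Fine-tuning would then replace that objective with a regression target such as fup.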
+
+ - **Developed by:** Michael Riedl, Sayak Mukherjee, and Mitch Gauthier
+ - **Model type:** BERT
+
+ ### Model Sources
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Paper:** TBA
+ - **Demo:** https://huggingface.co/spaces/battelle/FupBERT_Space
+
+ ## Citation
+
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+ **BibTeX:**
+
+ [More Information Needed]
+
+ ## Model Card Contact
+
+ riedl@battelle.org
+