Update README.md

README.md CHANGED

@@ -3,6 +3,9 @@ license: mit
 tags:
 - chemistry
 - smiles
+widget:
+- text: "^"
+  example_title: "Sample molecule | SMILES"
 ---
 
 # Model Card for Model hogru/MolReactGen-GuacaMol-Molecules
@@ -44,7 +47,6 @@ The main use of this model is to pass the master's examination of the author ;-)
 The model can be used in a Hugging Face text generation pipeline. For the intended use case a wrapper around the raw text generation pipeline is needed. This is the [`generate.py` from the repository](https://github.com/hogru/MolReactGen/blob/main/src/molreactgen/generate.py).
 The model has a default `GenerationConfig()` (`generation_config.json`) which can be overwritten. Depending on the number of molecules to be generated (`num_return_sequences` in the `JSON` file) this might take a while. The generation code above shows a progress bar during generation.
 
-
 ## Bias, Risks, and Limitations
 
 <!-- This section is meant to convey both technical and sociotechnical limitations. -->
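
The raw pipeline route described in the hunk above can be sketched as follows. This is an illustrative snippet only, not the supported `generate.py` path: the `"^"` prompt comes from the widget config added in the first hunk, and the sampling settings stand in for whatever `generation_config.json` specifies.

```python
from transformers import pipeline

# Minimal sketch of the raw text generation route; the supported wrapper is
# generate.py from the MolReactGen repository.
generator = pipeline("text-generation", model="hogru/MolReactGen-GuacaMol-Molecules")

# "^" is the start-of-molecule prompt from the widget; num_return_sequences
# controls how many molecules are sampled, mirroring the card's JSON config.
results = generator("^", do_sample=True, num_return_sequences=5)
for result in results:
    print(result["generated_text"])
```

Unlike `generate.py`, this prints the raw generated strings, prompt token included, with no post-processing or progress bar.
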
@@ -63,11 +65,11 @@ The model generates molecules that are similar to the GuacaMol training data, wh
 
 The default Hugging Face `Trainer()` has been used, with an `EarlyStoppingCallback()`.
 
-
+### Preprocessing
 
 The training data was pre-processed with a `PreTrainedTokenizerFast()` trained on the training data with a character level pre-tokenizer and Unigram as the sub-word tokenization algorithm with a vocabulary size of 88. Other tokenizers can be configured.
 
-
+### Training Hyperparameters
 
 - **Batch size:** 64
 - **Gradient accumulation steps:** 4
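
To make the new Preprocessing and Training Hyperparameters sections concrete, here is a hedged sketch of the corresponding Hugging Face calls. Only the model id, the tokenizer class, the batch size, and the gradient accumulation steps come from the card; the toy dataset, the early-stopping patience, and the evaluation/saving schedule are assumptions (the real settings live in the repository's `conf` directory).

```python
from datasets import Dataset
from transformers import (
    AutoModelForCausalLM,
    DataCollatorForLanguageModeling,
    EarlyStoppingCallback,
    PreTrainedTokenizerFast,
    Trainer,
    TrainingArguments,
)

repo = "hogru/MolReactGen-GuacaMol-Molecules"

# The trained tokenizer (character-level pre-tokenizer, Unigram sub-word model,
# vocabulary size 88) ships with the model repository.
tokenizer = PreTrainedTokenizerFast.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo)
if tokenizer.pad_token is None:  # assumption: fall back to EOS for padding
    tokenizer.pad_token = tokenizer.eos_token

# Toy stand-in for the tokenized GuacaMol training split.
smiles = ["CCO", "c1ccccc1", "CC(=O)O"]
dataset = Dataset.from_dict({"text": smiles}).map(
    lambda batch: tokenizer(batch["text"]), batched=True, remove_columns=["text"]
)

args = TrainingArguments(
    output_dir="molreactgen-guacamol",
    per_device_train_batch_size=64,  # batch size from the list above
    gradient_accumulation_steps=4,   # from the list above
    evaluation_strategy="epoch",     # assumed; early stopping needs eval steps
    save_strategy="epoch",
    load_best_model_at_end=True,     # required by EarlyStoppingCallback
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset,
    eval_dataset=dataset,  # placeholder; the real run uses a held-out split
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],  # patience assumed
)
trainer.train()
```
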
@@ -86,7 +88,7 @@ More configuration (options) can be found in the [`conf`](https://github.com/hog
 
 Please see the slides / the poster mentioned above.
 
-
+### Metrics
 
 <!-- These are the evaluation metrics being used, ideally with a description of why. -->
 