Update README.md
README.md
CHANGED
@@ -52,12 +52,13 @@ output # 'CCN(CC)CCN=C=S.Cc1cnc2c(c1)CCCC2N'
 ### Training Procedure
 
 <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
-We used the Open Reaction Database (ORD) dataset for model training.
-The command used for training is the following. For more information, please refer to the paper and GitHub repository.
+We used the [Open Reaction Database (ORD) dataset](https://drive.google.com/file/d/1fa2MyLdN1vcA7Rysk8kLQENE92YejS9B/view?usp=drive_link) for model training. In addition, we used [USPTO_50k dataset](https://yzhang.hpc.nyu.edu/T5Chem/index.html)'s test split to prevent data leakage.
+The command used for training is the following. For more information about data preprocessing and training, please refer to the paper and GitHub repository.
 
 ```python
-
-
+cd task_retrosynthesis
+python train.py \
+    --output_dir='t5' \
     --epochs=80 \
     --lr=2e-4 \
     --batch_size=32 \
@@ -67,10 +68,10 @@ python train_without_duplicates.py \
     --evaluation_strategy='epoch' \
     --save_strategy='epoch' \
     --logging_strategy='epoch' \
-    --train_data_path='
-    --valid_data_path='
-    --test_data_path='
-    --USPTO_test_data_path='
+    --train_data_path='../data/preprocessed_ord_train.csv' \
+    --valid_data_path='../data/preprocessed_ord_valid.csv' \
+    --test_data_path='../data/preprocessed_ord_test.csv' \
+    --USPTO_test_data_path='../data/USPTO_50k/test.csv' \
     --pretrained_model_name_or_path='sagawa/CompoundT5'
 ```
 
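The updated README says the USPTO_50k test split is used "to prevent data leakage" but does not say how. One plausible reading is that reactions in that test split are excluded from the training data. The sketch below illustrates that idea only and is not the repository's implementation. The column name `REACTION`, the helper names, and the in-memory CSV contents are all hypothetical.

```python
import csv
import io


def load_column(csv_text, column):
    """Return the set of non-empty values found in `column` of a CSV document."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[column].strip() for row in reader if row[column].strip()}


def drop_overlap(train_rows, held_out, column):
    """Keep only training rows whose `column` value is absent from `held_out`."""
    return [row for row in train_rows if row[column].strip() not in held_out]


# Tiny in-memory stand-ins for the real CSV files (hypothetical contents;
# the actual files and their column names may differ).
train_csv = "REACTION\nA.B>>C\nD>>E\nF.G>>H\n"
uspto_test_csv = "REACTION\nD>>E\nX>>Y\n"

held_out = load_column(uspto_test_csv, "REACTION")
train_rows = list(csv.DictReader(io.StringIO(train_csv)))
filtered = drop_overlap(train_rows, held_out, "REACTION")

print([row["REACTION"] for row in filtered])  # ['A.B>>C', 'F.G>>H']
```

This comparison is exact string equality. Two different SMILES strings can describe the same molecule, so a real check would likely canonicalize the reactions before comparing them.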