adan details
README.md CHANGED
```diff
@@ -207,6 +207,7 @@ TODO
 
 #### Epochs 5 & 6
 The following hyperparameters were used during training:
+
 - learning_rate: 6e-05
 - train_batch_size: 4
 - eval_batch_size: 1
@@ -214,8 +215,9 @@ The following hyperparameters were used during training:
 - distributed_type: multi-GPU
 - gradient_accumulation_steps: 32
 - total_train_batch_size: 128
-- optimizer:
+- optimizer: _ADAN_ using lucidrains' `adan-pytorch` with default betas
 - lr_scheduler_type: constant_with_warmup
+- data type: TF32
 - num_epochs: 2
 
 ### Framework versions
```
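For reference, a minimal sketch of how the two settings added in this commit (the _ADAN_ optimizer and TF32) might be wired up in plain PyTorch. It assumes lucidrains' `adan-pytorch` package and a placeholder model; it is not taken from this repo's training code, and "default betas" here simply means no `betas` argument is passed.

```python
import torch
from torch import nn
from adan_pytorch import Adan  # lucidrains' Adan implementation

# "data type: TF32": allow TensorFloat-32 matmuls/convolutions on Ampere+ GPUs.
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True

# Placeholder model, purely for illustration.
model = nn.Linear(16, 16)

# Adan with the learning rate from the card; omitting `betas` keeps the package defaults.
optimizer = Adan(model.parameters(), lr=6e-5)
```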