Update README.md
README.md
@@ -145,9 +145,10 @@ Wandb run https://wandb.ai/yepster/ul2-large-de-neddx2-en-nl/runs/s3z13day?works
 * Pre-trained model used as starting point: yhavinga/ul2-large-dutch-english (3150k checkpoint)
 
 The first three epochs were trained using the T5x framework, with a batch size of 128, a constant learning rate of 0.001. This process spanned from step 3150k to 3440k.
-For the concluding epoch, a HuggingFace Flax based trainer was used with the following settings:
+For the concluding ~half epoch, a HuggingFace Flax based trainer was used with the following settings:
 
 - **Batch Size**: Total effective batch size of 512, achieved via per-device settings and gradient accumulation.
+- **Num Train Samples**: 5120k.
 - **Learning Rate**: Set at 0.0002, utilizing cosine scheduling.
 - **Optimizer**: AdamW with beta1=0.9, beta2=0.997, epsilon=1e-8.
 - **Weight Decay**: Configured to 0.001 for regularization.
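For readers who want to see the final-stage hyperparameters above in code, here is a minimal optax sketch. It only mirrors the values listed in the diff (cosine schedule at a peak of 0.0002, AdamW with beta1=0.9, beta2=0.997, epsilon=1e-8, weight decay 0.001, effective batch size 512 via gradient accumulation); the step count, the absence of warmup, and the accumulation split are illustrative assumptions, not details taken from the actual HuggingFace Flax trainer configuration.

```python
import optax

# Assumption: ~10k optimizer steps, derived from 5120k train samples
# at an effective batch size of 512 (5_120_000 / 512 = 10_000).
total_train_steps = 5_120_000 // 512

# Cosine decay from the 0.0002 peak learning rate listed in the README.
learning_rate = optax.cosine_decay_schedule(
    init_value=2e-4,
    decay_steps=total_train_steps,
)

# AdamW with the betas, epsilon, and weight decay listed in the README.
optimizer = optax.adamw(
    learning_rate=learning_rate,
    b1=0.9,
    b2=0.997,
    eps=1e-8,
    weight_decay=0.001,
)

# Effective batch size of 512 via gradient accumulation; the split into
# per-device batch and accumulation steps here is purely illustrative.
grad_accum_steps = 8
optimizer = optax.MultiSteps(optimizer, every_k_schedule=grad_accum_steps)
```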