Update README.md
README.md
@@ -145,9 +145,10 @@ Wandb run https://wandb.ai/yepster/ul2-large-de-neddx2-en-nl/runs/s3z13day?works
 * Pre-trained model used as starting point: yhavinga/ul2-large-dutch-english (3150k checkpoint)
 
 The first three epochs were trained using the T5x framework, with a batch size of 128, a constant learning rate of 0.001. This process spanned from step 3150k to 3440k.
-For the concluding epoch, a HuggingFace Flax based trainer was used with the following settings:
+For the concluding ~half epoch, a HuggingFace Flax based trainer was used with the following settings:
 
 - **Batch Size**: Total effective batch size of 512, achieved via per-device settings and gradient accumulation.
+- **Num Train Samples**: 5120k.
 - **Learning Rate**: Set at 0.0002, utilizing cosine scheduling.
 - **Optimizer**: AdamW with beta1=0.9, beta2=0.997, epsilon=1e-8.
 - **Weight Decay**: Configured to 0.001 for regularization.
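For readers who want to see the final-stage hyperparameters above in code, here is a minimal optax sketch. It only mirrors the values listed in the diff (cosine schedule at a peak of 0.0002, AdamW with beta1=0.9, beta2=0.997, epsilon=1e-8, weight decay 0.001, effective batch size 512 via gradient accumulation); the step count, the absence of warmup, and the accumulation split are illustrative assumptions, not details taken from the actual HuggingFace Flax trainer configuration.

```python
import optax

# Assumption: ~10k optimizer steps, derived from 5120k train samples
# at an effective batch size of 512 (5_120_000 / 512 = 10_000).
total_train_steps = 5_120_000 // 512

# Cosine decay from the 0.0002 peak learning rate listed in the README.
learning_rate = optax.cosine_decay_schedule(
    init_value=2e-4,
    decay_steps=total_train_steps,
)

# AdamW with the betas, epsilon, and weight decay listed in the README.
optimizer = optax.adamw(
    learning_rate=learning_rate,
    b1=0.9,
    b2=0.997,
    eps=1e-8,
    weight_decay=0.001,
)

# Effective batch size of 512 via gradient accumulation; the split into
# per-device batch and accumulation steps here is purely illustrative.
grad_accum_steps = 8
optimizer = optax.MultiSteps(optimizer, every_k_schedule=grad_accum_steps)
```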