Update README.md
README.md
CHANGED
@@ -29,7 +29,7 @@ The following table summarizes the parameters for all available models
 |`dataset` |`gsarti/clean_mc4_it` |`gsarti/clean_mc4_it` |`gsarti/clean_mc4_it` |`oscar/unshuffled_deduplicated_it`|
 |`architecture` |`google/t5-v1_1-small` |`google/t5-v1_1-base` |`google/t5-v1_1-large` |`t5-base` |
 |`learning rate` | 5e-3 | 5e-3 | 5e-3 | 1e-2 |
-|`steps` | 1'050'000 | 1'050'000 |
+|`steps` | 1'050'000 | 1'050'000 | 2'100'000 | 258'000 |
 |`training time` | 36 hours | 101 hours | 370 hours | 98 hours |
 |`ff projection` |`gated-gelu` |`gated-gelu` |`gated-gelu` |`relu` |
 |`tie embeds` |`false` |`false` |`false` |`true` |