---
language:
- ru
---
# FRED-T5 1.7B (Full-scale Russian Enhanced Denoisers T5)
The architecture is based on T5.
It has 24 layers and a hidden size of 1536.
The model was trained on a mixture of 7 denoisers, similar to UL2 but with several differences.
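
The card does not say which 7 denoisers FRED-T5 uses or how they differ from UL2. The following is therefore only a generic sketch of a UL2-style denoiser mixture, not the FRED-T5 training code. It assumes six T5-style span-corruption denoisers plus one prefix-LM denoiser. The mode tokens (`<R>`, `<X>`, `<S>`), the span lengths, the corruption rates and the helper names are all hypothetical. Masked spans are replaced by T5's `<extra_id_N>` sentinels.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Denoiser:
    """Hypothetical span-corruption settings (not FRED-T5's actual values)."""
    prefix: str             # hypothetical mode token prepended to the input
    mean_span: int          # length of each masked span, in tokens
    corruption_rate: float  # approximate fraction of tokens to mask


# Six illustrative span-corruption denoisers (short/long spans, low/high rates);
# together with the prefix-LM denoiser below, this gives a mixture of 7.
SPAN_DENOISERS = [
    Denoiser("<R>", 3, 0.15), Denoiser("<R>", 8, 0.15),
    Denoiser("<X>", 3, 0.5), Denoiser("<X>", 8, 0.5),
    Denoiser("<X>", 64, 0.15), Denoiser("<X>", 64, 0.5),
]


def span_corrupt(tokens, d, rng):
    """Mask fixed-length spans and return (inputs, targets) with T5-style sentinels."""
    n = len(tokens)
    if n < 2:
        raise ValueError("need at least 2 tokens")
    num_spans = max(1, round(n * d.corruption_rate / d.mean_span))
    seg = n // num_spans                      # place one span inside each segment
    span_len = max(1, min(d.mean_span, seg))  # shrink spans for short inputs
    masked = [False] * n
    for s in range(num_spans):
        start = s * seg + rng.randrange(seg - span_len + 1)
        for j in range(start, start + span_len):
            masked[j] = True
    inputs, targets, sentinel, i = [d.prefix], [], 0, 0
    while i < n:
        if masked[i]:
            # A run of masked tokens (even two touching spans) becomes one sentinel.
            tag = f"<extra_id_{sentinel}>"
            sentinel += 1
            inputs.append(tag)
            targets.append(tag)
            while i < n and masked[i]:
                targets.append(tokens[i])
                i += 1
        else:
            inputs.append(tokens[i])
            i += 1
    return inputs, targets


def prefix_lm(tokens, rng, prefix="<S>"):
    """S-denoiser style: a random prefix is the input, the rest is the target."""
    cut = rng.randrange(1, len(tokens))  # keep at least one token on each side
    return [prefix] + tokens[:cut], tokens[cut:]


def make_example(tokens, rng):
    """Pick one of the 7 denoisers uniformly at random (an assumption)."""
    k = rng.randrange(len(SPAN_DENOISERS) + 1)
    if k == len(SPAN_DENOISERS):
        return prefix_lm(tokens, rng)
    return span_corrupt(tokens, SPAN_DENOISERS[k], rng)


rng = random.Random(0)
print(make_example("мама мыла раму и пела песню".split(), rng))
```

Each call draws one denoiser, so a pass over the corpus mixes short-span, long-span and prefix-LM objectives. A real pipeline would likely use tuned sampling weights rather than a uniform choice.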
It was trained on a 300 GB Russian-language corpus; the dataset is the same as for the ruT5 models.
It uses a BBPE (byte-level BPE) tokenizer. During the first half of training, the model was trained on only a small part (1%) of the full dataset.
We are continuing to experiment.
We'll share more details and release the checkpoint to the public soon.