sberbank-ai committed
Commit f158b54 • 1 parent: 2daa064
Update README.md

README.md CHANGED
@@ -14,15 +14,16 @@ Model trained on a mixture of 7 denoisers like UL2 with several differences (htt
 
 It trained on Russian language corpus (300GB). Dataset is the same as for ruT5 models.
 
-Bbpe tokenizer.
+Bbpe tokenizer. 50257 + special tokens 107. Prefix tokens: '<LM>','<SC1>'...'<SC6>'
 
 First half of the time model trained on the small part of all datasets (1%,3GB) and without prefixes in each task.
 
 For RSG we trained as described in the T5 paper. First, we trained multitask for all tasks. Then we took the best checkpoint for the task and trained it further.
+RSG submit here https://russiansuperglue.com/login/submit_info/1936
 
 Total training time was around 45 days on 112 A100 GPUs.
 
-Training loss
+Training loss
 ![Screenshot 2023-01-21 at 11.36.52.png](https://s3.amazonaws.com/moonup/production/uploads/1674290304538-5f91b1208a61a359f44e1851.png)
 
 We continue to experiment...
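For context on the prefix tokens added in this commit, below is a minimal usage sketch with the Hugging Face transformers library. The repo id, the loading classes, and the exact prefix placement are assumptions for illustration, not part of the commit.

```python
# Minimal sketch of using the prefix tokens described above.
# Assumptions (not stated in the commit): the repo id, loading via
# AutoTokenizer / T5ForConditionalGeneration, and the prefix placement shown.
from transformers import AutoTokenizer, T5ForConditionalGeneration

MODEL_ID = "ai-forever/FRED-T5-1.7B"  # hypothetical repo id; substitute the actual model repo

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = T5ForConditionalGeneration.from_pretrained(MODEL_ID)

# The README describes a BBPE vocabulary of 50257 tokens plus 107 special tokens,
# among them the task prefixes '<LM>' and '<SC1>'...'<SC6>'.
print(len(tokenizer))  # expected 50257 + 107 = 50364 if the counts above are exact

# Language-model-style continuation: prepend the '<LM>' prefix to the prompt.
prompt = "<LM> Сегодня хорошая погода, поэтому "
input_ids = tokenizer(prompt, return_tensors="pt").input_ids
output_ids = model.generate(input_ids, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))

# The '<SC1>'...'<SC6>' prefixes select the span-corruption denoisers; their exact
# input format is not specified in this commit, so it is not demonstrated here.
```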