sberbank-ai committed
Commit 6777adb • 1 Parent(s): 856eccb

Update README.md
README.md CHANGED
```diff
@@ -10,12 +10,18 @@ Architecture based on T5.
 
 It has 24 layers and a hidden size of 1536.
 
-Model
+The model was trained on a mixture of 7 denoisers, like UL2, with several differences.
 
 It was trained on a Russian-language corpus (300 GB). The dataset is the same as for the ruT5 models.
 
-Bbpe tokenizer.
+BBPE tokenizer.
 
+For the first half of training, the model was trained on a small part (1%, 3 GB) of all datasets, without prefixes in each task.
+
+For RSG, we trained as described in the T5 paper: first we trained multitask on all tasks, then took the best checkpoint for each task and trained it further.
+
+Training loss:
+![Screenshot 2023-01-21 at 11.36.52.png](https://s3.amazonaws.com/moonup/production/uploads/1674290304538-5f91b1208a61a359f44e1851.png)
 
 We continue to experiment...
 
```
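For readers landing on this commit, a minimal usage sketch may help. It assumes the checkpoint loads through Hugging Face `transformers` with `T5ForConditionalGeneration` and a byte-level BPE tokenizer (`GPT2Tokenizer`), and that tasks are selected with a denoiser prefix such as `<LM>`; the repo id `sberbank-ai/FRED-T5-1.7B` and the prefix token are assumptions, not stated in this commit.

```python
# Minimal usage sketch; the repo id, tokenizer class, and "<LM>" prefix are assumptions.
from transformers import GPT2Tokenizer, T5ForConditionalGeneration

model_name = "sberbank-ai/FRED-T5-1.7B"  # assumed checkpoint name
tokenizer = GPT2Tokenizer.from_pretrained(model_name)  # byte-level BPE (BBPE)
model = T5ForConditionalGeneration.from_pretrained(model_name)  # T5, 24 layers, d_model=1536

# Prefixes were added in the second half of training, so a prefix selects the
# denoiser; "<LM>" (plain language modeling) is one plausible choice.
inputs = tokenizer("<LM>Мороз и солнце, день чудесный.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```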
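The line about "a mixture of 7 denoisers, like UL2" refers to span-corruption-style objectives. Below is a hedged, self-contained sketch of one such denoiser; the corruption rate, mean span length, sentinel tokens, and the `<SC1>` prefix are illustrative values, not the model's actual configuration.

```python
import random

def span_corrupt(tokens, noise_density=0.15, mean_span_len=3.0, prefix="<SC1>"):
    """One UL2/T5-style denoiser: replace random spans with sentinels in the
    input; the target lists each sentinel followed by the span it hid."""
    inp, tgt, i, sent = [prefix], [], 0, 0
    while i < len(tokens):
        # Start a span with probability chosen so ~noise_density of tokens get masked.
        if random.random() < noise_density / mean_span_len:
            span = max(1, round(random.expovariate(1.0 / mean_span_len)))
            tgt += [f"<extra_id_{sent}>"] + tokens[i:i + span]
            inp.append(f"<extra_id_{sent}>")
            sent += 1
            i += span
        else:
            inp.append(tokens[i])
            i += 1
    return inp, tgt

random.seed(0)
words = "мороз и солнце день чудесный".split()  # "frost and sun, a wonderful day"
print(span_corrupt(words))
```

A mixture of 7 such denoisers would vary `noise_density`, `mean_span_len`, and the prefix per denoiser, UL2-style.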
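The RSG recipe, multitask first and then per-task continuation from the best checkpoint, is easy to misread, so here is a schematic sketch of the control flow only; `train` and the task data are toy stand-ins, not anything from the authors' pipeline.

```python
# Schematic only: `train` and the task data are stand-ins for real fine-tuning.
def train(checkpoint, examples):
    """Stand-in fine-tuning step: 'absorbs' the examples into the checkpoint."""
    return checkpoint + [src for src, _ in examples]

tasks = {  # toy stand-ins for RSG tasks
    "rcb": [("premise ... hypothesis ...", "entailment")],
    "danetqa": [("question ... passage ...", "true")],
}

# Stage 1 (as in the T5 paper): one multitask run over all tasks,
# with a task prefix on every example.
pool = [(f"{name}: {src}", tgt) for name, pairs in tasks.items() for src, tgt in pairs]
best_multitask_ckpt = train([], pool)

# Stage 2: for each task, continue training from the best multitask checkpoint.
finetuned = {name: train(list(best_multitask_ckpt), pairs) for name, pairs in tasks.items()}
print(sorted(finetuned))
```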