Update README.md
README.md
CHANGED
@@ -22,7 +22,7 @@ widget:
 - min_lr 0.00001
 - weight_decay 0.01
 - grad_clip 1.0
--
+- Total sentences trained: ```128*30w + 256*15w + 256*14.5w + 256*46.5w + 256*17w = 27648w``` (`w` = 10,000)
 - Roughly 54% of the sentences covered by 1M (100w) steps at batch size 512
 
 Final loss:
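
The sentence total and the 54% figure in this hunk can be checked with a few lines of arithmetic. The sketch below assumes each term in the README formula is a (batch size, step count) pair for one training stage and that `w` means 10,000. The names `W`, `stages`, and `total_sentences` are illustrative and do not come from the repository.

```python
# Assumption: "w" in the README is shorthand for 10,000.
W = 10_000

# Assumption: each term in the README formula is (batch size, steps in units of w).
stages = [(128, 30), (256, 15), (256, 14.5), (256, 46.5), (256, 17)]


def total_sentences(stages):
    """Sum batch_size * steps over all stages, returning a sentence count."""
    return sum(batch * steps_w * W for batch, steps_w in stages)


total = total_sentences(stages)

# Reference budget named in the README: batch size 512 for 100w (1M) steps.
reference = 512 * 100 * W
ratio = total / reference

print(f"total sentences: {total / W:g}w")
print(f"fraction of reference budget: {ratio:.0%}")
```

Under these assumptions the sum comes to 27648w (276.48M sentences), which is 54% of the 51200w reference budget, in agreement with both lines of the README.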