Update README.md
README.md
@@ -8,7 +8,7 @@ tags:
 - rwkv
 license: apache-2.0
 datasets:
--
+- the_pile
 
 ---
 
@@ -33,3 +33,11 @@ RWKV-4-Pile-430M-20220808-8066.pth : Trained on the Pile for 333B tokens.
 * PIQA acc 67.52%
 * SC2016 acc 63.87%
 * Hellaswag acc_norm 40.90%
+
+With tiny attention (--tiny_att_dim 512 --tiny_att_layer 18):
+RWKV-4a-Pile-433M-20221223-8039.pth
+* Pile loss 2.2394
+* LAMBADA ppl 10.54, acc 50.20%
+* PIQA acc 68.12%
+* SC2016 acc 63.55%
+* Hellaswag acc_norm 40.82%
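
For reviewers who want to compare the newly added tiny-attention scores with the existing ones, the following is a minimal sketch of the arithmetic. It assumes the three context lines belong to RWKV-4-Pile-430M-20220808-8066.pth, as the hunk header suggests; the names `baseline`, `tiny_att`, and `score_deltas` are illustrative and not part of the repository. Only the three benchmarks visible on both sides of the diff are compared, since the baseline Pile loss and LAMBADA figures fall outside this hunk.

```python
# Scores copied from the diff above, in percent.
# Assumption: the context lines describe RWKV-4-Pile-430M-20220808-8066.pth.
baseline = {"PIQA acc": 67.52, "SC2016 acc": 63.87, "Hellaswag acc_norm": 40.90}
# Added lines for RWKV-4a-Pile-433M-20221223-8039.pth (tiny attention).
tiny_att = {"PIQA acc": 68.12, "SC2016 acc": 63.55, "Hellaswag acc_norm": 40.82}


def score_deltas(old, new):
    """Return new minus old per benchmark, in percentage points (2 decimals)."""
    return {name: round(new[name] - old[name], 2) for name in old}


deltas = score_deltas(baseline, tiny_att)
for name, delta in deltas.items():
    print(f"{name}: {delta:+.2f} pp")
```

On these three benchmarks the differences are small and mixed in sign, so the numbers in this hunk alone do not establish a clear winner.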