Haiyang-W committed on
Commit 094679d
1 Parent(s): b2a7924

Update README.md

Files changed (1)
  1. README.md +11 -0
README.md CHANGED
@@ -25,3 +25,14 @@ same data, in the exact same order.
  - License: Apache 2.0
  - Contact: to ask questions about this model, please email Haiyang Wang.
 
+ <figure>
+
+ | TokenFormer model | Layers | #QKV Param Tokens | #Output Param Tokens | #FFN Param Tokens | Model Dim | Heads | Batch Size | Learning Rate | Training Iterations |
+ | ----------------: | -----: | :---------------: | :------------------: | :---------------: | :-------: | :---: | :--------: | :-------------------: | :-------------------------: |
+ | 150M | 12 | 768 | 768 | 3072 | 768 | 12 | 2M | 6.0 x 10<sup>-4</sup> | 143000 |
+ | 450M | 24 | 1024 | 1024 | 4096 | 1024 | 16 | 2M | 6.0 x 10<sup>-4</sup> | 143000 |
+ | 900M | 32 | 1280 | 1280 | 5120 | 1280 | 16 | 2M | 6.0 x 10<sup>-4</sup> | 143000 |
+ | 1.5B | 40 | 1536 | 1536 | 6144 | 1536 | 16 | 2M | 6.0 x 10<sup>-4</sup> | 143000 |
+ <figcaption>Engineering details for the <i>TokenFormer</i> models.</figcaption>
+ </figure>
+
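The hyperparameter table added in this commit can be captured as a small Python config sketch, which also derives the per-head width (Model Dim / Heads) and a rough count of the weights held in parameter tokens. The weight count rests on assumptions that this README does not state: that, as in the TokenFormer paper's Pattention layer, each parameter token stores one key vector and one value vector of width Model Dim, and that Q, K and V each use their own set of QKV parameter tokens. Embeddings, norms and biases are ignored. The names `TokenFormerConfig`, `CONFIGS`, and the helper methods are hypothetical and are not part of any released code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenFormerConfig:
    """Hypothetical container for one row of the engineering-details table."""
    name: str
    layers: int
    qkv_param_tokens: int     # "#QKV Param Tokens" column
    output_param_tokens: int  # "#Output Param Tokens" column
    ffn_param_tokens: int     # "#FFN Param Tokens" column
    model_dim: int
    heads: int

    @property
    def head_dim(self) -> int:
        # Per-head width; every row in the table divides evenly.
        return self.model_dim // self.heads

    def pattention_params_per_layer(self) -> int:
        # ASSUMPTION: each parameter token holds one key and one value vector of
        # width model_dim (2 * tokens * model_dim weights), and Q, K, V are three
        # separate Pattention layers that each use qkv_param_tokens tokens.
        tokens = (3 * self.qkv_param_tokens
                  + self.output_param_tokens
                  + self.ffn_param_tokens)
        return 2 * tokens * self.model_dim

    def pattention_params_total(self) -> int:
        # Sum over all layers; excludes embeddings, norms and biases.
        return self.layers * self.pattention_params_per_layer()


# Rows copied from the table.
CONFIGS = [
    TokenFormerConfig("150M", 12, 768, 768, 3072, 768, 12),
    TokenFormerConfig("450M", 24, 1024, 1024, 4096, 1024, 16),
    TokenFormerConfig("900M", 32, 1280, 1280, 5120, 1280, 16),
    TokenFormerConfig("1.5B", 40, 1536, 1536, 6144, 1536, 16),
]

# Training settings shared by every row.
BATCH_SIZE = "2M"            # as listed; the table does not state the unit
LEARNING_RATE = 6.0e-4
TRAINING_ITERATIONS = 143_000

if __name__ == "__main__":
    for cfg in CONFIGS:
        total_m = cfg.pattention_params_total() / 1e6
        print(f"{cfg.name}: head_dim={cfg.head_dim}, "
              f"parameter-token weights ~ {total_m:.1f}M")
```

Under these assumptions, the 150M row works out to 2 × 768 × (3 × 768 + 768 + 3072) = 9,437,184 weights per layer, or about 113M across its 12 layers before embeddings are counted.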