Haiyang-W/TokenFormer-150M


Haiyang-W committed on Oct 30

Commit 094679d (1 parent: b2a7924)

Update README.md

Files changed (1)

README.md +11 -0

README.md CHANGED

@@ -25,3 +25,14 @@ same data, in the exact same order.
 - License: Apache 2.0
 - Contact: to ask questions about this model, please email Haiyang Wang.

+<figure>
+| TokenFormer model | Layers | #QKV Param Tokens | #Output Param Tokens | #FFN Param Tokens | Model Dim | Heads | Batch Size | Learning Rate         | Training Iterations         |
+| ----------------: | -----: | :---------------: | :------------------: | :---------------: | :-------: | :---: | :--------: | :-------------------: | :-------------------------: |
+| 150M              | 12     | 768               | 768                  | 3072              | 768       | 12    | 2M         | 6.0 x 10<sup>-4</sup> | 143000                      |
+| 450M              | 24     | 1024              | 1024                 | 4096              | 1024      | 16    | 2M         | 6.0 x 10<sup>-4</sup> | 143000                      |
+| 900M              | 32     | 1280              | 1280                 | 5120              | 1280      | 16    | 2M         | 6.0 x 10<sup>-4</sup> | 143000                      |
+| 1.5B              | 40     | 1536              | 1536                 | 6144              | 1536      | 16    | 2M         | 6.0 x 10<sup>-4</sup> | 143000                      |
+<figcaption>Engineering details for the <i>TokenFormer</i> models.</figcaption>
+</figure>
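As a rough sanity check on the table, the sketch below derives the per-head dimension and an approximate non-embedding parameter count for each row. It assumes, which the diff does not state, that the Q, K, V, output, and FFN projections are each a Pattention layer storing its listed number of key and value parameter tokens, each of width Model Dim. It ignores embeddings, norms, and biases. It also reads the "2M" batch size as 2**21 tokens, another assumption, since the table gives no unit. `TokenFormerConfig`, `pattention_params`, and `non_embedding_params` are hypothetical helpers, not part of any released code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenFormerConfig:
    # Hypothetical container mirroring one row of the table.
    name: str
    layers: int
    qkv_param_tokens: int
    out_param_tokens: int
    ffn_param_tokens: int
    model_dim: int
    heads: int


# Rows copied from the engineering-details table.
CONFIGS = [
    TokenFormerConfig("150M", 12, 768, 768, 3072, 768, 12),
    TokenFormerConfig("450M", 24, 1024, 1024, 4096, 1024, 16),
    TokenFormerConfig("900M", 32, 1280, 1280, 5120, 1280, 16),
    TokenFormerConfig("1.5B", 40, 1536, 1536, 6144, 1536, 16),
]

# Assumption: "2M" means 2**21 = 2,097,152 tokens per batch.
BATCH_TOKENS = 2 ** 21
ITERATIONS = 143_000


def head_dim(cfg: TokenFormerConfig) -> int:
    """Width of each attention head: model dim split evenly across heads."""
    if cfg.model_dim % cfg.heads:
        raise ValueError(f"{cfg.name}: model_dim not divisible by heads")
    return cfg.model_dim // cfg.heads


def pattention_params(num_tokens: int, d_in: int, d_out: int) -> int:
    """Assumed size of one Pattention layer: num_tokens key tokens of
    width d_in plus num_tokens value tokens of width d_out."""
    return num_tokens * d_in + num_tokens * d_out


def non_embedding_params(cfg: TokenFormerConfig) -> int:
    """Rough count of parameter-token weights, ignoring embeddings/norms/biases."""
    d = cfg.model_dim
    per_layer = (
        3 * pattention_params(cfg.qkv_param_tokens, d, d)  # Q, K, V projections
        + pattention_params(cfg.out_param_tokens, d, d)    # attention output projection
        + pattention_params(cfg.ffn_param_tokens, d, d)    # FFN replacement
    )
    return cfg.layers * per_layer


if __name__ == "__main__":
    for cfg in CONFIGS:
        print(f"{cfg.name}: head_dim={head_dim(cfg)}, "
              f"~{non_embedding_params(cfg) / 1e6:.1f}M non-embedding params")
    print(f"tokens per run: ~{BATCH_TOKENS * ITERATIONS / 1e9:.1f}B")
```

Under these assumptions the 150M row gives about 113.2M non-embedding parameters and the head dimensions range from 64 to 96; treat these as estimates, not official counts.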