Update README.md
README.md
CHANGED
@@ -25,3 +25,14 @@ same data, in the exact same order.
 - License: Apache 2.0
 - Contact: to ask questions about this model, please email Haiyang Wang.
 
+<figure>
+
+| TokenFormer model | Layers | #QKV Param Tokens | #Output Param Tokens | #FFN Param Tokens | Model Dim | Heads | Batch Size | Learning Rate | Training Iterations |
+| ----------------: | -----: | :---------------: | :------------------: | :---------------: | :-------: | :---: | :--------: | :-------------------: | :-------------------------: |
+| 150M | 12 | 768 | 768 | 3072 | 768 | 12 | 2M | 6.0 x 10<sup>-4</sup> | 143000 |
+| 450M | 24 | 1024 | 1024 | 4096 | 1024 | 16 | 2M | 6.0 x 10<sup>-4</sup> | 143000 |
+| 900M | 32 | 1280 | 1280 | 5120 | 1280 | 16 | 2M | 6.0 x 10<sup>-4</sup> | 143000 |
+| 1.5B | 40 | 1536 | 1536 | 6144 | 1536 | 16 | 2M | 6.0 x 10<sup>-4</sup> | 143000 |
+<figcaption>Engineering details for the <i>TokenFormer</i>. </figcaption>
+</figure>
+