An encoder-decoder (T5 architecture) pretrained with [nanoT5](https://github.com/pszemraj/nanoT5/tree/flan-dataset):

- tokenizer: SentencePiece BPE with byte fallback, 48k vocab (from [vocab scaling laws](https://hf.co/collections/sail/scaling-laws-with-vocabulary-6699e0cbd77a8b2870859bfe))
- data: `fineweb-edu-dedup` split of `HuggingFaceTB/smollm-corpus` (see the loading sketch below)
- context length: 1024 tokens
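
For reference, a minimal sketch of streaming the pretraining data with `datasets`; the subset name follows the bullet above, and streaming avoids downloading the full corpus:

```python
from datasets import load_dataset

# stream the deduplicated fineweb-edu subset of smollm-corpus
ds = load_dataset(
    "HuggingFaceTB/smollm-corpus",
    "fineweb-edu-dedup",
    split="train",
    streaming=True,
)

# peek at the first document ("text" is the expected field name;
# check the dataset card if this differs)
print(next(iter(ds))["text"][:300])
```
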
## details

Detailed info, including training logs, configs, and checkpoints, can be found under `checkpoints/` in this repo.
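
A minimal usage sketch, assuming the exported checkpoint follows the standard `transformers` T5 layout; the repo id and the presence of `<extra_id_*>` sentinel tokens are assumptions, not confirmed by this card:

```python
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration

repo_id = "user/model-name"  # placeholder -- substitute this repo's hub id

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = T5ForConditionalGeneration.from_pretrained(repo_id, torch_dtype=torch.bfloat16)

# a pretrained-only T5 fills span-corruption sentinels; it does not follow instructions
inputs = tokenizer("The capital of France is <extra_id_0>.", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(out[0], skip_special_tokens=False))
```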

<details>
<summary><strong>Expand hyperparameter overview</strong></summary>

1. Model:
   - Dropout rate: 0.0

4. Hardware:
   - Device: RTX 4080
   - Precision: bfloat16, tf32 (see the PyTorch sketch just below this section)

</details>
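
The precision line corresponds, roughly, to the following PyTorch setup; this is a sketch of the general technique, not the exact nanoT5 training config:

```python
import torch

# allow TF32 tensor-core math for float32 matmuls/convolutions (Ampere+ GPUs)
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True

x = torch.randn(8, 1024, device="cuda")
w = torch.randn(1024, 1024, device="cuda")

# mixed precision: activations computed in bfloat16 under autocast,
# while parameters/master weights stay in float32
with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
    y = x @ w

print(y.dtype)  # torch.bfloat16
```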

## plots

training loss

![loss](./checkpoints/loss_over_steps.png)

<details>
<summary><strong>Expand grad and weights L2 norm plots</strong></summary>

grad norm

![grad](./checkpoints/grad_l2_over_steps.png)

weights norm

![weights](./checkpoints/weights_l2_over_steps.png)

</details>

---