pszemraj committed (verified) · commit 44ce8b3 · parent 4f4a121

Update README.md

Files changed (1): README.md (+10 -1)
README.md CHANGED

@@ -15,12 +15,16 @@ tags:

 An encoder-decoder (T5 architecture) pretrained with [nanoT5](https://github.com/pszemraj/nanoT5/tree/flan-dataset):

-- tokenizer: custom llama2 with 48k vocab (from [vocab scaling laws](https://hf.co/collections/sail/scaling-laws-with-vocabulary-6699e0cbd77a8b2870859bfe))
+- tokenizer: sentencepiece BPE w/ byte fallback, 48k vocab (from [vocab scaling laws](https://hf.co/collections/sail/scaling-laws-with-vocabulary-6699e0cbd77a8b2870859bfe))
 - data: `fineweb-edu-dedup` split of `HuggingFaceTB/smollm-corpus`
 - context length: 1024 ctx

 ## details

+Detailed info, including training logs, configs, and checkpoints, can be found under `checkpoints/` in this repo.
+
+<details>
+<summary><strong>Expand hyperparameter overview</strong></summary>

 1. Model:
    - Dropout rate: 0.0
@@ -46,6 +50,7 @@ An encoder-decoder (T5 architecture) pretrained with [nanoT5](https://github.com
 4. Hardware:
    - Device: RTX 4080
    - Precision: bfloat16, tf32
+</details>

 ## plots

@@ -55,6 +60,9 @@ training loss
 ![loss](./checkpoints/loss_over_steps.png)

+<details>
+<summary><strong>Expand grad and weights L2 norm plots</strong></summary>
+
 grad norm

 ![grad](./checkpoints/grad_l2_over_steps.png)
@@ -65,5 +73,6 @@ weights norm

 ![weights](./checkpoints/weights_l2_over_steps.png)

+</details>

 ---
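For context on the tokenizer change above: a minimal sketch of training a sentencepiece BPE tokenizer with byte fallback and a 48k vocab. The corpus path, model prefix, and the round-trip test string are illustrative assumptions, not values taken from this repo's config.

```python
# Hypothetical sketch: train a 48k-vocab BPE tokenizer with byte fallback.
import sentencepiece as spm

spm.SentencePieceTrainer.train(
    input="corpus.txt",      # assumption: plain-text training corpus
    model_prefix="bpe48k",   # writes bpe48k.model / bpe48k.vocab
    model_type="bpe",
    vocab_size=48_000,       # 48k vocab, per the README
    byte_fallback=True,      # unknown characters decompose into raw bytes
)

# Quick round-trip check with the trained model.
sp = spm.SentencePieceProcessor(model_file="bpe48k.model")
print(sp.encode("An encoder-decoder pretrained with nanoT5.", out_type=str))
```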
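The `fineweb-edu-dedup` data named in the README can be pulled with the `datasets` library. Treating `fineweb-edu-dedup` as a config (subset) name of `HuggingFaceTB/smollm-corpus` and assuming documents live in a `text` field are both assumptions in this sketch.

```python
# Hypothetical sketch: stream the fineweb-edu-dedup subset of smollm-corpus.
from datasets import load_dataset

ds = load_dataset(
    "HuggingFaceTB/smollm-corpus",
    "fineweb-edu-dedup",  # assumption: exposed as a config/subset name
    split="train",
    streaming=True,       # avoids downloading the full corpus up front
)

for example in ds.take(1):
    print(example["text"][:200])  # assumption: `text` field holds the document
```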
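Finally, a sketch of the bfloat16 + tf32 precision setup listed under Hardware. The exact flags nanoT5 sets may differ, so treat this as illustrative PyTorch boilerplate rather than the repo's actual training loop.

```python
# Hypothetical sketch: enable tf32 matmuls and run a forward pass in bfloat16.
import torch

# TensorFloat-32 speeds up float32 matmuls/convs on Ampere+ GPUs (e.g. RTX 4080).
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True

device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Linear(1024, 1024).to(device)
x = torch.randn(8, 1024, device=device)

# Autocast keeps parameters in float32 while computing activations in bfloat16.
with torch.autocast(device_type=device, dtype=torch.bfloat16):
    y = model(x)

print(y.dtype)  # torch.bfloat16
```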