pszemraj committed on
Commit a92e613, 1 parent: 986e79f

Update README.md

Files changed (1)
  1. README.md +13 -2
README.md CHANGED
@@ -1,12 +1,23 @@
 ---
 license: apache-2.0
+datasets:
+- allenai/c4
+language:
+- en
 ---
 
-# nanoT5-65kBPE-v2
+# nanoT5-base-65kBPE-v2
 
+- SiLU/gated-SiLU activation
+- 25% mask rate during pretrain
+- 65k vocab size, [adapted claude3 tokenizer](https://hf.co/BEE-spoke-data/claude-tokenizer-forT5)
+
+training code: https://github.com/pszemraj/nanoT5/tree/any-tokenizer
 
 ## plots
 
+more details are under `checkpoints/`
+
 loss
 
 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/60bccec062080d33f875cd0c/i_PtDB292icNcKAvh9eX5.png)
@@ -28,4 +28,4 @@ gradients
 
 weights
 
-![image/png](https://cdn-uploads.huggingface.co/production/uploads/60bccec062080d33f875cd0c/IT5OApwU5HEII5-Huf5E7.png)
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/60bccec062080d33f875cd0c/IT5OApwU5HEII5-Huf5E7.png)
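
Note on the "25% mask rate during pretrain" bullet added in this commit: the sketch below illustrates what a 25% mask rate could mean, assuming it refers to T5-style span corruption. It is not taken from the linked training code. The function name `span_corrupt`, the fixed `span_len` of 3, and the use of `<extra_id_N>` sentinel strings are illustrative assumptions; the adapted tokenizer may name its sentinels differently, and the closing sentinel that T5 appends to targets is omitted for brevity.

```python
import random


def span_corrupt(tokens, mask_rate=0.25, span_len=3, seed=0):
    """Hypothetical helper: mask about `mask_rate` of the tokens in short spans.

    Returns (inputs, targets, masked), where each maximal masked run is
    replaced by one sentinel in `inputs` and spelled out after the same
    sentinel in `targets`.
    """
    rng = random.Random(seed)
    n = len(tokens)
    # Number of tokens to mask; an empty sequence masks nothing.
    remaining = min(n, max(1, round(n * mask_rate))) if n else 0
    masked = [False] * n

    # Drop spans at random positions until the masking budget is spent.
    # Overlapping spans only count the newly masked tokens.
    while remaining > 0:
        start = rng.randrange(n)
        for i in range(start, min(n, start + span_len)):
            if remaining == 0:
                break
            if not masked[i]:
                masked[i] = True
                remaining -= 1

    inputs, targets = [], []
    sentinel = 0
    prev_masked = False
    for tok, is_masked in zip(tokens, masked):
        if is_masked:
            if not prev_masked:
                # A new masked run starts: emit one sentinel on both sides.
                marker = f"<extra_id_{sentinel}>"  # assumed sentinel format
                inputs.append(marker)
                targets.append(marker)
                sentinel += 1
            targets.append(tok)
        else:
            inputs.append(tok)
        prev_masked = is_masked
    return inputs, targets, masked


# Usage: corrupt a toy 40-token sequence at the 25% rate from the model card.
tokens = [f"t{i}" for i in range(40)]
inputs, targets, masked = span_corrupt(tokens, mask_rate=0.25)
print(" ".join(inputs))
print(" ".join(targets))
```

With 40 tokens and a 25% rate, 10 tokens move to the target side and 30 remain in the input, so the encoder sees a shorter, partially blanked sequence while the decoder learns to fill in the missing spans.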