pszemraj/nanoT5-base-65kBPE-v2


pszemraj committed on Jul 27

Commit a92e613 (1 parent: 986e79f)

Update README.md

Files changed (1)

README.md +13 -2

README.md CHANGED

@@ -1,12 +1,23 @@
 ---
 license: apache-2.0
 ---
-# nanoT5-65kBPE-v2
 ## plots
 loss
 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/60bccec062080d33f875cd0c/i_PtDB292icNcKAvh9eX5.png)
@@ -17,4 +28,4 @@ gradients
 weights
-![image/png](https://cdn-uploads.huggingface.co/production/uploads/60bccec062080d33f875cd0c/IT5OApwU5HEII5-Huf5E7.png)

 ---
 license: apache-2.0
+datasets:
+- allenai/c4
+language:
+- en
 ---
+# nanoT5-base-65kBPE-v2
+- SiLU/gated-SiLU activation
+- 25% mask rate during pretrain
+- 65k vocab size, [adapted claude3 tokenizer](https://hf.co/BEE-spoke-data/claude-tokenizer-forT5)
+training code: https://github.com/pszemraj/nanoT5/tree/any-tokenizer
 ## plots
+more details are under `checkpoints/`
 loss
 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/60bccec062080d33f875cd0c/i_PtDB292icNcKAvh9eX5.png)
 weights
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/60bccec062080d33f875cd0c/IT5OApwU5HEII5-Huf5E7.png)
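
The updated model card lists a 25% mask rate during pretraining. As a rough illustration of what that number means for T5-style span corruption, here is a minimal Python sketch. It is not taken from the linked training code: the mean span length of 3, the `<extra_id_N>` sentinel naming, and the helper names `noise_mask` and `corrupt` are all assumptions made for this example, and the final closing sentinel that T5 targets normally carry is omitted for brevity.

```python
import random


def _random_partition(total, parts, rng):
    """Split `total` items into `parts` contiguous pieces, each of size >= 1."""
    cuts = sorted(rng.sample(range(1, total), parts - 1))
    bounds = [0] + cuts + [total]
    return [b - a for a, b in zip(bounds, bounds[1:])]


def noise_mask(length, mask_rate=0.25, mean_span=3.0, seed=0):
    """Build a boolean mask with about `mask_rate` of positions set to True.

    Masked positions are grouped into contiguous spans. `mean_span` is an
    assumed value for illustration, not a setting confirmed by the model card.
    Requires length >= 2.
    """
    rng = random.Random(seed)
    num_noise = min(max(round(length * mask_rate), 1), length - 1)
    num_spans = max(round(num_noise / mean_span), 1)
    num_spans = min(num_spans, num_noise, length - num_noise)
    noise_lens = _random_partition(num_noise, num_spans, rng)
    keep_lens = _random_partition(length - num_noise, num_spans, rng)
    mask = []
    # Alternate kept and masked segments, starting with a kept segment.
    for keep, noise in zip(keep_lens, noise_lens):
        mask += [False] * keep + [True] * noise
    return mask


def corrupt(tokens, mask):
    """Replace each masked span with one sentinel; collect the spans as targets."""
    inputs, targets = [], []
    sentinel = 0
    prev_masked = False
    for tok, masked in zip(tokens, mask):
        if masked:
            if not prev_masked:
                # Hypothetical sentinel naming, following the usual T5 convention.
                marker = f"<extra_id_{sentinel}>"
                inputs.append(marker)
                targets.append(marker)
                sentinel += 1
            targets.append(tok)
        else:
            inputs.append(tok)
        prev_masked = masked
    return inputs, targets


if __name__ == "__main__":
    tokens = [f"t{i}" for i in range(40)]
    mask = noise_mask(len(tokens))
    inputs, targets = corrupt(tokens, mask)
    print(sum(mask), "of", len(tokens), "tokens masked")
    print(inputs)
    print(targets)
```

With a 40-token input, a quarter of the tokens move to the target side and each masked span collapses to a single sentinel in the encoder input, so the encoder sequence gets shorter as the mask rate rises.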