---
license: apache-2.0
datasets:
- allenai/c4
language:
- en
---
# nanoT5-base-65kBPE-v2

This is a "raw" pretrained model intended to be fine-tuned on downstream tasks. Key details:
- SiLU / gated-SiLU activation
- 25% mask rate during pretraining
- 65k vocab size, using an adapted claude3 tokenizer
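The 25% mask rate above refers to T5-style span corruption: a fraction of input tokens is replaced by sentinel markers, and the model learns to reconstruct the masked spans. As a minimal illustrative sketch (not the actual training code, which lives in the nanoT5 repo linked below), masking positions at random and collapsing adjacent masked runs into a single sentinel looks roughly like this:

```python
import random


def span_corrupt(tokens, mask_rate=0.25, seed=0):
    """Simplified T5-style span corruption (illustrative only).

    Picks ~mask_rate of positions to mask, collapses consecutive
    masked positions into one sentinel span, and returns the
    corrupted input plus the target sequence of masked spans.
    """
    rng = random.Random(seed)
    n = len(tokens)
    num_masked = max(1, round(n * mask_rate))
    masked = set(rng.sample(range(n), num_masked))

    inputs, targets, sentinel_id = [], [], 0
    i = 0
    while i < n:
        if i in masked:
            # one sentinel per contiguous masked run, on both sides
            sentinel = f"<extra_id_{sentinel_id}>"
            inputs.append(sentinel)
            targets.append(sentinel)
            while i < n and i in masked:
                targets.append(tokens[i])
                i += 1
            sentinel_id += 1
        else:
            inputs.append(tokens[i])
            i += 1
    return inputs, targets


toks = [f"tok{i}" for i in range(20)]
inp, tgt = span_corrupt(toks, mask_rate=0.25)
```

The real T5 objective additionally samples span lengths (mean ~3 tokens) rather than masking independent positions; this sketch only shows how a mask rate translates into sentinel-delimited inputs and targets.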
Training code: https://github.com/pszemraj/nanoT5/tree/any-tokenizer
## Plots

Training plots for the loss, gradients, and weights; more details are under `checkpoints/`.