---
license: apache-2.0
datasets:
- allenai/c4
language:
- en
---

# nanoT5-base-65kBPE-v2

> [!NOTE]
> This is a "raw" pretrained model intended to be fine-tuned on downstream tasks.

- SiLU/gated-SiLU activation
- 25% mask rate during pretraining
- 65k vocab size, [adapted claude3 tokenizer](https://hf.co/BEE-spoke-data/claude-tokenizer-forT5)

training code: https://github.com/pszemraj/nanoT5/tree/any-tokenizer

## plots

more details are under `checkpoints/` in this repo

loss

![image/png](https://cdn-uploads.huggingface.co/production/uploads/60bccec062080d33f875cd0c/i_PtDB292icNcKAvh9eX5.png)

gradients

![image/png](https://cdn-uploads.huggingface.co/production/uploads/60bccec062080d33f875cd0c/FGllI4PIC1e5YVCspya-e.png)

weights

![image/png](https://cdn-uploads.huggingface.co/production/uploads/60bccec062080d33f875cd0c/IT5OApwU5HEII5-Huf5E7.png)
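
## usage

Since this is a raw span-corruption checkpoint, the typical workflow is to load it and fine-tune on a downstream task. Below is a minimal loading sketch with `transformers`; the hub id is an assumption mirroring this card's title and may need adjusting.

```python
# minimal loading sketch, assuming standard transformers T5 classes
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# assumption: repo id inferred from the card title; adjust if it differs
model_id = "BEE-spoke-data/nanoT5-base-65kBPE-v2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

print(f"vocab size: {len(tokenizer)}")           # ~65k per this card
print(f"parameters: {model.num_parameters():,}")
```

From here, the model can be fine-tuned like any other T5-style seq2seq checkpoint (e.g. with `Seq2SeqTrainer`); outputs from the raw checkpoint itself are only sentinel-span predictions and are not meant for direct use.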