---
license: apache-2.0
datasets:
- allenai/c4
language:
- en
---

# nanoT5-base-65kBPE-v2

> [!NOTE]
> This is a "raw" pretrained model intended to be fine-tuned on downstream tasks.

- SiLU/gated-SiLU activation
- 25% mask rate during pretraining
- 65k vocab size, [adapted claude3 tokenizer](https://hf.co/BEE-spoke-data/claude-tokenizer-forT5)

training code: https://github.com/pszemraj/nanoT5/tree/any-tokenizer

## plots

more details are under `checkpoints/` in this repo

loss

![image/png](https://cdn-uploads.huggingface.co/production/uploads/60bccec062080d33f875cd0c/i_PtDB292icNcKAvh9eX5.png)

gradients

![image/png](https://cdn-uploads.huggingface.co/production/uploads/60bccec062080d33f875cd0c/FGllI4PIC1e5YVCspya-e.png)

weights

![image/png](https://cdn-uploads.huggingface.co/production/uploads/60bccec062080d33f875cd0c/IT5OApwU5HEII5-Huf5E7.png)
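
## usage

Since this is a raw span-corruption checkpoint, the typical workflow is to load it and fine-tune on a downstream task. Below is a minimal loading sketch with `transformers`; the hub id is an assumption mirroring this card's title and may need adjusting.

```python
# minimal loading sketch, assuming standard transformers T5 classes
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# assumption: repo id inferred from the card title; adjust if it differs
model_id = "BEE-spoke-data/nanoT5-base-65kBPE-v2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

print(f"vocab size: {len(tokenizer)}")           # ~65k per this card
print(f"parameters: {model.num_parameters():,}")
```

From here, the model can be fine-tuned like any other T5-style seq2seq checkpoint (e.g. with `Seq2SeqTrainer`); outputs from the raw checkpoint itself are only sentinel-span predictions and are not meant for direct use.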