## Pretrained models for the paper *Scaling up Masked Diffusion Models on Text*
**Scaling law experiments**: We provide all pretrained models in the *ar_safetensors* and *mdm_safetensors* folders.
For instance, the checkpoint `mdm-1028M-1600e18.safetensors` is an MDM with 1,028 million non-embedding
parameters trained with 1,600e18 FLOPs. Similarly, the checkpoint `mdm-170M-100e18-rsl-0.01.safetensors` is
an MDM with 170 million non-embedding parameters, 100e18 training FLOPs, and 1% of the dataset subjected
to random sequence lengths during pretraining.
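
The naming convention above can be decoded programmatically. The following is a minimal sketch, not part of the released code: the helper `parse_checkpoint_name`, the `CheckpointInfo` tuple, and the regular expression are hypothetical and inferred only from the two example file names, so checkpoints that follow a different pattern will raise an error.

```python
import re
from typing import NamedTuple, Optional


class CheckpointInfo(NamedTuple):
    """Fields recovered from a checkpoint file name (hypothetical helper type)."""
    family: str                    # model family prefix, e.g. "mdm"
    params_millions: int           # non-embedding parameters, in millions
    train_flops: float             # training FLOPs, e.g. 1600e18
    rsl_fraction: Optional[float]  # fraction of data with random sequence lengths, if given


# Pattern inferred from the two examples in this README (an assumption):
#   <family>-<params>M-<flops>[-rsl-<fraction>].<safetensors|pth>
_NAME_PATTERN = re.compile(
    r"^(?P<family>[a-z]+)-(?P<params>\d+)M-(?P<flops>\d+e\d+)"
    r"(?:-rsl-(?P<rsl>\d*\.?\d+))?\.(?:safetensors|pth)$"
)


def parse_checkpoint_name(name: str) -> CheckpointInfo:
    """Split a checkpoint file name into its documented components."""
    match = _NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognized checkpoint name: {name!r}")
    rsl = match.group("rsl")
    return CheckpointInfo(
        family=match.group("family"),
        params_millions=int(match.group("params")),
        train_flops=float(match.group("flops")),
        rsl_fraction=float(rsl) if rsl is not None else None,
    )


if __name__ == "__main__":
    for file_name in (
        "mdm-1028M-1600e18.safetensors",
        "mdm-170M-100e18-rsl-0.01.safetensors",
    ):
        print(parse_checkpoint_name(file_name))
```

A parser like this can be handy for grouping checkpoints by compute budget when reproducing scaling curves; whether every file in the folders matches this pattern is not verified here.
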
**Math reasoning**: please see the *gsm8k_safetensors* folder.
**Conditional generation**: please see the *sharegpt_safetensors* folder.
**Reverse curse**: please see the *reverse_safetensors* folder.
All models are provided in both `.pth` and `.safetensors` formats.