---
license: gpl-3.0
language:
- en
library_name: transformers
---
This model uses the LTG-BERT architecture.
The model was trained on a combination of the BabyLM Dataset, the TinyStories Dataset, and generated data,
in accordance with the rules of the Strict track and its 100M-word budget.
The model was trained with a 128-token sequence length.
Hyperparameters and evaluation scores will follow in a subsequent update.