This model uses the LTG-BERT architecture. It was trained on a combination of the BabyLM dataset, the TinyStories dataset, and generated data, in accordance with the rules of the Strict track and its 100M-word budget.

The model was trained with a sequence length of 128 tokens.

Details of the hyperparameters used and evaluation scores will follow in a subsequent update.
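Below is a minimal usage sketch for masked-token prediction. It assumes the repository exposes a standard `transformers` masked-LM interface and, since the repo contains custom model code, that loading requires `trust_remote_code=True`; the model id `nikitastheo/BERTtime-Stories-100m-nucleus-1` is taken from this repo. This is an illustrative example, not an official one.

```python
# Minimal masked-LM usage sketch (assumptions noted above).
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_id = "nikitastheo/BERTtime-Stories-100m-nucleus-1"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForMaskedLM.from_pretrained(model_id, trust_remote_code=True)

# Keep inputs within the 128-token training sequence length.
text = f"Once upon a time, there was a little {tokenizer.mask_token}."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)

with torch.no_grad():
    logits = model(**inputs).logits

# Report the top-5 predictions for the masked position.
mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
top_ids = logits[0, mask_pos].topk(5, dim=-1).indices[0]
print(tokenizer.convert_ids_to_tokens(top_ids.tolist()))
```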
