---
datasets:
- nilq/babylm-100M
language:
- en
---

This autoregressive model belongs to a series of rather small language models trained on the [BabyLM](https://babylm.github.io) data:

- the [baby_llama](https://huggingface.co/bbunzeck/baby_llama) model has few parameters and was trained on a small data set (10M tokens)
- the [**t**eenie_llama](https://huggingface.co/bbunzeck/teenie_llama) model has the same number of parameters but was trained on more **t**okens of text (100M)
- the [**w**eenie_llama](https://huggingface.co/bbunzeck/weenie_llama) model was trained on the small data set, but has more parameters/**w**eights
- the [**tw**eenie_llama](https://huggingface.co/bbunzeck/tweenie_llama) model features both -- more **t**okens (the larger data set) and more **w**eights (*viz.* parameters)

|                 | baby_llama | teenie_llama | weenie_llama | tweenie_llama |
|-----------------|------------|--------------|--------------|---------------|
| Parameters      | 2.97M      | 2.97M        | 11.44M       | 11.44M        |
| Hidden layers   | 8          | 8            | 16           | 16            |
| Attention heads | 8          | 8            | 16           | 16            |
| Embedding size  | 128        | 128          | 256          | 256           |
| Context size    | 128        | 128          | 256          | 256           |
| Vocab size      | 16k        | 16k          | 16k          | 16k           |

If you use this model in your research, please cite the following publication:

```
@inproceedings{bunzeck-zarriess-2024-fifty,
    title = "Fifty shapes of {BL}i{MP}: syntactic learning curves in language models are not uniform, but sometimes unruly",
    author = "Bunzeck, Bastian and Zarrie{\ss}, Sina",
    editor = "Qiu, Amy and Noble, Bill and Pagmar, David and Maraev, Vladislav and Ilinykh, Nikolai",
    booktitle = "Proceedings of the 2024 CLASP Conference on Multimodality and Interaction in Language Learning",
    month = oct,
    year = "2024",
    address = "Gothenburg, Sweden",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.clasp-1.7",
    pages = "39--55",
}
```
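
For reference, here is a minimal usage sketch. It assumes the checkpoints load through the standard `transformers` auto classes; the repository id shown is one model of the series above, so substitute the one you want to use.

```python
# Minimal sketch, assuming the checkpoint is compatible with the
# standard transformers auto classes (model ids listed above).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "bbunzeck/teenie_llama"  # or any other model of the series

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Generate a short continuation for a prompt.
inputs = tokenizer("The child picked up the", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```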