A collection containing the baseline models for the BabyLM 2025 edition
AI & ML interests
Pretraining data-constrained, cognitively relevant baby LLMs
A multilingual collection of datasets modeling the language a person observes from birth until they acquire a native language.
A collection of datasets with multilingual data resources, used as part of the BabyBabelLM initiatives.
A collection of subtitles that forms part of the multilingual BabyBabelLM datasets.