This 89.6M-parameter model is based on a custom RNN architecture loosely inspired by an unorthodox interpretation of Minimalism (Chomsky et al. 2023).
The name eMG-RNN refers to expectation-based Minimalist Grammar (eMG), the closest computational implementation of the core Minimalist Grammar.
The model implements two pathways, similar to those in an LSTM: one to manage “continuations” (the Merge gate) and another for “holding” (the Move gate). The specific “forget gating” system, inspired by GRUs, is designed to bias information flow in a way that may mimic C-command. The base model, eMG-RNN-base, uses 650 units for both the embedding layer and the hidden layer (Gulordava et al., 2018). It has a single hidden layer, so that the effect of each gating system can be assessed in isolation.
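As a rough sanity check on the reported size, the sketch below estimates how much of the 89.6M parameter budget is taken by the vocabulary-dependent layers, using the 650-unit layers and the 67,572-token lexicon reported in this card. It assumes untied input and output embeddings and an output bias; the card does not state either, so the figures are an estimate and not a description of the actual implementation.

```python
# Back-of-the-envelope parameter budget for eMG-RNN-base.
# ASSUMPTIONS (not stated in the card): the input embedding and the output
# projection are separate (untied) matrices, and the output layer has a bias.

VOCAB_SIZE = 67_572          # lexicon size reported in the card
EMBED_DIM = 650              # embedding units reported in the card
HIDDEN_DIM = 650             # hidden units reported in the card
REPORTED_TOTAL = 89_600_000  # "89.6M", a rounded figure

# Input embedding: one EMBED_DIM vector per token.
embedding_params = VOCAB_SIZE * EMBED_DIM

# Output projection: hidden state -> one logit per token, plus a bias.
output_params = HIDDEN_DIM * VOCAB_SIZE + VOCAB_SIZE

lexical_params = embedding_params + output_params

# Whatever remains is the approximate budget of the recurrent gating layer.
recurrent_budget = REPORTED_TOTAL - lexical_params

print(f"embedding:        {embedding_params:>12,}")
print(f"output layer:     {output_params:>12,}")
print(f"lexical total:    {lexical_params:>12,}")
print(f"recurrent budget: {recurrent_budget:>12,}")
```

Under these assumptions the lexical layers account for about 87.9M parameters, leaving roughly 1.7M for the single recurrent layer and its gates.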
It employs a BPE tokenizer with min_freq=3, producing a lexicon of 67,572 tokens using the BabyLM 2024 10M dataset (Small-strict track) as the training corpus.
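To illustrate how a frequency threshold can shape a BPE lexicon, here is a minimal toy trainer in pure Python. It is not the tokenizer code from the repository: the function names `train_bpe` and `merge_pair` are hypothetical, and it assumes that min_freq acts as the minimum count a symbol pair needs before it is merged, which is how many BPE trainers interpret such a setting.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def train_bpe(words, min_freq=3, max_merges=100):
    """Learn BPE merges; stop once the most frequent pair occurs fewer than min_freq times.

    ASSUMPTION: min_freq is a pair-frequency threshold (illustrative only).
    """
    # Each distinct word is a tuple of characters, weighted by its corpus count.
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(max_merges):
        # Count adjacent symbol pairs across the whole (weighted) vocabulary.
        pair_counts = Counter()
        for symbols, count in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += count
        if not pair_counts:
            break
        # Most frequent pair; ties are broken deterministically by the pair itself.
        best, freq = max(pair_counts.items(), key=lambda kv: (kv[1], kv[0]))
        if freq < min_freq:
            break
        merges.append(best)
        # Apply the merge to every word before counting again.
        new_vocab = Counter()
        for symbols, count in vocab.items():
            new_vocab[merge_pair(symbols, best)] += count
        vocab = new_vocab
    return merges


corpus = ["low", "low", "low", "lot"]
print(train_bpe(corpus, min_freq=3))  # [('l', 'o'), ('lo', 'w')]
print(train_bpe(corpus, min_freq=4))  # [('l', 'o')]
```

In this toy corpus, raising the threshold from 3 to 4 blocks the second merge, so a higher min_freq yields fewer merged units and a smaller lexicon.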
The model’s architecture, preprocessing routines, lm-eval modules for evaluation, and an alternative (unused here for English) tokenization procedure (MorPiece) are all available on GitHub at: cristianochesi/babylm-2024