Merge-Effect
Note: Original dataset used to train tokenisers and models.
pietrolesci/tokenisers
Note: Tokenisers trained on the MiniPile. The `_raw_tokenisers` folder contains the original tokenisers, trained with a vocabulary size of 320k. Each of the other folders contains a `transformers`-compatible tokeniser with a smaller vocabulary.
pietrolesci/minipile
Note: Tokenised MiniPile dataset(s). Each split corresponds to a tokeniser in `pietrolesci/tokenisers`.
pietrolesci/me57M-tied_minipile_bpe8064minipile
Note: Model trained for 50k steps on the MiniPile dataset. Each branch is a different checkpoint, saved every 2k steps.
pietrolesci/me57M-tied_minipile_bpe32000minipile
Note: Model trained for 50k steps on the MiniPile dataset. Each branch is a different checkpoint, saved every 2k steps.
pietrolesci/me57M-tied_minipile_bpe128000minipile
Note: Model trained for 50k steps on the MiniPile dataset. Each branch is a different checkpoint, saved every 2k steps.
pietrolesci/me57M-tied_minipile_wordpiece32000minipile
Note: Model trained for 50k steps on the MiniPile dataset. Each branch is a different checkpoint, saved every 2k steps.
pietrolesci/me57M-tied_minipile_bpe2wp32000minipile
Note: Model trained for 50k steps on the MiniPile dataset. Each branch is a different checkpoint, saved every 2k steps. The bpe2wp name means that we chose the merges using the BPE objective and then tokenised the MiniPile using the resulting vocabulary and the WordPiece tokenisation function (i.e., longest-prefix match).
pietrolesci/me340M-tied_minipile_bpe32000minipile
Note: Model trained for 50k steps on the MiniPile dataset. Each branch is a different checkpoint, saved every 2k steps.
pietrolesci/me850M_minipile_bpe32000minipile
Note: Model trained for 50k steps on the MiniPile dataset. Each branch is a different checkpoint, saved every 2k steps.
pietrolesci/me-minipile-evals
Note: Log-probabilities computed on the validation set of the MiniPile dataset using the models above.
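
The bpe2wp note above describes tokenising with a BPE-derived vocabulary but the WordPiece tokenisation function, i.e. longest-prefix match. The following is a minimal sketch of that greedy matching, not the authors' implementation. The function name `longest_prefix_tokenise`, the `[UNK]` fallback, and the toy vocabulary are all hypothetical, and the sketch omits details such as continuation markers and pre-tokenisation that a real tokeniser may use.

```python
from __future__ import annotations


def longest_prefix_tokenise(word: str, vocab: set[str], unk: str = "[UNK]") -> list[str]:
    """Split `word` by repeatedly taking the longest vocabulary entry
    that is a prefix of the remaining characters."""
    tokens = []
    start = 0
    while start < len(word):
        end = len(word)
        # Shrink the candidate from the right until it is in the vocabulary.
        while end > start and word[start:end] not in vocab:
            end -= 1
        if end == start:
            # Nothing matches, not even a single character (assumed fallback).
            return [unk]
        tokens.append(word[start:end])
        start = end
    return tokens


# Toy vocabulary (hypothetical); in the bpe2wp setting it would come from BPE merges.
vocab = {"t", "o", "k", "e", "n", "i", "s", "r",
         "to", "ken", "token", "is", "ise", "er", "ers"}

print(longest_prefix_tokenise("tokenisers", vocab))
# → ['token', 'ise', 'r', 's']
```

Because the match is greedy, the word is split as `token`, `ise`, `r`, `s` even though `ers` is in the vocabulary: taking `ise` first leaves only `rs`. This shows how the same vocabulary can yield a different segmentation than replaying BPE merges would.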