---
license: mit
datasets:
- RefinedWeb
- EleutherAI/OpenWebText2
library_name: open_lm
tokenizer: GPT-NeoX-20B
---
# Resolving Discrepancies in Compute-Optimal Scaling of Language Models: Checkpoints

This repository contains the model checkpoints from the paper "Resolving Discrepancies in Compute-Optimal Scaling of Language Models" by Tomer Porian, Mitchell Wortsman, Jenia Jitsev, Ludwig Schmidt, and Yair Carmon.
## Folder structure

Each checkpoint directory follows the path pattern

```
dataset={dataset}/hparams={hparams}_warmup={warmup}_decay={decay}/params={int(params / 1e6)}M_maxstep={maxstep}
```

where `dataset`, `hparams`, `warmup`, `decay`, `params`, and `maxstep` are as defined in the GitHub repository, which contains the code and data for reproducing the figures in the paper.
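As a minimal sketch, the path for a given checkpoint can be assembled with a Python f-string mirroring the pattern above. The values below are hypothetical placeholders; use the settings defined in the GitHub repository for the checkpoint you want.

```python
# Illustrative sketch: building a checkpoint path from the pattern above.
# All values here are hypothetical placeholders, not actual configurations.
dataset = "RefinedWeb"
hparams = "example"   # placeholder for the hyperparameter setting name
warmup = 5000         # placeholder warmup steps
decay = 10000         # placeholder decay steps
params = 124_000_000  # placeholder number of model parameters
maxstep = 20000       # placeholder maximum training step

checkpoint_dir = (
    f"dataset={dataset}/"
    f"hparams={hparams}_warmup={warmup}_decay={decay}/"
    f"params={int(params / 1e6)}M_maxstep={maxstep}"
)
print(checkpoint_dir)
```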
## Evaluation and text generation

The script `evaluating_checkpoint.py` lets you evaluate checkpoints on validation shards and generate text. Move it into your local copy of `open_lm` and run one of the following commands:

```bash
python evaluating_checkpoint.py --checkpoint "path/to/checkpoint" --input-text "The quick brown fox jumps over the lazy dog."
```

or

```bash
python evaluating_checkpoint.py --checkpoint "path/to/checkpoint" --val-data "path/to/validation/shards"
```
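To obtain a checkpoint to pass to `--checkpoint`, one option is to download a single checkpoint directory from this repository with `huggingface_hub.snapshot_download`. This is a minimal sketch: the `repo_id` and the pattern passed to `allow_patterns` are placeholders, so substitute the id of this repository and the folder-structure path described above.

```python
# Minimal sketch: downloading one checkpoint directory from the Hub.
# The repo_id and the allow_patterns path are hypothetical placeholders.
from huggingface_hub import snapshot_download

local_path = snapshot_download(
    repo_id="<org>/<this-repo>",              # placeholder: id of this repository
    allow_patterns=["dataset=RefinedWeb/*"],  # placeholder: restrict to one checkpoint subtree
    local_dir="checkpoints",
)
print(local_path)
```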
## Citation

```bibtex
@article{porian2024resolving,
  title={Resolving Discrepancies in Compute-Optimal Scaling of Language Models},
  author={Porian, Tomer and Wortsman, Mitchell and Jitsev, Jenia and Schmidt, Ludwig and Carmon, Yair},
  journal={arXiv:2406.19146},
  year={2024}
}
```