TomerPorian
committed on
Commit
•
b2fa501
1
Parent(s):
5eb16a6
Update README.md
README.md
CHANGED
@@ -1,3 +1,14 @@
1 |
# Resolving Discrepancies in Compute-Optimal Scaling of Language Models: Checkpoints
|
2 |
|
3 |
This repository contains the model checkpoints from the paper ["Resolving Discrepancies in Compute-Optimal Scaling of Language Models"](https://arxiv.org/abs/2406.19146), by Tomer Porian, Mitchell Wortsman, Jenia Jitsev, Ludwig Schmidt, and Yair Carmon.
|
@@ -8,7 +19,18 @@ Each checkpoint directory is in the path
|
|
8 |
|
9 |
`dataset={dataset}/hparams={hparams}_warmup={warmup}_decay={decay}/params={int(params / 1e6)}M_maxstep={maxstep}`
|
10 |
|
11 |
-
where `dataset, hparams, warmup, decay, params, maxstep` are as defined in the github repository.
|
12 |
|
13 |
## Citation
|
14 |
|
|
|
1 |
+
---
|
2 |
+
license: mit
|
3 |
+
|
4 |
+
datasets:
|
5 |
+
- RefinedWeb
|
6 |
+
- EleutherAI/OpenWebText2
|
7 |
+
|
8 |
+
library_name: open_lm
|
9 |
+
|
10 |
+
tokenizer: GPT-NeoX-20B
|
11 |
+
---
|
12 |
# Resolving Discrepancies in Compute-Optimal Scaling of Language Models: Checkpoints
|
13 |
|
14 |
This repository contains the model checkpoints from the paper ["Resolving Discrepancies in Compute-Optimal Scaling of Language Models"](https://arxiv.org/abs/2406.19146), by Tomer Porian, Mitchell Wortsman, Jenia Jitsev, Ludwig Schmidt, and Yair Carmon.
|
|
|
19 |
|
20 |
`dataset={dataset}/hparams={hparams}_warmup={warmup}_decay={decay}/params={int(params / 1e6)}M_maxstep={maxstep}`
|
21 |
|
22 |
+
where `dataset`, `hparams`, `warmup`, `decay`, `params`, and `maxstep` are as defined in the [GitHub repository](https://github.com/formll/resolving-scaling-law-discrepancies), which contains the code and data for reproducing the figures in the paper.
|
23 |
+
|
24 |
+
## Code snippet
|
25 |
+
|
26 |
+
```python
|
27 |
+
# create args.yaml file for the model size...
|
28 |
+
args.resume = f'dataset={dataset}/hparams={hparams}_warmup={warmup}_decay={decay}/params={int(params / 1e6)}M_maxstep={maxstep}/{model_name}.pt'
|
29 |
+
# create model with open_lm create_model function...
|
30 |
+
load_model(args, model, None)
|
31 |
+
# create data with open_lm get_data function...
|
32 |
+
metrics = evaluate(model, data, 0, args, None)
|
33 |
+
```
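
The checkpoint path passed to `args.resume` follows the directory template given earlier. As a minimal sketch, the helper below assembles that relative directory from the six fields. The function name `checkpoint_dir` and the argument values in the usage line are hypothetical placeholders, not names or values taken from the repository; the valid values for each field are defined in the GitHub repository.

```python
from pathlib import PurePosixPath


def checkpoint_dir(dataset, hparams, warmup, decay, params, maxstep):
    """Build the relative checkpoint directory from the README's path template.

    `params` is the raw parameter count; the template reports it in
    millions, truncated to an integer by int(params / 1e6).
    """
    return str(
        PurePosixPath(
            f"dataset={dataset}",
            f"hparams={hparams}_warmup={warmup}_decay={decay}",
            f"params={int(params / 1e6)}M_maxstep={maxstep}",
        )
    )


# Placeholder values for illustration only (hypothetical, not from the repo).
example = checkpoint_dir("DATASET", "HPARAMS", "WARMUP", "DECAY", 108_000_000, 1000)
print(example)
```

Note that `int(...)` truncates rather than rounds, so a parameter count that is not a whole number of millions maps to the next lower integer in the directory name.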
|
34 |
|
35 |
## Citation
|
36 |
|