amanrangapur committed on
Commit 22fbf33
1 Parent(s): b7030f9

Update README.md

Files changed (1)
  1. README.md +59 -56
README.md CHANGED
@@ -1,10 +1,11 @@
- ---
- license: apache-2.0
- datasets:
- - allenai/dolmino-mix-1124
- language:
- - en
- ---

  ## Model Details
 
@@ -21,16 +22,16 @@ The core models released in this batch include the following:

  | Size | Training Tokens | Layers | Hidden Size | Attention Heads | Context Length |
  |------|--------|---------|-------------|-----------------|----------------|
- | [OLMo2-7B July 2024](https://huggingface.co/allenai/OLMo-2-1124-7B) | 4 Trillion | 32 | 4096 | 32 | 4096 |
- | [OLMo2- 13B July 2024](https://huggingface.co/allenai/OLMo-2-13B-1124) | 5 Trillion | 40 | 5120 | 42 | 4096 |

  ## Inference

  You can use OLMo with the standard HuggingFace transformers library:
  ```python
  from transformers import AutoModelForCausalLM, AutoTokenizer
- olmo = AutoModelForCausalLM.from_pretrained("allenai/OLMo2-7B-1124")
- tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo2-7B-1124")
  message = ["Language modeling is "]
  inputs = tokenizer(message, return_tensors='pt', return_token_type_ids=False)
  # optional verifying cuda
@@ -43,7 +44,7 @@ print(tokenizer.batch_decode(response, skip_special_tokens=True)[0])

  For faster performance, you can quantize the model using the following method:
  ```python
- AutoModelForCausalLM.from_pretrained("allenai/OLMo2-7B-1124",
  torch_dtype=torch.float16,
  load_in_8bit=True) # Requires bitsandbytes
  ```
@@ -57,13 +58,13 @@ The naming convention is `stepXXX-tokensYYYB`.

  To load a specific model revision with HuggingFace, simply add the argument `revision`:
  ```bash
- olmo = AutoModelForCausalLM.from_pretrained("allenai/OLMo2-7B-1124", revision="step1000-tokens5B")
  ```

  Or, you can access all the revisions for the models via the following code snippet:
  ```python
  from huggingface_hub import list_repo_refs
- out = list_repo_refs("allenai/OLMo2-7B-1124")
  branches = [b.name for b in out.branches]
  ```
 
@@ -104,52 +105,54 @@ For more documentation, see the [GitHub readme](https://github.com/allenai/OLMo?
  <!-- - **W&B Logs:** [pretraining](https://wandb.ai/ai2-llm/OLMo-7B/groups/OLMo-1.7-7B), [annealing](https://wandb.ai/ai2-llm/OLMo-7B/groups/OLMo-1.7-7B-anneal) -->


- <!-- TODO -->
  ## Evaluation
- `TODO`
- <!-- Core model results for OLMo2 7B models are found below:
-
- | Task | Llama-7b | Llama2-7b | Falcon-7b | Mpt-7b | OLMo-7B | Llama2-13b | OLMo 7B April 2024 | **OLMo2 7B** |
- |-------------------|----------|-----------|-----------|--------|---------|------------|--------------------|-----------------------|
- | arc_c | 44.5 | 48.5 | 47.5 | 46.5 | 48.5 | 52.8 | 42.5 | 43.8 |
- | arc_e | 67.9 | 69.5 | 70.4 | 70.5 | 65.4 | 73.7 | 67.2 | 68.8 |
- | boolq | 75.4 | 80.2 | 74.6 | 74.2 | 73.4 | 82.2 | 83.7 | 78.9 |
- | copa | 91.0 | 86.0 | 86.0 | 85.0 | 90.0 | 90.0 | 86.0 | 84.0 |
- | hellaswag | 76.2 | 76.8 | 75.9 | 77.6 | 76.4 | 78.6 | 75.5 | 77.4 |
- | openbookqa | 51.2 | 48.4 | 53.0 | 48.6 | 50.4 | 51.8 | 50.0 | 48.2 |
- | piqa | 77.2 | 76.7 | 78.5 | 77.3 | 78.4 | 79.0 | 77.5 | 78.2 |
- | sciq | 93.9 | 94.5 | 93.9 | 93.7 | 93.8 | 95.5 | 96.7 | 97.0 |
- | winogrande | 70.5 | 69.4 | 68.9 | 69.9 | 67.9 | 73.5 | 69.8 | 68.8 |
- | truthfulQA (MC2) | 33.9 | 38.5 | 34.0 | 33.0 | 36.0 | 36.8 | 35.8 | 36.5 |
- | MMLU (5 shot MC) | 31.5 | 45.0 | 24.0 | 30.8 | 28.3 | 55.5 | 52.0 | 53.4 |
- | GSM8k | 10.0 | 12.0 | 4.0 | 4.5 | 8.5 | 25.0 | 29.0 | 35.0 |
- | Full average | 60.3 | 62.1 | 59.2 | 59.3 | 59.8 | 66.2 | 63.8 | 64.2 |
-
- And for OLMo 13B model:
-
- | task | random | [StableLM 2 1.6b](https://huggingface.co/stabilityai/stablelm-2-1_6b)\* | [Pythia 1B](https://huggingface.co/EleutherAI/pythia-1b) | [TinyLlama 1.1B](https://huggingface.co/TinyLlama/TinyLlama-1.1B-intermediate-step-1195k-token-2.5T) | [OLMo 1.0 1B](https://huggingface.co/allenai/OLMo-1B-hf) | **OLMo 1B July 2024** |
- | ------------------------------------------------------------------------------------------------------------------------------------------------------------ | ------ | ----------------- | --------- | -------------------------------------- | ------- | ------ |
- | arc_challenge | 25 | 43.81 | 33.11 | 34.78 | 34.45 | 36.5 |
- | arc_easy | 25 | 63.68 | 50.18 | 53.16 | 58.07 | 55.3 |
- | boolq | 50 | 76.6 | 61.8 | 64.6 | 60.7 | 67.5 |
- | copa | 50 | 84 | 72 | 78 | 79 | 83.0 |
- | hellaswag | 25 | 68.2 | 44.7 | 58.7 | 62.5 | 66.9 |
- | openbookqa | 25 | 45.8 | 37.8 | 43.6 | 46.4 | 46.4 |
- | piqa | 50 | 74 | 69.1 | 71.1 | 73.7 | 74.9 |
- | sciq | 25 | 94.7 | 86 | 90.5 | 88.1 | 93.4 |
- | winogrande | 50 | 64.9 | 53.3 | 58.9 | 58.9 | 61.4 |
- | Average | 36.11 | 68.41 | 56.44 | 61.48 | 62.42 | 65.0 |
- -->
 
  ## Model Details

- ### Data
- `TODO`
-
- ### Staged training / annealing
- `TODO`
-
-

  ## Bias, Risks, and Limitations
 
+ ---
+ license: apache-2.0
+ datasets:
+ - allenai/dolmino-mix-1124
+ - allenai/dolma
+ language:
+ - en
+ ---

  ## Model Details
 
 
  | Size | Training Tokens | Layers | Hidden Size | Attention Heads | Context Length |
  |------|--------|---------|-------------|-----------------|----------------|
+ | [OLMo2-7B](https://huggingface.co/allenai/OLMo-2-1124-7B) | 4 Trillion | 32 | 4096 | 32 | 4096 |
+ | [OLMo2-13B](https://huggingface.co/allenai/OLMo-2-1124-13B) | 5 Trillion | 40 | 5120 | 42 | 4096 |

  ## Inference

  You can use OLMo with the standard HuggingFace transformers library:
  ```python
  from transformers import AutoModelForCausalLM, AutoTokenizer
+ olmo = AutoModelForCausalLM.from_pretrained("allenai/OLMo-2-1124-7B")
+ tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo-2-1124-7B")
  message = ["Language modeling is "]
  inputs = tokenizer(message, return_tensors='pt', return_token_type_ids=False)
  # optional verifying cuda
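# The hunk cuts the snippet off here. A minimal sketch of how it continues: the decode/print
# call matches the hunk header shown earlier in this diff, while the generate() arguments
# below are illustrative assumptions, not part of this commit.
# inputs = {k: v.to('cuda') for k, v in inputs.items()}; olmo = olmo.to('cuda')  # optional GPU move
response = olmo.generate(**inputs, max_new_tokens=100, do_sample=True, top_k=50, top_p=0.95)
print(tokenizer.batch_decode(response, skip_special_tokens=True)[0])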
 
  For faster performance, you can quantize the model using the following method:
  ```python
+ AutoModelForCausalLM.from_pretrained("allenai/OLMo-2-1124-7B",
  torch_dtype=torch.float16,
  load_in_8bit=True) # Requires bitsandbytes
  ```
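As written, the quantization snippet omits its imports and never assigns the returned model. A self-contained sketch, assuming `bitsandbytes` and `accelerate` are installed (the `device_map="auto"` argument is a convenience addition, not part of the original snippet):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the 7B model with 8-bit weights via bitsandbytes; fp16 for the non-quantized parts.
olmo = AutoModelForCausalLM.from_pretrained(
    "allenai/OLMo-2-1124-7B",
    torch_dtype=torch.float16,
    load_in_8bit=True,   # requires bitsandbytes
    device_map="auto",   # requires accelerate; places layers on available devices
)
tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo-2-1124-7B")
```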
 
  To load a specific model revision with HuggingFace, simply add the argument `revision`:
  ```python
+ olmo = AutoModelForCausalLM.from_pretrained("allenai/OLMo-2-1124-7B", revision="step1000-tokens5B")
  ```

  Or, you can access all the revisions for the models via the following code snippet:
  ```python
  from huggingface_hub import list_repo_refs
+ out = list_repo_refs("allenai/OLMo-2-1124-7B")
  branches = [b.name for b in out.branches]
  ```
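Putting the two revision snippets together, a small usage sketch (the `step` prefix filter assumes the `stepXXX-tokensYYYB` branch naming described in the hunk header above; it is not an API guarantee):

```python
from huggingface_hub import list_repo_refs
from transformers import AutoModelForCausalLM

repo = "allenai/OLMo-2-1124-7B"

# Enumerate revision branches, keep the intermediate checkpoints, and load one of them.
out = list_repo_refs(repo)
checkpoints = [b.name for b in out.branches if b.name.startswith("step")]
print(f"{len(checkpoints)} intermediate checkpoints, e.g. {checkpoints[:3]}")

olmo = AutoModelForCausalLM.from_pretrained(repo, revision=checkpoints[0])
```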
  <!-- - **W&B Logs:** [pretraining](https://wandb.ai/ai2-llm/OLMo-7B/groups/OLMo-1.7-7B), [annealing](https://wandb.ai/ai2-llm/OLMo-7B/groups/OLMo-1.7-7B-anneal) -->


  ## Evaluation
+ Core model results for the OLMo 2 7B and 13B models are shown below.
+
+ | Model | Train FLOPs | Average | ARC/C | HSwag | WinoG | MMLU | DROP | NQ | AGIEval | GSM8k | MMLU Pro | TriviaQA |
+ |-------------------|------------|---------|--------|--------|--------|-------|-------|-----|----------|--------|-----------|-----------|
+ | Gemma-2-9B | 4.4·10²³ | 52.9 | 89.5 | 87.3 | 78.8 | 70.6 | 63 | 38 | 57.3 | 1.1 | 42 | 0.9 |
+ | Llama-2-13B | 1.6·10²³ | 54.1 | 67.3 | 83.9 | 74.9 | 55.7 | 45.6 | 38.4 | 41.5 | 28.1 | 23.9 | 81.3 |
+ | Mistral-7B-v0.3 | n/a | 58.8 | 78.3 | 83.1 | 77.7 | 63.5 | 51.8 | 37.2 | 47.3 | 40.1 | 30 | 79.3 |
+ | Llama-3.1-8B | 7.2·10²³ | 61.8 | 79.5 | 81.6 | 76.6 | 66.9 | 56.4 | 33.9 | 51.3 | 56.5 | 34.7 | 80.3 |
+ | Mistral-Nemo-12B | n/a | 66.9 | 85.2 | 85.6 | 81.5 | 69.5 | 69.2 | 39.7 | 54.7 | 62.1 | 36.7 | 84.6 |
+ | Qwen-2.5-7B | 8.2·10²³ | 67.4 | 89.5 | 89.7 | 74.2 | 74.4 | 55.8 | 29.9 | 63.7 | 81.5 | 45.8 | 69.4 |
+ | Qwen-2.5-14B | 16.0·10²³ | 72.2 | 94 | 94 | 80 | 79.3 | 51.5 | 37.3 | 71 | 83.4 | 52.8 | 79.1 |
+ | StableLM-2-12B | 2.9·10²³ | 62.2 | 81.9 | 84.5 | 77.7 | 62.4 | 55.5 | 37.6 | 50.9 | 62 | 29.3 | 79.9 |
+ | Zamba-2-7B | n/c | 65.2 | 92.2 | 89.4 | 79.6 | 68.5 | 51.7 | 36.5 | 55.5 | 67.2 | 32.8 | 78.8 |
+ | Amber-7B | 0.5·10²³ | 35.2 | 44.9 | 74.5 | 65.5 | 24.7 | 26.1 | 18.7 | 21.8 | 4.8 | 11.7 | 59.3 |
+ | OLMo-7B | 1.0·10²³ | 38.3 | 46.4 | 78.1 | 68.5 | 28.3 | 27.3 | 24.8 | 23.7 | 9.2 | 12.1 | 64.1 |
+ | MAP-Neo-7B | 2.1·10²³ | 49.6 | 78.4 | 72.8 | 69.2 | 58 | 39.4 | 28.9 | 45.8 | 12.5 | 25.9 | 65.1 |
+ | OLMo-0424-7B | 0.9·10²³ | 50.7 | 66.9 | 80.1 | 73.6 | 54.3 | 50 | 29.6 | 43.9 | 27.7 | 22.1 | 58.8 |
+ | DCLM-7B | 1.0·10²³ | 56.9 | 79.8 | 82.3 | 77.3 | 64.4 | 39.3 | 28.8 | 47.5 | 46.1 | 31.3 | 72.1 |
+ | **OLMo-2-1124-7B** | 1.8·10²³ | 62.9 | 79.8 | 83.8 | 77.2 | 63.7 | 60.8 | 36.9 | 50.4 | 67.5 | 31 | 78 |
+ | **OLMo-2-1124-13B** | 4.6·10²³ | 68.3 | 83.5 | 86.4 | 81.5 | 67.5 | 70.7 | 46.7 | 54.2 | 75.1 | 35.1 | 81.9 |

  ## Model Details
 
+ ### Pretraining
+ | | **OLMo 2 7B** | **OLMo 2 13B** |
+ |-------------------|------------|------------|
+ | Pretraining Stage 1<br>([OLMo-Mix-1124](https://huggingface.co/datasets/allenai/olmo-mix-1124)) | 4 trillion tokens<br>(1 epoch) | 5 trillion tokens<br>(1.2 epochs) |
+ | Pretraining Stage 2<br>([Dolmino-Mix-1124](https://huggingface.co/datasets/allenai/dolmino-mix-1124)) | 50B tokens (3 runs)<br>*merged* | 100B tokens (3 runs)<br>300B tokens (1 run)<br>*merged* |
+ | Post-training<br>([Tulu 3 SFT OLMo mix](https://huggingface.co/datasets/allenai/tulu-3-sft-olmo-mixture)) | SFT + DPO + PPO<br>([preference mix](https://huggingface.co/datasets/allenai/olmo-2-1124-7b-preference-mix)) | SFT + DPO + PPO<br>([preference mix](https://huggingface.co/datasets/allenai/olmo-2-1124-13b-preference-mix)) |
+
+ #### Stage 1: Initial Pretraining
+ - Dataset: [OLMo-Mix-1124](https://huggingface.co/datasets/allenai/olmo-mix-1124) (3.9T tokens)
+ - Coverage: 90%+ of total pretraining budget
+ - 7B Model: ~1 epoch
+ - 13B Model: 1.2 epochs (5T tokens)
+
+ #### Stage 2: Fine-tuning
+ - Dataset: [Dolmino-Mix-1124](https://huggingface.co/datasets/allenai/dolmino-mix-1124) (843B tokens)
+ - Three training mixes:
+   - 50B tokens
+   - 100B tokens
+   - 300B tokens
+ - Mix composition: 50% high-quality data + academic/Q&A/instruction/math content
+
+ #### Model Merging
+ - 7B Model: 3 versions trained on the 50B mix, merged via model souping (weight averaging; see the sketch after this list)
+ - 13B Model: 3 versions on the 100B mix + 1 version on the 300B mix, merged for the final checkpoint
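Model souping here refers to averaging the weights of the separately trained runs into a single checkpoint. A minimal sketch of that averaging step, using hypothetical local checkpoint paths and a plain uniform average (an illustration of the idea, not the exact recipe used for OLMo 2):

```python
import torch
from transformers import AutoModelForCausalLM

# Hypothetical local paths to three runs trained on the same data mix.
run_paths = ["olmo2-7b-50b-run1", "olmo2-7b-50b-run2", "olmo2-7b-50b-run3"]
models = [AutoModelForCausalLM.from_pretrained(p, torch_dtype=torch.float32) for p in run_paths]

# Uniform soup: average every floating-point parameter across the runs.
souped = {}
for name, tensor in models[0].state_dict().items():
    if tensor.is_floating_point():
        souped[name] = torch.stack([m.state_dict()[name] for m in models]).mean(dim=0)
    else:
        souped[name] = tensor.clone()  # copy integer buffers unchanged

soup = models[0]
soup.load_state_dict(souped)
soup.save_pretrained("olmo2-7b-souped")
```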
  ## Bias, Risks, and Limitations