amanrangapur committed
Commit 83f13fa
1 Parent(s): 7b1b2c7

Update README.md

Files changed (1):
  1. README.md +16 -11
README.md CHANGED
@@ -15,21 +15,18 @@ language:

OLMo2 7B November 2024 is an updated version of the original [OLMo 7B](https://huggingface.co/allenai/OLMo-7B) model, with a ____ point increase in ____, among other evaluation improvements, resulting from an improved version of the Dolma dataset and staged training.

- OLMo is a series of **O**pen **L**anguage **Mo**dels designed to enable the science of language models.
- The OLMo models are trained on the [Dolmino](https://huggingface.co/datasets/allenai/dolmino-mix-1124) dataset.
- We release all code, checkpoints, logs (coming soon), and details involved in training these models.
- The core models released in this batch are the following:
+ OLMo is a series of **O**pen **L**anguage **Mo**dels designed to enable the science of language models.
+ These models are trained on the Dolma dataset. We are releasing all code, checkpoints, logs (coming soon), and associated training details.
+ The core models released in this batch include the following:

| Size | Training Tokens | Layers | Hidden Size | Attention Heads | Context Length |
|------|--------|---------|-------------|-----------------|----------------|
- | [OLMo2-7B July 2024](https://huggingface.co/allenai/OLMo2-7B-1124) | 4 Trillion | 32 | 4096 | 32 | 4096 |
- | [OLMo2-13B July 2024](https://huggingface.co/allenai/OLMo2-13B-1124) | 5 Trillion | 40 | 5120 | 42 | 4096 |
+ | [OLMo2-7B July 2024](https://huggingface.co/allenai/OLMo-7B-0724-hf) | 4 Trillion | 32 | 4096 | 32 | 4096 |
+ | [OLMo2-13B July 2024](https://huggingface.co/allenai/OLMo-1B-0724-hf) | 5 Trillion | 40 | 5120 | 42 | 4096 |

## Inference

- Proceed as usual with HuggingFace:
+ You can use OLMo with the standard HuggingFace transformers library:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
olmo = AutoModelForCausalLM.from_pretrained("allenai/OLMo2-7B-1124")
@@ -44,8 +41,16 @@ print(tokenizer.batch_decode(response, skip_special_tokens=True)[0])
>> 'Language modeling is the first step to build natural language generation...'
```

- Or, you can make this slightly faster by quantizing the model, e.g. `AutoModelForCausalLM.from_pretrained("allenai/OLMo2-7B-1124", torch_dtype=torch.float16, load_in_8bit=True)` (requires `bitsandbytes`).
- The quantized model is more sensitive to typing / cuda, so it is recommended to pass the inputs as `inputs.input_ids.to('cuda')` to avoid potential issues.
+ For faster performance, you can quantize the model using the following method:
+ ```python
+ AutoModelForCausalLM.from_pretrained("allenai/OLMo2-7B-1124",
+     torch_dtype=torch.float16,
+     load_in_8bit=True)  # Requires bitsandbytes
+ ```
+ The quantized model is more sensitive to data types and CUDA operations. To avoid potential issues, it's recommended to pass the inputs directly to CUDA using:
+ ```python
+ inputs.input_ids.to('cuda')
+ ```

We have released checkpoints for these models, for every 1000 training steps.
The naming convention is `stepXXX-tokensYYYB`.
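
If, as with earlier OLMo releases, the intermediate checkpoints are exposed as repository revisions, they can be loaded with the standard `revision` argument of `from_pretrained`. A minimal sketch, using a hypothetical revision name that follows the `stepXXX-tokensYYYB` convention (check the repository's branch list for the names actually published):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical revision name following the stepXXX-tokensYYYB convention;
# check the repository for the checkpoints actually released.
checkpoint = "step1000-tokens4B"

olmo = AutoModelForCausalLM.from_pretrained("allenai/OLMo2-7B-1124", revision=checkpoint)
tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo2-7B-1124", revision=checkpoint)
```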
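
The quantization and CUDA-input snippets in the updated README can be combined into a single run. A minimal sketch, assuming a CUDA device, the `bitsandbytes` package, and illustrative prompt and sampling settings that are not part of the diff above:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# 8-bit quantized load, as described in the README (requires bitsandbytes).
olmo = AutoModelForCausalLM.from_pretrained(
    "allenai/OLMo2-7B-1124",
    torch_dtype=torch.float16,
    load_in_8bit=True,
)
tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo2-7B-1124")

# Illustrative prompt; move the token ids onto the GPU explicitly,
# as the README recommends for the quantized model.
inputs = tokenizer(["Language modeling is "], return_tensors="pt", return_token_type_ids=False)
response = olmo.generate(
    inputs.input_ids.to("cuda"),
    max_new_tokens=100,
    do_sample=True,
    top_k=50,
    top_p=0.95,
)
print(tokenizer.batch_decode(response, skip_special_tokens=True)[0])
```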