allenai/OLMo-7B-Instruct

@@ -24,7 +24,7 @@ We release all code, checkpoints, logs (coming soon), and details involved in tr
 OLMo 7B Instruct and OLMo SFT are two adapted versions of these models trained for better question answering.
 They show the performance gain that OLMo base models can achieve with existing fine-tuning techniques.
-*Note:* This model requires installing `ai2-olmo` with pip and using HuggingFace Transformers<=4.39. New versions of the model will be released soon with compatibility improvements.
 ## Model Details
@@ -82,11 +82,9 @@ pip install ai2-olmo
 ```
 Now, proceed as usual with HuggingFace:
 ```python
-import hf_olmo
-from transformers import AutoModelForCausalLM, AutoTokenizer
-olmo = AutoModelForCausalLM.from_pretrained("allenai/OLMo-7B-Instruct")
-tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo-7B-Instruct")
 chat = [
     { "role": "user", "content": "What is language modeling?" },
 ]
@@ -99,17 +97,8 @@ response = olmo.generate(input_ids=inputs.to(olmo.device), max_new_tokens=100, d
 print(tokenizer.batch_decode(response, skip_special_tokens=True)[0])
 >> '<|user|>\nWhat is language modeling?\n<|assistant|>\nLanguage modeling is a type of natural language processing (NLP) task or machine learning task that...'
 ```
-Alternatively, with the pipeline abstraction:
-```python
-import hf_olmo
-from transformers import pipeline
-olmo_pipe = pipeline("text-generation", model="allenai/OLMo-7B-Instruct")
-print(olmo_pipe("What is language modeling?"))
->> '[{'generated_text': 'What is language modeling?\nLanguage modeling is a type of natural language processing (NLP) task...'}]'
-```
-Or, you can make this slightly faster by quantizing the model, e.g. `AutoModelForCausalLM.from_pretrained("allenai/OLMo-7B-Instruct", torch_dtype=torch.float16, load_in_8bit=True)` (requires `bitsandbytes`).
 The quantized model is more sensitive to data types and CUDA device placement, so it is recommended to pass the inputs as `inputs.input_ids.to('cuda')` to avoid potential issues.
 Note, you may see the following error if `ai2-olmo` is not installed correctly, which is caused by internal Python check naming. We'll update the code soon to make this error clearer.

 OLMo 7B Instruct and OLMo SFT are two adapted versions of these models trained for better question answering.
 They show the performance gain that OLMo base models can achieve with existing fine-tuning techniques.
+*Note:* This model requires installing `ai2-olmo` with pip and using `ai2-olmo`>=0.3.0 or HuggingFace Transformers<=4.39. New versions of the model will be released soon with compatibility improvements.
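
The version condition in the note above can be checked before loading the model. What follows is a minimal sketch and not part of the model card: `parse_version`, `requirements_met`, and `installed_version` are hypothetical helper names, the parser only handles plain dotted numeric versions (pre-release suffixes are ignored), and reading `<=4.39` as "any 4.39.x release or earlier" is an assumption.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_version(text):
    """Turn '4.39.3' into (4, 39, 3), padding short versions with zeros.

    Parsing stops at the first non-numeric component, so '0.3.0rc1'
    becomes (0, 3, 0). This is a simplification, not full PEP 440 parsing.
    """
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def requirements_met(olmo_version, transformers_version):
    """Apply the note's condition: ai2-olmo>=0.3.0 OR transformers<=4.39.

    Assumption: '<=4.39' compares only major.minor, so 4.39.3 passes.
    """
    olmo_ok = parse_version(olmo_version) >= (0, 3, 0)
    transformers_ok = parse_version(transformers_version)[:2] <= (4, 39)
    return olmo_ok or transformers_ok


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    olmo_v = installed_version("ai2-olmo")
    tf_v = installed_version("transformers")
    if olmo_v is None or tf_v is None:
        print("ai2-olmo and transformers must both be installed")
    else:
        print("compatible:", requirements_met(olmo_v, tf_v))
```

Keeping the comparison in a pure function (`requirements_met`) separates the rule from the environment lookup, so the rule can be checked without either package installed.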
 ## Model Details
 ```
 Now, proceed as usual with HuggingFace:
 ```python
+from hf_olmo import OLMoForCausalLM, OLMoTokenizerFast
+olmo = OLMoForCausalLM.from_pretrained("allenai/OLMo-7B-Instruct")
+tokenizer = OLMoTokenizerFast.from_pretrained("allenai/OLMo-7B-Instruct")
 chat = [
     { "role": "user", "content": "What is language modeling?" },
 ]
 print(tokenizer.batch_decode(response, skip_special_tokens=True)[0])
 >> '<|user|>\nWhat is language modeling?\n<|assistant|>\nLanguage modeling is a type of natural language processing (NLP) task or machine learning task that...'
 ```
+You can make this slightly faster by quantizing the model, e.g. `OLMoForCausalLM.from_pretrained("allenai/OLMo-7B-Instruct", torch_dtype=torch.float16, load_in_8bit=True)` (requires `bitsandbytes`).
 The quantized model is more sensitive to data types and CUDA device placement, so it is recommended to pass the inputs as `inputs.input_ids.to('cuda')` to avoid potential issues.
 Note, you may see the following error if `ai2-olmo` is not installed correctly, which is caused by internal Python check naming. We'll update the code soon to make this error clearer.