pankajmathur committed on
Commit 5e33d3d
1 Parent(s): 18e1081

Update README.md

Files changed (1)
  1. README.md +27 -2
README.md CHANGED
@@ -39,7 +39,7 @@ Hello Orca Mini, what can you do for me?<|eot_id|>
 <|start_header_id|>assistant<|end_header_id|>
 ```
 
-Below shows a code example on how to use this model in default full precision (bf16) format, it requires
+Below shows a code example on how to use this model in default full precision (bf16) format, it requires ~130GB VRAM
 
 ```python
 import torch
@@ -59,7 +59,7 @@ outputs = pipeline(messages, max_new_tokens=128, do_sample=True, temperature=0.0
 print(outputs[0]["generated_text"][-1])
 ```
 
-Below shows a code example on how to use this model in 8-bit format via bitsandbytes library
+Below shows a code example on how to use this model in 4-bit format via bitsandbytes library, it requires ~39GB VRAM
 
 ```python
 import torch
@@ -87,6 +87,31 @@ print(outputs[0]["generated_text"][-1])
 
 ```
 
+Below shows a code example on how to use this model in 8-bit format via bitsandbytes library, it requires ~69GB VRAM
+
+```python
+import torch
+from transformers import BitsAndBytesConfig, pipeline
+
+model_slug = "pankajmathur/orca_mini_v8_0_70b"
+quantization_config = BitsAndBytesConfig(
+    load_in_8bit=True
+)
+pipeline = pipeline(
+    "text-generation",
+    model=model_slug,
+    model_kwargs={"quantization_config": quantization_config},
+    device_map="auto",
+)
+messages = [
+    {"role": "system", "content": "You are Orca Mini, a helpful AI assistant."},
+    {"role": "user", "content": "Hello Orca Mini, what can you do for me?"}
+]
+outputs = pipeline(messages, max_new_tokens=128, do_sample=True, temperature=0.01, top_k=100, top_p=0.95)
+print(outputs[0]["generated_text"][-1])
+
+```
+
 Below shows a code example on how to do tool use with this model and the transformers library
 
 Since **orca_mini_v8_0_70b** is based upon LLaMA-3.3, it supports multiple tool use formats. You can see a full guide to prompt formatting [here](https://llama.meta.com/docs/model-cards-and-prompt-formats/llama3_1/).
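
The full-precision example that the first hunk annotates is elided by the diff context (only `import torch` and the final `print` are visible). A minimal sketch of a bf16 pipeline load consistent with those visible lines, where the `torch_dtype=torch.bfloat16` argument and the sampling values are assumptions mirrored from the 8-bit block, not confirmed by this diff:

```python
import torch
from transformers import pipeline

model_slug = "pankajmathur/orca_mini_v8_0_70b"
# Assumption: "full precision (bf16)" means loading weights as bfloat16;
# the README's actual block is elided from this diff.
pipeline = pipeline(
    "text-generation",
    model=model_slug,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
messages = [
    {"role": "system", "content": "You are Orca Mini, a helpful AI assistant."},
    {"role": "user", "content": "Hello Orca Mini, what can you do for me?"}
]
outputs = pipeline(messages, max_new_tokens=128, do_sample=True, temperature=0.01, top_k=100, top_p=0.95)
print(outputs[0]["generated_text"][-1])
```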
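
The 4-bit block referenced in the second hunk is likewise elided. Assuming it mirrors the 8-bit block this commit adds, only the bitsandbytes config should differ:

```python
import torch
from transformers import BitsAndBytesConfig, pipeline

model_slug = "pankajmathur/orca_mini_v8_0_70b"
# Assumption: identical pipeline setup to the 8-bit example above,
# with bitsandbytes switched to 4-bit loading.
quantization_config = BitsAndBytesConfig(load_in_4bit=True)
pipeline = pipeline(
    "text-generation",
    model=model_slug,
    model_kwargs={"quantization_config": quantization_config},
    device_map="auto",
)
messages = [
    {"role": "system", "content": "You are Orca Mini, a helpful AI assistant."},
    {"role": "user", "content": "Hello Orca Mini, what can you do for me?"}
]
outputs = pipeline(messages, max_new_tokens=128, do_sample=True, temperature=0.01, top_k=100, top_p=0.95)
print(outputs[0]["generated_text"][-1])
```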
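
The tool-use example announced in the closing context lines is cut off by the diff. A minimal sketch of the standard transformers tool-calling flow for a Llama-3.3-based model, where `get_current_temperature` is a hypothetical placeholder tool and not part of the README:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_slug = "pankajmathur/orca_mini_v8_0_70b"
tokenizer = AutoTokenizer.from_pretrained(model_slug)
model = AutoModelForCausalLM.from_pretrained(model_slug, torch_dtype=torch.bfloat16, device_map="auto")

# Hypothetical tool: the chat template reads the signature and docstring
# to build the tool definition shown to the model.
def get_current_temperature(location: str) -> float:
    """
    Get the current temperature at a location.

    Args:
        location: The location to get the temperature for, in the format "City, Country"
    """
    return 22.0  # placeholder value

messages = [
    {"role": "system", "content": "You are Orca Mini, a helpful AI assistant with tool calling capabilities."},
    {"role": "user", "content": "What is the current temperature in Paris, France?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    tools=[get_current_temperature],
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
# Decode only the newly generated tokens; the model should emit a tool call.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```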