wwwaj committed on
Commit
2d03770
1 Parent(s): 9e55c3c

add Hardware section

  1. README.md +12 -6
README.md CHANGED
@@ -125,12 +125,6 @@ output = pipe(messages, **generation_args)
125
  print(output[0]['generated_text'])
126
  ```
127
 
128
- Note that by default the model use flash attention which requires certain types of GPU to run. If you want to run the model on:
129
-
130
- + V100 or earlier generation GPUs: call `AutoModelForCausalLM.from_pretrained()` with `attn_implementation="eager"`
131
- + CPU: use the **GGUF** quantized models [4K](https://aka.ms/Phi3-mini-4k-instruct-gguf)
132
- + Optimized inference on GPU, CPU, and Mobile: use the **ONNX** models [4K](https://aka.ms/Phi3-mini-4k-instruct-onnx)
133
-
134
  ## Responsible AI Considerations
135
 
136
  Like other language models, the Phi series models can potentially behave in ways that are unfair, unreliable, or offensive. Some of the limiting behaviors to be aware of include:
@@ -216,6 +210,18 @@ The number of k–shot examples is listed per-benchmark.
216
  * [Transformers](https://github.com/huggingface/transformers)
217
  * [Flash-Attention](https://github.com/HazyResearch/flash-attention)
218
 
219
  ## Cross Platform Support
220
 
221
  ONNX runtime ecosystem now supports Phi-3 Mini models across platforms and hardware. You can find the optimized ONNX models [here](https://aka.ms/Phi3-ONNX-HF).
 
125
  print(output[0]['generated_text'])
126
  ```
127
 
128
  ## Responsible AI Considerations
129
 
130
  Like other language models, the Phi series models can potentially behave in ways that are unfair, unreliable, or offensive. Some of the limiting behaviors to be aware of include:
 
210
  * [Transformers](https://github.com/huggingface/transformers)
211
  * [Flash-Attention](https://github.com/HazyResearch/flash-attention)
212
 
213
+ ## Hardware
214
+ Note that by default, the Phi-3-mini model uses flash attention, which requires certain types of GPU hardware to run. We have tested on the following GPU types:
215
+ * NVIDIA A100
216
+ * NVIDIA A6000
217
+ * NVIDIA H100
218
+
219
+ If you want to run the model on:
220
+ * NVIDIA V100 or earlier generation GPUs: call `AutoModelForCausalLM.from_pretrained()` with `attn_implementation="eager"`
221
+ * CPU: use the **GGUF** quantized models [4K](https://aka.ms/Phi3-mini-4k-instruct-gguf)
222
+ * Optimized inference on GPU, CPU, and Mobile: use the **ONNX** models [4K](https://aka.ms/Phi3-mini-4k-instruct-onnx)
223
+
224
+
225
  ## Cross Platform Support
226
 
227
  ONNX runtime ecosystem now supports Phi-3 Mini models across platforms and hardware. You can find the optimized ONNX models [here](https://aka.ms/Phi3-ONNX-HF).
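
The V100 fallback described in the new Hardware section could be wired up roughly as follows. This is a minimal sketch, not part of the commit: the helper names `pick_attn_implementation` and `load_model`, the model id `microsoft/Phi-3-mini-4k-instruct`, and the compute-capability threshold of 8.0 (Ampere, the generation of the tested A100 and A6000) are all assumptions made for illustration. Only `AutoModelForCausalLM.from_pretrained()` and `attn_implementation="eager"` come from the README text.

```python
from typing import Optional, Tuple

# Assumption: flash attention kernels need compute capability 8.0 (Ampere) or
# newer. The README only says "certain types of GPU hardware" and lists
# A100, A6000 and H100 as tested; V100 (7.0) is named as needing the fallback.
FLASH_MIN_CAPABILITY = (8, 0)


def pick_attn_implementation(capability: Optional[Tuple[int, int]]) -> str:
    """Return an `attn_implementation` value for a (major, minor) capability.

    `capability` is None when no CUDA device is present. The README points
    CPU users at the GGUF or ONNX builds; "eager" is returned here only so
    the function always yields a usable value.
    """
    if capability is not None and tuple(capability) >= FLASH_MIN_CAPABILITY:
        return "flash_attention_2"
    return "eager"


def load_model(model_id: str = "microsoft/Phi-3-mini-4k-instruct"):
    """Hypothetical loader; the model id is an assumed default."""
    # Imported lazily so the helper above works without torch installed.
    import torch
    from transformers import AutoModelForCausalLM

    capability = (
        torch.cuda.get_device_capability() if torch.cuda.is_available() else None
    )
    return AutoModelForCausalLM.from_pretrained(
        model_id,
        attn_implementation=pick_attn_implementation(capability),
        torch_dtype="auto",
        trust_remote_code=True,
    )
```

Keeping the hardware check in a small pure function makes the decision easy to inspect without a GPU. Note that the `flash_attention_2` path also requires the `flash-attn` package to be installed.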