alvarobartt HF staff committed on
Commit
cfb8846
·
verified ·
1 Parent(s): b5422fe

Update README.md

  1. README.md +19 -4
README.md CHANGED
@@ -162,18 +162,15 @@ curl 0.0.0.0:8080/v1/chat/completions \
  }'
  ```
 
- Or programmatically via the `huggingface_hub` Python client as follows (TGI is fully compatible with OpenAI so its `openai` SDK can also be used):
 
  ```python
  import os
- # Instead of `from openai import OpenAI`
  from huggingface_hub import InferenceClient
 
- # Instead of `client = OpenAI(base_url="http://0.0.0.0:8080/v1", api_key=os.getenv("OPENAI_API_KEY"))`
  client = InferenceClient(base_url="http://0.0.0.0:8080", api_key=os.getenv("HF_TOKEN", "-"))
 
  chat_completion = client.chat.completions.create(
-     # Instead of `model="tgi"`
      model="hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4",
      messages=[
          {"role": "system", "content": "You are a helpful assistant."},
@@ -183,6 +180,24 @@ chat_completion = client.chat.completions.create(
  )
  ```
 
  ### vLLM
 
  To run vLLM with Llama 3.1 70B Instruct AWQ in INT4, you will need to have Docker installed (see [installation notes](https://docs.docker.com/engine/install/)) and run the latest vLLM Docker container as follows:
 
  }'
  ```
 
+ Or programmatically via the `huggingface_hub` Python client as follows:
 
  ```python
  import os
  from huggingface_hub import InferenceClient
 
  client = InferenceClient(base_url="http://0.0.0.0:8080", api_key=os.getenv("HF_TOKEN", "-"))
 
  chat_completion = client.chat.completions.create(
      model="hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4",
      messages=[
          {"role": "system", "content": "You are a helpful assistant."},
@@ -183,6 +180,24 @@ chat_completion = client.chat.completions.create(
  )
  ```
 
+ Alternatively, the OpenAI Python client can also be used (see [installation notes](https://github.com/openai/openai-python?tab=readme-ov-file#installation)) as follows:
+
+ ```python
+ import os
+ from openai import OpenAI
+
+ client = OpenAI(base_url="http://0.0.0.0:8080/v1", api_key=os.getenv("OPENAI_API_KEY", "-"))
+
+ chat_completion = client.chat.completions.create(
+     model="tgi",
+     messages=[
+         {"role": "system", "content": "You are a helpful assistant."},
+         {"role": "user", "content": "What is Deep Learning?"},
+     ],
+     max_tokens=128,
+ )
+ ```
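
Both clients in this diff target TGI's OpenAI-compatible `/v1/chat/completions` route, the same one the `curl` command in the first hunk header uses. As a rough sketch of the request they send, assuming TGI is listening on `0.0.0.0:8080` as in the README, the snippet below builds it with the Python standard library only. `TGI_URL` and `build_chat_request` are hypothetical names introduced for illustration, and the request is constructed but not sent.

```python
import json
import urllib.request

# Assumption: TGI is reachable at this address, as in the README examples.
TGI_URL = "http://0.0.0.0:8080/v1/chat/completions"


def build_chat_request(prompt: str, max_tokens: int = 128) -> urllib.request.Request:
    """Build (but do not send) a POST request for the chat completions route."""
    payload = {
        # "tgi" is the placeholder model name used in the OpenAI client example.
        "model": "tgi",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt},
        ],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        TGI_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_chat_request("What is Deep Learning?")

# Sending it requires a running TGI container, so it is left commented out:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response)["choices"][0]["message"]["content"])
```

Keeping the payload construction separate from the network call makes it easy to inspect the JSON body before pointing it at a live endpoint.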
+
  ### vLLM
 
  To run vLLM with Llama 3.1 70B Instruct AWQ in INT4, you will need to have Docker installed (see [installation notes](https://docs.docker.com/engine/install/)) and run the latest vLLM Docker container as follows: