Update README.md
README.md CHANGED
@@ -162,18 +162,15 @@ curl 0.0.0.0:8080/v1/chat/completions \
 }'
 ```

-Or programatically via the `huggingface_hub` Python client as follows
+Or programmatically via the `huggingface_hub` Python client as follows:

 ```python
 import os
-# Instead of `from openai import OpenAI`
 from huggingface_hub import InferenceClient

-# Instead of `client = OpenAI(base_url="http://0.0.0.0:8080/v1", api_key=os.getenv("OPENAI_API_KEY"))`
 client = InferenceClient(base_url="http://0.0.0.0:8080", api_key=os.getenv("HF_TOKEN", "-"))

 chat_completion = client.chat.completions.create(
-    # Instead of `model="tgi"`
     model="hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4",
     messages=[
         {"role": "system", "content": "You are a helpful assistant."},
@@ -183,6 +180,24 @@ chat_completion = client.chat.completions.create(
 )
 ```

+Alternatively, the OpenAI Python client can also be used (see [installation notes](https://github.com/openai/openai-python?tab=readme-ov-file#installation)) as follows:
+
+```python
+import os
+from openai import OpenAI
+
+client = OpenAI(base_url="http://0.0.0.0:8080/v1", api_key=os.getenv("OPENAI_API_KEY", "-"))
+
+chat_completion = client.chat.completions.create(
+    model="tgi",
+    messages=[
+        {"role": "system", "content": "You are a helpful assistant."},
+        {"role": "user", "content": "What is Deep Learning?"},
+    ],
+    max_tokens=128,
+)
+```
+
 ### vLLM

 To run vLLM with Llama 3.1 70B Instruct AWQ in INT4, you will need to have Docker installed (see [installation notes](https://docs.docker.com/engine/install/)) and run the latest vLLM Docker container as follows:
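The Docker command itself sits just below this hunk and is not shown in the diff. As a rough sketch of what launching the vLLM OpenAI-compatible server for this model typically looks like, assuming the stock `vllm/vllm-openai` image (the tag, `--tensor-parallel-size`, and `--max-model-len` values below are assumptions rather than the README's exact command):

```bash
# Sketch only: image tag, parallelism, and context length are assumptions;
# check the README / vLLM docs for the exact invocation.
docker run --runtime nvidia --gpus all --ipc=host \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HUGGING_FACE_HUB_TOKEN=$HF_TOKEN" \
    -p 8000:8000 \
    vllm/vllm-openai:latest \
    --model hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4 \
    --quantization awq \
    --tensor-parallel-size 4 \
    --max-model-len 4096
```

Once up, the server exposes the same OpenAI-compatible `/v1/chat/completions` route on port 8000, so the examples above should work against it after swapping the base URL (and using the model id instead of `tgi`).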