markoarnauto committed
Commit 4200ed6 (1 parent: 544606b)
Upload README.md with huggingface_hub
README.md CHANGED
@@ -15,14 +15,14 @@ Install **vLLM** and
 run the [server](https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html#openai-compatible-server):
 
 ```
-python -m vllm.entrypoints.openai.api_server --model cortecs/
+python -m vllm.entrypoints.openai.api_server --model cortecs/Meta-Llama-3-70B-Instruct-GPTQ
 ```
 Access the model:
 ```
 curl http://localhost:8000/v1/completions
 -H "Content-Type: application/json"
 -d '{
-"model": "cortecs/
+"model": "cortecs/Meta-Llama-3-70B-Instruct-GPTQ",
 "prompt": "<|begin_of_text|><|start_header_id|>user<|end_header_id|>
 Tell me a joke<|eot_id|><|start_header_id|>assistant<|end_header_id|>"
 }'
@@ -75,7 +75,7 @@ Take with caution. We did not check for data contamination.
 | NVIDIA L40Sx4 | 2.38 | 1135.41 |
 | | | |
 | __Llama 3 70B GPTQ__ | __requests/s__ | __tokens/s__ |
-| NVIDIA L40Sx2 |
+| NVIDIA L40Sx2 | 2.0 | 951.28 |
 | | | |
 | __Llama-3 8B Instruct__ | __requests/s__ | __tokens/s__ |
 | NVIDIA L40Sx1 | 11.64 | 5548.63 |
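For readers who prefer Python to curl, the request in the updated README could be sketched roughly as follows. This is a minimal illustration, not part of the commit: the helper names `build_prompt`, `build_payload`, and `complete` are hypothetical, the `max_tokens` value is an arbitrary choice, and the blank line after each header is assumed from the Llama 3 chat template (the diff itself shows only a single line break). The URL and model name are taken from the added lines of the diff.

```python
import json
import urllib.request

# Taken from the added lines of the diff; adjust if your server differs.
BASE_URL = "http://localhost:8000/v1/completions"
MODEL = "cortecs/Meta-Llama-3-70B-Instruct-GPTQ"


def build_prompt(user_message: str) -> str:
    """Wrap one user turn in Llama 3 chat markers (assumed template)."""
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user_message}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )


def build_payload(user_message: str, max_tokens: int = 128) -> dict:
    """Build the JSON body for the completions endpoint."""
    return {
        "model": MODEL,
        "prompt": build_prompt(user_message),
        "max_tokens": max_tokens,  # hypothetical default, not from the README
    }


def complete(user_message: str, url: str = BASE_URL) -> str:
    """POST the payload and return the first completion's text."""
    body = json.dumps(build_payload(user_message)).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        result = json.load(response)
    return result["choices"][0]["text"]
```

With the vLLM server from the README running locally, `complete("Tell me a joke")` should return the generated text. Building the body with `json.dumps` also escapes the newlines in the prompt, which a hand-written multi-line JSON string does not.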