markoarnauto committed on
Commit 4200ed6
1 Parent(s): 544606b

Upload README.md with huggingface_hub

  1. README.md +3 -3
README.md CHANGED
@@ -15,14 +15,14 @@ Install **vLLM** and
  run the [server](https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html#openai-compatible-server):
  
  ```
- python -m vllm.entrypoints.openai.api_server --model cortecs/cortecs--Meta-Llama-3-70B-Instruct-GPTQ
+ python -m vllm.entrypoints.openai.api_server --model cortecs/Meta-Llama-3-70B-Instruct-GPTQ
  ```
  Access the model:
  ```
  curl http://localhost:8000/v1/completions
  -H "Content-Type: application/json"
  -d '{
- "model": "cortecs/cortecs--Meta-Llama-3-70B-Instruct-GPTQ",
+ "model": "cortecs/Meta-Llama-3-70B-Instruct-GPTQ",
  "prompt": "<|begin_of_text|><|start_header_id|>user<|end_header_id|>
  Tell me a joke<|eot_id|><|start_header_id|>assistant<|end_header_id|>"
  }'
@@ -75,7 +75,7 @@ Take with caution. We did not check for data contamination.
  | NVIDIA L40Sx4 | 2.38 | 1135.41 |
  | | | |
  | __Llama 3 70B GPTQ__ | __requests/s__ | __tokens/s__ |
- | NVIDIA L40Sx2 | 1.58 | 750.89 |
+ | NVIDIA L40Sx2 | 2.0 | 951.28 |
  | | | |
  | __Llama-3 8B Instruct__ | __requests/s__ | __tokens/s__ |
  | NVIDIA L40Sx1 | 11.64 | 5548.63 |
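
For readers who prefer Python to `curl`, the request in the README could be sketched roughly as follows. This is a minimal illustration, not part of the commit: the helper names (`build_prompt`, `build_completion_request`, `send`) are hypothetical, and the model ID, endpoint URL, and prompt layout are copied from the diff. It assumes the vLLM server is listening on `localhost:8000`, as the README's `curl` call does.

```python
import json
import urllib.request

# Values copied from the README as changed in this commit.
MODEL_ID = "cortecs/Meta-Llama-3-70B-Instruct-GPTQ"
ENDPOINT = "http://localhost:8000/v1/completions"


def build_prompt(user_message: str) -> str:
    """Wrap a user message in the Llama 3 header tokens, mirroring the README prompt.

    Hypothetical helper: the single newline after the user header follows the diff.
    """
    return (
        "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n"
        f"{user_message}<|eot_id|><|start_header_id|>assistant<|end_header_id|>"
    )


def build_completion_request(user_message: str) -> urllib.request.Request:
    """Build (but do not send) the POST request for the completions endpoint."""
    payload = {"model": MODEL_ID, "prompt": build_prompt(user_message)}
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and parse the JSON reply; needs the server to be running."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With the server running, `send(build_completion_request("Tell me a joke"))` should return the parsed JSON response; in the OpenAI completions format the generated text sits under `choices[0]["text"]`. Building the request separately from sending it keeps the payload easy to inspect without a GPU at hand.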