cortecs/Meta-Llama-3-70B-Instruct-GPTQ

@@ -15,14 +15,14 @@ Install **vLLM** and
     run the [server](https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html#openai-compatible-server):
 ```
-python -m vllm.entrypoints.openai.api_server --model cortecs/cortecs--Meta-Llama-3-70B-Instruct-GPTQ
 ```
 Access the model:
 ```
 curl http://localhost:8000/v1/completions
     -H "Content-Type: application/json"
     -d '{
-        "model": "cortecs/cortecs--Meta-Llama-3-70B-Instruct-GPTQ",
         "prompt": "<|begin_of_text|><|start_header_id|>user<|end_header_id|>
 Tell me a joke<|eot_id|><|start_header_id|>assistant<|end_header_id|>"
     }'
@@ -75,7 +75,7 @@ Take with caution. We did not check for data contamination.
 | NVIDIA L40Sx4              | 2.38             | 1135.41        |
 |                            |                  |                |
 | __Llama 3 70B GPTQ__   | __requests/s__   | __tokens/s__   |
-| NVIDIA L40Sx2          | 1.58             | 750.89         |
 |                        |                  |                |
 | __Llama-3 8B Instruct__   |   __requests/s__ |   __tokens/s__ |
 | NVIDIA L40Sx1             |            11.64 |        5548.63 |

     run the [server](https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html#openai-compatible-server):
 ```
+python -m vllm.entrypoints.openai.api_server --model cortecs/Meta-Llama-3-70B-Instruct-GPTQ
 ```
 Access the model:
 ```
 curl http://localhost:8000/v1/completions
     -H "Content-Type: application/json"
     -d '{
+        "model": "cortecs/Meta-Llama-3-70B-Instruct-GPTQ",
         "prompt": "<|begin_of_text|><|start_header_id|>user<|end_header_id|>
 Tell me a joke<|eot_id|><|start_header_id|>assistant<|end_header_id|>"
     }'
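
The same request can also be sent from Python, which avoids hand-writing JSON with a newline inside the prompt string. The following is a minimal sketch using only the standard library, assuming the vLLM server started above is listening on `localhost:8000`. The helper names `build_prompt`, `build_payload`, and `complete` are hypothetical (they are not part of vLLM), and the `max_tokens` value is an assumed setting, not one taken from the README.

```python
import json
import urllib.request

MODEL = "cortecs/Meta-Llama-3-70B-Instruct-GPTQ"
# Assumed endpoint: same host, port, and path as the curl call in the README.
URL = "http://localhost:8000/v1/completions"


def build_prompt(user_message: str) -> str:
    """Wrap a user message in the Llama 3 markers used by the curl example."""
    return (
        "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n"
        f"{user_message}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>"
    )


def build_payload(user_message: str, max_tokens: int = 64) -> dict:
    """Build the JSON body for the completions endpoint (max_tokens is an assumption)."""
    return {
        "model": MODEL,
        "prompt": build_prompt(user_message),
        "max_tokens": max_tokens,
    }


def complete(user_message: str) -> str:
    """POST the payload to the server and return the first completion's text."""
    data = json.dumps(build_payload(user_message)).encode("utf-8")
    request = urllib.request.Request(
        URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

With the server running, `print(complete("Tell me a joke"))` would print the generated text. `json.dumps` escapes the newline in the prompt, so the body is valid JSON.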
 | NVIDIA L40Sx4              | 2.38             | 1135.41        |
 |                            |                  |                |
 | __Llama 3 70B GPTQ__   | __requests/s__   | __tokens/s__   |
+| NVIDIA L40Sx2          | 2.0              | 951.28         |
 |                        |                  |                |
 | __Llama-3 8B Instruct__   |   __requests/s__ |   __tokens/s__ |
 | NVIDIA L40Sx1             |            11.64 |        5548.63 |
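
To put the updated `NVIDIA L40Sx2` row in perspective, the relative change between the old and new figures can be computed directly. This is a small sketch: the function name `relative_change` is hypothetical, and the numbers are copied from the removed and added versions of that table row.

```python
def relative_change(old: float, new: float) -> float:
    """Return the change from old to new as a percentage of old."""
    return (new - old) / old * 100.0


# Llama 3 70B GPTQ on NVIDIA L40Sx2, before and after this change.
old_row = {"requests/s": 1.58, "tokens/s": 750.89}
new_row = {"requests/s": 2.0, "tokens/s": 951.28}

for metric, old_value in old_row.items():
    change = relative_change(old_value, new_row[metric])
    print(f"{metric}: {old_value} -> {new_row[metric]} ({change:+.1f}%)")
```

Both metrics rise by roughly 27%. The table's own caveat still applies, so treat the figures with caution.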