Update README.md
## Model info

FP8 (F8_E4M3) quantized version of Mistral-Nemo-Instruct-2407 with 512 epochs.

Tested on vLLM 0.5.3, but you need this patch to use it with vLLM 0.5.2: https://github.com/vllm-project/vllm/pull/6548
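The F8_E4M3 layout named above (1 sign bit, 4 exponent bits, 3 mantissa bits, in the no-infinity `E4M3FN` variant commonly used for FP8 weights) can be sketched with a small decoder. This helper is purely illustrative, not part of the model or of vLLM:

```python
def decode_e4m3fn(byte: int) -> float:
    """Decode an 8-bit FP8 E4M3FN value (1 sign, 4 exponent, 3 mantissa bits).

    E4M3FN reserves only exponent=15, mantissa=7 for NaN; there is no
    infinity encoding, which pushes the largest finite value up to 448.
    """
    sign = -1.0 if byte & 0x80 else 1.0
    exp = (byte >> 3) & 0xF
    mant = byte & 0x7
    if exp == 0xF and mant == 0x7:
        return float("nan")
    if exp == 0:  # subnormal: no implicit leading 1, fixed exponent 1 - bias = -6
        return sign * (mant / 8) * 2.0 ** -6
    return sign * (1 + mant / 8) * 2.0 ** (exp - 7)  # exponent bias is 7

# Largest finite E4M3FN value: exponent=15, mantissa=6 -> 1.75 * 2^8 = 448
print(decode_e4m3fn(0b0_1111_110))  # -> 448.0
print(decode_e4m3fn(0b0_0111_000))  # -> 1.0 (exponent field 7 cancels bias 7)
```

With a bias of 7 the format spans roughly ±448, with 2^-9 as the smallest subnormal step, which is why FP8 weight quantization stores a per-tensor scale alongside the 8-bit values.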
```diff
--- vllm/model_executor/models/llama.py	2024-07-19 02:01:59.192831673 +0200