Update README.md
README.md CHANGED
@@ -16,8 +16,7 @@ https://huggingface.co/alpindale/magnum-72b-v1
 * <h3 style="display: inline;">Release Date:</h3> June 25, 2024
 
 Magnum-72B-v1 quantized to FP8 weights and activations using per-tensor quantization through the [AutoFP8 repository](https://github.com/neuralmagic/AutoFP8), ready for inference with vLLM >= 0.5.0.
-Calibrated with 512 UltraChat samples to achieve
-Reduces space on disk by ~45%.
+Calibrated with 512 UltraChat samples to achieve better performance recovery.
 Part of the [FP8 LLMs for vLLM collection](https://huggingface.co/collections/neuralmagic/fp8-llms-for-vllm-666742ed2b78b7ac8df13127).
 
 ## Usage and Creation
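For context on the "ready for inference with vLLM >= 0.5.0" claim above, here is a minimal serving sketch in Python. The repo id `neuralmagic/Magnum-72B-FP8` and the `tensor_parallel_size` value are assumptions for illustration, not details confirmed by this diff; vLLM picks up the FP8 scheme from the checkpoint's quantization config, so no extra flags are needed.

```python
from vllm import LLM, SamplingParams

# Assumed repo id for the quantized checkpoint; substitute the actual
# model id from this repository's page.
MODEL_ID = "neuralmagic/Magnum-72B-FP8"

# vLLM >= 0.5.0 reads the FP8 quantization config from the checkpoint.
llm = LLM(
    model=MODEL_ID,
    tensor_parallel_size=4,  # assumption: size this to your GPUs for a 72B model
)

params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Explain FP8 quantization in one paragraph."], params)
print(outputs[0].outputs[0].text)
```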
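And a sketch of the creation side with AutoFP8, matching the README's description of per-tensor FP8 quantization calibrated on 512 UltraChat samples. The dataset id `HuggingFaceH4/ultrachat_200k` and the `activation_scheme="static"` choice are assumptions based on AutoFP8's documented workflow, not details stated in this diff.

```python
from datasets import load_dataset
from transformers import AutoTokenizer
from auto_fp8 import AutoFP8ForCausalLM, BaseQuantizeConfig

pretrained_model_dir = "alpindale/magnum-72b-v1"
quantized_model_dir = "Magnum-72B-FP8"

tokenizer = AutoTokenizer.from_pretrained(pretrained_model_dir)
tokenizer.pad_token = tokenizer.eos_token

# 512 calibration samples, rendered through the model's chat template
# (assumed dataset; the README only says "512 UltraChat samples").
ds = load_dataset("HuggingFaceH4/ultrachat_200k", split="train_sft").select(range(512))
examples = [
    tokenizer.apply_chat_template(sample["messages"], tokenize=False) for sample in ds
]
examples = tokenizer(examples, padding=True, truncation=True, return_tensors="pt").to("cuda")

# Static (per-tensor) scales for both weights and activations.
quantize_config = BaseQuantizeConfig(quant_method="fp8", activation_scheme="static")

model = AutoFP8ForCausalLM.from_pretrained(pretrained_model_dir, quantize_config=quantize_config)
model.quantize(examples)
model.save_quantized(quantized_model_dir)
```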