---
license: llama3.1
base_model:
- meta-llama/Llama-3.1-405B-Instruct
base_model_relation: quantized
tags:
- VPTQ
- Quantized
- Quantization
---

**Disclaimer**:

This model was reproduced following the paper *VPTQ: Extreme Low-bit Vector Post-Training Quantization for Large Language Models* ([GitHub](https://github.com/microsoft/vptq), [arXiv](https://arxiv.org/abs/2409.17066)).

The model itself is sourced from a community release.

It is intended only for experimental purposes.

Users are responsible for any consequences arising from the use of this model.

**Note**:

The PPL (perplexity) test results are for reference only and were collected using the GPTQ testing script.

```json
{
  "ctx_2048": {
    "wikitext2": 4.625874996185303
  },
  "ctx_4096": {
    "wikitext2": 4.31096076965332
  },
  "ctx_8192": {
    "wikitext2": 4.156002521514893
  }
}
```
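
For context on how figures like these are usually derived: perplexity is the exponential of the mean per-token negative log-likelihood. The sketch below is not the GPTQ testing script itself. It assumes the common approach of cutting the tokenized test set into non-overlapping windows of the context length (2048, 4096, or 8192 tokens here) and averaging the loss over all tokens. `split_into_windows` and `perplexity` are hypothetical helper names, and the per-token losses would come from the model being evaluated.

```python
import math


def split_into_windows(token_ids, ctx_len):
    """Cut a token sequence into non-overlapping windows of ctx_len tokens.

    Assumption: the trailing remainder shorter than ctx_len is dropped.
    """
    n_windows = len(token_ids) // ctx_len
    return [token_ids[i * ctx_len:(i + 1) * ctx_len] for i in range(n_windows)]


def perplexity(token_nlls):
    """Return exp(mean NLL), where each NLL is in nats (natural log)."""
    if not token_nlls:
        raise ValueError("need at least one per-token loss")
    return math.exp(sum(token_nlls) / len(token_nlls))


# Toy check: a model that assigns probability 1/4 to every token
# has a per-token NLL of ln(4).
toy_losses = [math.log(4.0)] * 8
print(perplexity(toy_losses))
```

A longer context gives the model more preceding tokens to condition on, which fits the pattern in the results above: perplexity falls as the context length grows.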