qwp4w3hyb
/

Meta-Llama-3.1-70B-Instruct-iMat-GGUF


qwp4w3hyb committed on Jul 25, 2024

Commit 3958e85 · verified · 1 Parent(s): 374f001

Create README.md

Files changed (1)

README.md +38 -0

README.md ADDED

	@@ -0,0 +1,38 @@

+---
+language:
+  - en
+  - de
+  - fr
+  - it
+  - pt
+  - hi
+  - es
+  - th
+license: llama3.1
+pipeline_tag: text-generation
+tags:
+  - facebook
+  - meta
+  - pytorch
+  - llama
+  - llama-3
+  - gguf
+  - imatrix
+base_model: meta-llama/Meta-Llama-3.1-70B-Instruct
+---
+# Quant Infos
+- Requires the latest llama.cpp master plus the [RoPE scaling PR](https://github.com/ggerganov/llama.cpp/pull/8676)
+- Might not be perfect yet, but seems to mostly work.
+- Quants made with an importance matrix (imatrix) to reduce quantization loss.
+- Quantized ggufs and the imatrix were produced from the Hugging Face bf16 weights, with bf16 kept as the intermediate format: `safetensors bf16 -> gguf bf16 -> quant`, for *optimal* quant loss.
+- Wide coverage of different gguf quant types, from Q8\_0 down to IQ1\_S.
+- Imatrix generated with [this](https://gist.github.com/bartowski1182/eb213dccb3571f863da82e99418f81e8) multi-purpose dataset by [bartowski](https://huggingface.co/bartowski):
+  ```
+  ./imatrix -m $model_name-bf16.gguf -f calibration_datav3.txt -o $model_name.imatrix
+  ```
+# Original Model Card:
+TODO
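
The `safetensors bf16 -> gguf bf16 -> quant` flow listed under Quant Infos could look roughly like the dry-run sketch below. It is not the author's actual script. The function name `quant_pipeline`, the local checkout path `./$model_name`, and the output file naming are hypothetical; only the imatrix step is taken from the README. Binary names also vary between llama.cpp versions (newer builds prefix them with `llama-`, e.g. `llama-imatrix`, `llama-quantize`), so adjust them to your build.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Dry run: prints the three commands instead of executing them.
# Args: model name, target quant type (e.g. Q4_K_M).
quant_pipeline() {
  local model_name="$1" quant_type="$2"
  # Assumption: the HF bf16 safetensors checkout lives in ./<model_name>
  local hf_dir="./$model_name"

  # 1. safetensors bf16 -> gguf bf16 (llama.cpp conversion script)
  echo python convert_hf_to_gguf.py "$hf_dir" --outtype bf16 --outfile "$model_name-bf16.gguf"

  # 2. importance matrix from the bf16 gguf (command as given in the README)
  echo ./imatrix -m "$model_name-bf16.gguf" -f calibration_datav3.txt -o "$model_name.imatrix"

  # 3. gguf bf16 -> quant, guided by the imatrix (output name is hypothetical)
  echo ./quantize --imatrix "$model_name.imatrix" "$model_name-bf16.gguf" "$model_name-$quant_type.gguf" "$quant_type"
}

quant_pipeline Meta-Llama-3.1-70B-Instruct Q4_K_M
```

To actually run the steps, drop the `echo` prefixes after checking that the paths and binary names match your llama.cpp checkout. Step 3 would be repeated once per quant type.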