metadata

language:
  - en
  - de
  - fr
  - it
  - pt
  - hi
  - es
  - th
license: llama3.1
pipeline_tag: text-generation
tags:
  - facebook
  - meta
  - pytorch
  - llama
  - llama-3
  - gguf
  - imatrix
base_model: meta-llama/Meta-Llama-3.1-70B-Instruct

Still quantizing and uploading; quants will appear one by one as they become available.

Quant Infos

Requires the latest llama.cpp master plus the RoPE scaling PR
- @ubergarm explained how to set up your llama.cpp here
Might not be perfect yet, but seems to mostly work.
Quants were made with an importance matrix to reduce quantization loss.
Quantized GGUFs and the imatrix were both derived from the HF bf16 weights without leaving bf16: safetensors bf16 -> gguf bf16 -> quant, to keep quantization loss as low as possible.
Wide coverage of GGUF quant types, from Q8_0 down to IQ1_S
- still WIP
- experimental custom quant types
  - _L variants are quantized with --output-tensor-type f16 --token-embedding-type f16, which supposedly leads to better accuracy.

Imatrix generated with this multi-purpose dataset by bartowski.

./imatrix -m $model_name-bf16.gguf -f calibration_datav3.txt -o $model_name.imatrix
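For orientation, the whole safetensors bf16 -> gguf bf16 -> imatrix -> quant flow described above can be sketched as a dry-run script that only prints the commands. Only the `./imatrix` line is taken from this card; the converter script name (`convert_hf_to_gguf.py`), the `./quantize` binary name, the local directory `hf_dir`, the output file names, and the choice of `Q4_K_M` for the `_L` example are assumptions for illustration, and may differ from what was actually run or from your llama.cpp build.

```shell
#!/usr/bin/env bash
# Dry-run sketch: prints the commands for each step instead of running them.

model_name="Meta-Llama-3.1-70B-Instruct"   # assumed output file prefix
hf_dir="./Meta-Llama-3.1-70B-Instruct"     # assumed local copy of the HF repo
calib="calibration_datav3.txt"             # calibration file named in this card

# Step 1: safetensors bf16 -> gguf bf16 (converter script name is an assumption).
convert_cmd() {
  echo "python convert_hf_to_gguf.py $hf_dir --outtype bf16 --outfile $model_name-bf16.gguf"
}

# Step 2: importance matrix computed on the bf16 gguf (command from this card).
imatrix_cmd() {
  echo "./imatrix -m $model_name-bf16.gguf -f $calib -o $model_name.imatrix"
}

# Step 3: quantize the bf16 gguf using the imatrix; $1 is the quant type.
quant_cmd() {
  local qtype="$1"
  echo "./quantize --imatrix $model_name.imatrix $model_name-bf16.gguf $model_name-$qtype.gguf $qtype"
}

# Step 3, _L variant: same, but output and token-embedding tensors kept at f16.
quant_l_cmd() {
  local qtype="$1"
  echo "./quantize --imatrix $model_name.imatrix --output-tensor-type f16 --token-embedding-type f16 $model_name-bf16.gguf $model_name-${qtype}_L.gguf $qtype"
}

convert_cmd
imatrix_cmd
quant_cmd Q8_0
quant_l_cmd Q4_K_M   # hypothetical base type for an _L quant
```

Printing rather than executing keeps the sketch safe to run anywhere; to actually run a step, pipe a single printed line into `bash` once the paths and binary names match your setup.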

Original Model Card:

TODO