h2o-danube2-1.8b-chat-GGUF

Description

This repo contains GGUF format model files for h2o-danube2-1.8b-chat, quantized using the llama.cpp framework.

The table below summarizes the different quantized versions of h2o-danube2-1.8b-chat and shows the trade-off between size, speed, and quality.

Name	Quant method	Model size	MT-Bench AVG	Perplexity	Tokens per second
h2o-danube2-1.8b-chat-F16.gguf	F16	3.66 GB	5.60	8.02	797
h2o-danube2-1.8b-chat-Q8_0.gguf	Q8_0	1.95 GB	5.51	8.02	1156
h2o-danube2-1.8b-chat-Q6_K.gguf	Q6_K	1.50 GB	5.51	8.03	1131
h2o-danube2-1.8b-chat-Q5_K_M.gguf	Q5_K_M	1.30 GB	5.56	8.10	1172
h2o-danube2-1.8b-chat-Q5_K_S.gguf	Q5_K_S	1.27 GB	5.49	8.12	1107
h2o-danube2-1.8b-chat-Q4_K_M.gguf	Q4_K_M	1.11 GB	5.60	8.27	1162
h2o-danube2-1.8b-chat-Q4_K_S.gguf	Q4_K_S	1.06 GB	5.59	8.34	1270
h2o-danube2-1.8b-chat-Q3_K_L.gguf	Q3_K_L	0.98 GB	5.23	8.72	1442
h2o-danube2-1.8b-chat-Q3_K_M.gguf	Q3_K_M	0.91 GB	4.91	8.81	1107
h2o-danube2-1.8b-chat-Q3_K_S.gguf	Q3_K_S	0.82 GB	4.03	10.12	1103
h2o-danube2-1.8b-chat-Q2_K.gguf	Q2_K	0.71 GB	3.03	12.56	1160
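To make the trade-off concrete, the rows of the table can be loaded into a small helper that picks the smallest file meeting a quality bar. This is a hypothetical sketch: the names `QuantVariant`, `by_method`, `speedup_vs_f16` and `smallest_meeting` are illustrative, the thresholds you pass in are arbitrary, and the numbers are simply the table's values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class QuantVariant:
    name: str
    method: str
    size_gb: float
    mt_bench: float
    perplexity: float
    tokens_per_s: int


# Rows copied from the table above.
VARIANTS = [
    QuantVariant("h2o-danube2-1.8b-chat-F16.gguf", "F16", 3.66, 5.60, 8.02, 797),
    QuantVariant("h2o-danube2-1.8b-chat-Q8_0.gguf", "Q8_0", 1.95, 5.51, 8.02, 1156),
    QuantVariant("h2o-danube2-1.8b-chat-Q6_K.gguf", "Q6_K", 1.50, 5.51, 8.03, 1131),
    QuantVariant("h2o-danube2-1.8b-chat-Q5_K_M.gguf", "Q5_K_M", 1.30, 5.56, 8.10, 1172),
    QuantVariant("h2o-danube2-1.8b-chat-Q5_K_S.gguf", "Q5_K_S", 1.27, 5.49, 8.12, 1107),
    QuantVariant("h2o-danube2-1.8b-chat-Q4_K_M.gguf", "Q4_K_M", 1.11, 5.60, 8.27, 1162),
    QuantVariant("h2o-danube2-1.8b-chat-Q4_K_S.gguf", "Q4_K_S", 1.06, 5.59, 8.34, 1270),
    QuantVariant("h2o-danube2-1.8b-chat-Q3_K_L.gguf", "Q3_K_L", 0.98, 5.23, 8.72, 1442),
    QuantVariant("h2o-danube2-1.8b-chat-Q3_K_M.gguf", "Q3_K_M", 0.91, 4.91, 8.81, 1107),
    QuantVariant("h2o-danube2-1.8b-chat-Q3_K_S.gguf", "Q3_K_S", 0.82, 4.03, 10.12, 1103),
    QuantVariant("h2o-danube2-1.8b-chat-Q2_K.gguf", "Q2_K", 0.71, 3.03, 12.56, 1160),
]


def by_method(method: str) -> QuantVariant:
    """Look up a row by its quant method, e.g. "Q4_K_M"."""
    return next(v for v in VARIANTS if v.method == method)


def speedup_vs_f16(variant: QuantVariant) -> float:
    """Tokens-per-second ratio relative to the F16 baseline row."""
    return variant.tokens_per_s / by_method("F16").tokens_per_s


def smallest_meeting(min_mt_bench: float, max_perplexity: float) -> Optional[QuantVariant]:
    """Smallest file whose MT-Bench and perplexity both satisfy the thresholds."""
    candidates = [
        v for v in VARIANTS
        if v.mt_bench >= min_mt_bench and v.perplexity <= max_perplexity
    ]
    # None when no row satisfies both thresholds.
    return min(candidates, key=lambda v: v.size_gb, default=None)
```

For example, `smallest_meeting(5.5, 8.5)` returns the Q4_K_S row, while tightening the perplexity limit with `smallest_meeting(5.5, 8.3)` returns Q4_K_M.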

Columns in the table are:

Name -- model name and link
Quant method -- quantization method
Model size -- size of the model in gigabytes
MT-Bench AVG -- average MT-Bench benchmark score, from 1 to 10; higher is better
Perplexity -- perplexity on the WikiText-2 dataset, as reported by the llama.cpp perplexity test; lower is better
Tokens per second -- generation speed in tokens per second, as reported by the llama.cpp perplexity test on a single H100 GPU; higher is better
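For reference, perplexity is the exponential of the average negative log-likelihood the model assigns to each token of the evaluation text. A perplexity of about 8 therefore means the model is, on average, roughly as uncertain as if it were choosing uniformly among 8 tokens. A minimal sketch of that definition, assuming per-token natural-log probabilities are already available; the function name is hypothetical, and the windowing that llama.cpp's tool applies to WikiText-2 is not reproduced here.

```python
import math
from typing import Sequence


def perplexity(token_log_probs: Sequence[float]) -> float:
    """exp(mean negative log-likelihood) over natural-log token probabilities."""
    if not token_log_probs:
        raise ValueError("need at least one token")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)
```

A model that assigns probability 0.25 to every observed token has perplexity 4, and one that is always certain (log probability 0) has perplexity 1.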

Prompt template:

<|prompt|>Why is drinking water so healthy?</s><|answer|>
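In this template the user message sits between `<|prompt|>` and `</s>`, followed by `<|answer|>`, after which the model writes its reply. A minimal sketch that builds a single-turn prompt in this format; `build_prompt` is a hypothetical helper, and whether a runtime treats the literal `</s>` in the text as the special end-of-sequence token depends on its tokenizer settings.

```python
# Template markers, copied from the example prompt.
PROMPT_PREFIX = "<|prompt|>"
END_OF_TURN = "</s>"
ANSWER_PREFIX = "<|answer|>"


def build_prompt(question: str) -> str:
    """Wrap a single user question in the chat template (hypothetical helper)."""
    return f"{PROMPT_PREFIX}{question}{END_OF_TURN}{ANSWER_PREFIX}"
```

Calling `build_prompt("Why is drinking water so healthy?")` reproduces the example prompt exactly.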

Format: GGUF
Model size: 1.83B params
Architecture: llama
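Dividing file size by parameter count gives the average storage cost per weight, a quick sanity check on each quant method. This sketch assumes the table's GB means 10^9 bytes (the F16 row is consistent with this: 1.83B parameters at 2 bytes each is 3.66 GB) and uses the rounded 1.83B figure; `bits_per_weight` is a hypothetical helper.

```python
PARAMS = 1.83e9  # rounded parameter count from the model metadata


def bits_per_weight(size_gb: float, n_params: float = PARAMS) -> float:
    """Average bits per parameter, assuming 1 GB = 1e9 bytes."""
    return size_gb * 1e9 * 8 / n_params
```

For Q4_K_M (1.11 GB) this gives roughly 4.85 bits per weight, and for Q2_K (0.71 GB) roughly 3.1. Both sit above the nominal bit width, presumably because each file also stores metadata and the k-quant mixes keep some tensors at higher precision.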
