---
license: other
license_name: yi-license
license_link: https://huggingface.co/01-ai/Yi-34B-200K/blob/main/LICENSE
tags:
- merge
- GGUF
- imatrix
---
## [Kyllene-57B](/TeeZee/Kyllene-57B-v1.0) quantized to 2~3 bpw GGUF
### NOTICE: I did not use the original file! I started with Q6_K (there was no Q8)
#### There may well be problems with these quants, but I'll eat my own ass if a 57B Q6_K (>6.5 bpw) is the root of any of them. More suspect is how I produced the imatrix.
The imatrix is included, generated from a ~900 KB text file, which is also included.
That file was made by concatenating most of the default exllamav2 calibration data: ~900 KB of coherent text only, with some formatting and code, but no endless broken HTML tags or nonsense. It includes multilingual text, for those deep layers.
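For anyone wanting to reproduce something like this, a rough sketch of the pipeline using llama.cpp's `imatrix` and `quantize` tools (`llama-imatrix` / `llama-quantize` in newer builds) is below. The file names are placeholders, and these are not the exact commands used here.

```bash
# Compute an importance matrix over the calibration text.
# kyllene-q6_k.gguf and calibration.txt are placeholder names.
./imatrix -m kyllene-q6_k.gguf -f calibration.txt -o kyllene.imatrix -ngl 99

# Requantize down to IQ2_XS, guided by the imatrix.
# --allow-requantize is required because the source is already quantized (Q6_K).
./quantize --allow-requantize --imatrix kyllene.imatrix \
    kyllene-q6_k.gguf Kyllene-57B-v1.0.IQ2_XS.gguf IQ2_XS
```

Requantizing from Q6_K rather than the original weights bakes Q6_K's (small) rounding error into the result, which is the caveat in the notice above.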
[IQ2_XS](./Kyllene-57B-v1.0.IQ2_XS.gguf/) 2.38 BPW `CUDA0 buffer size = 15941.43 MiB`
- This file only exists because I did the maths wrong (I was expecting it to be bigger), but I recall that 16 GB GPUs exist and I may give it a go with stable diffusion.
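If you do try it on a 16 GB card, a minimal llama.cpp invocation with full offload would look something like the sketch below (`main` is `llama-cli` in newer builds; the context size is a guess, and whether it actually fits depends on context length and anything else using VRAM):

```bash
# Offload all layers to the GPU; keep the context modest to stay under 16 GB.
./main -m Kyllene-57B-v1.0.IQ2_XS.gguf -ngl 99 -c 2048 -p "Hello"
```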
IQ2_M briefly existed before I clobbered (technical term) it. It might be back.
[IQ3_XXS](./Kyllene-57B-v1.0.IQ3_XXS.gguf/) (<3.1 BPW)