nah, and it looks like the tokenizer on the source file is broken anyway, probably in the base model too. it loves emitting </s> for some reason, but Yi doesn't use that token?

made from TeeZee/Kyllene-57B-v1.0.q6_K.gguf

no quants here to download. i did try. make them yourself; the imatrix works and i'm feeling very irritable now. do people not test these things? I know git-lfs hasn't been subject to any QA, ever, so maybe not?

the dataset file was made by concatenating most of the default exllamav2 calibration data: a ~900KB file of coherent text only, with some formatting and code but no endless broken html tags or nonsense. it includes multilingual text, for those deep layers. like this:

$ cd exllamav2/conversion/standard_cal_data
$ cat technical.utf8 multilingual.utf8 code.utf8 tiny.utf8 > techmulcodetiny.utf8
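
sanity-check the result if you like; the exact byte count depends on your exllamav2 checkout, but it should land around 900KB:

$ wc -c techmulcodetiny.utf8    # expect roughly 900000 bytes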

for reference: exllamav2/conversion/standard_cal_data is where the input files live, and techmulcodetiny.utf8 is the output. the concatenation produces a file that imatrix processes as ~560 "chunks".
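
that count roughly checks out, assuming imatrix's default chunk size of 512 tokens and a guess of ~3 bytes per token for this kind of mixed text (both numbers are assumptions, not measurements):

$ echo $(( 900000 / 3 / 512 ))    # ~300k tokens / 512 ≈ 585 chunks, same ballpark
585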

imatrix was run with default settings apart from the dataset (i think? i increased the batch count and reduced the batch size so i could cram more layers onto the GPU, but the resulting statistics should have been the same in the end). (someone tell me why I was wrong to run imatrix with -cb continuous batching. shame me.) (UPDATE: found the command I used. use at your peril, and obviously fix the paths)

$ imatrix -m Kyllene-57B-v1.0.q6_K.gguf -f ~/exltabbytorcher220/exllamav2/conversion/standard_cal_data/techmulcodetiny.utf8 -o Kyllene-57B-v1.0.q6_K.gguf.imatrix --verbosity 1 -ngl 50 -cb -t 3 -b 256 --no-mmap

51 layers was too many on a 3090 and I had to kill wayland to get the VRAM back (pro tip: tmux). needless to say, you'll probably die if you try something idiotic like using this on windows. --no-mmap was appropriate on my nightmare vortex of 32GB DDR4, layered swap, tiny zrams and weird kernel parameters, but maybe just omit it.
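
the wayland-killing dance, as a minimal sketch (assumption: a systemd distro where the graphical session runs under the display-manager unit; yours may be named differently):

$ tmux new -s imatrix                    # so the job survives the session going down
$ sudo systemctl stop display-manager    # kill wayland, reclaim its VRAM
$ imatrix ...                            # the command above; reattach later with: tmux attach -t imatrix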

a how-to, because i'm grouchy but I did actually want people to have these. remember to replace IQ2_M (it appears only twice, near the end) with whatever quant type you fancy. Q2_K might be more compatible.

         ~]$ git clone https://github.com/ggerganov/llama.cpp
         ~]$ cd llama.cpp
if you're like me and you break llamas for fun and don't understand cmake: git switch master && git pull; git restore Makefile
otherwise
 llama.cpp]$ git pull; make -j
 llama.cpp]$ ./quantize --allow-requantize --imatrix Kyllene-57B-v1.0.q6_K.gguf.imatrix INPUT_DIRECTORY/Kyllene-57B-v1.0.q6_K.gguf Kyllene-57B-v1.0.IQ2_M.gguf IQ2_M

if your computer has fewer than 8 cores, add the number of cores to the end of that command (it's a trailing nthreads argument, an invisible 8 by default). and yes, you can just use ./ (the llama.cpp directory) as INPUT_DIRECTORY. a loop for multiple quant types is sketched below.
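
if you want several sizes in one pass, a loop like this should work (the quant names here are examples; check ./quantize --help for what your build supports):

 llama.cpp]$ for Q in IQ2_M Q2_K IQ3_XXS; do
 >   ./quantize --allow-requantize --imatrix Kyllene-57B-v1.0.q6_K.gguf.imatrix \
 >     INPUT_DIRECTORY/Kyllene-57B-v1.0.q6_K.gguf Kyllene-57B-v1.0.$Q.gguf $Q
 > done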

Downloads (eat my ass, huggingface. yeah, just leave the cryptic git-lfs error message on the far side of a 3-hour upload over LTE, thanks)

no downloads for now. i've uploaded 50 gigabytes so far and none of them made it past the great wall of git-lfs. you have the imatrix and the q6, so DIY. IQ2_M is probably the one for a 24GB device; IQ3_XXS is better if you can offload the kv cache.
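
rough napkin math behind the 24GB suggestion, assuming ~2.7 bits per weight for IQ2_M (the real figure varies a little with the tensor mix):

$ echo "57 * 2.7 / 8" | bc -l    # billions of weights × bits/weight ÷ 8 → GB
19.23750000000000000000

call it ~19GB of weights, which leaves a 24GB card a few gigabytes for context.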
