Splitting ggufs

#1 · opened by Jobaar

I love your abliterated models, but I would like to discourage splitting GGUFs into small parts unless necessary. If the weight file is above the HF upload limit, splitting is useful; otherwise it makes managing and downloading the models more difficult for no added benefit.

For instance, a shell script that lists the models in a directory so one can be chosen at load time will show four extra entries for every GGUF split into five parts. This can of course be rectified by recombining the parts, which is not difficult, but multiply that work by hundreds or thousands of downloads and there is a solid argument that a lot of aggregate time is wasted undoing an unneeded step.
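To illustrate the extra-entries problem, here is a minimal Python sketch of the kind of cleanup a model-picker script would need: it collapses shard names (which llama.cpp's splitter writes as `-00001-of-00007.gguf`, as seen in the filename later in this thread) into a single listing entry. The function name `list_models` and the sample filenames are illustrative, not from any real script.

```python
import re

# Matches the shard suffix llama.cpp's gguf-split appends,
# e.g. "-00001-of-00007.gguf".
SPLIT_RE = re.compile(r"-\d{5}-of-\d{5}\.gguf$")

def list_models(filenames):
    """Return unique model names, counting each split set once."""
    models = []
    for name in sorted(filenames):
        # Strip the shard suffix so all parts map to one base name.
        base = SPLIT_RE.sub(".gguf", name) if SPLIT_RE.search(name) else name
        if base not in models:
            models.append(base)
    return models

files = [
    "Llama-3-70B-Instruct-abliterated-v3_q6-00001-of-00007.gguf",
    "Llama-3-70B-Instruct-abliterated-v3_q6-00002-of-00007.gguf",
    "some-other-model-q4.gguf",
]
print(list_models(files))
# → one entry per model instead of one per shard
```

Without the dedup step, every split model shows up once per shard in the menu, which is exactly the annoyance described above.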

Thanks so much for your contributions to this field and community and I look forward to seeing more insightful and practical additions wherever you decide to spend your efforts.

Owner

Ack! Funnily enough, it's exactly my own homebrew shell script that got me into this mess. I 120% agree with you that it shouldn't be this painful.

I'll re-upload them soon, stitched back together. Apologies to everyone having to fight with it.

I tried merging them into one file (Q6) and got an error when trying to load it into KoboldCpp.

```
~/llm/llama.cpp/gguf-split --merge Llama-3-70B-Instruct-abliterated-v3_q6-00001-of-00007.gguf Llama-3-70B-Instruct-abliterated-v3-q6.gguf
```

Owner

Good to know, thanks.

llama.cpp already supports loading from split files: point it at the first shard (`-00001-of-0000N`) and it picks up the remaining parts automatically.
