Mozilla/llava-v1.5-7b-llamafile · Hey :) LMstudio support and picking your brain

Dec 3, 2023

Hi. I did the 13B GGUF for these and was wondering if you would be so kind as to point me at the docs / script / something you used to compress the CLIP mmproj. I recall seeing something like that being available on some Linux branch of something, but for the life of me I can't dredge it up.
Would really appreciate the assist.

In other news: to make this LMstudio compatible OOTB, you might consider renaming the adaptors to mmproj-Q4_0.gguf (et al.). Then it will "just work" (TM).
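
A minimal sketch of that rename in Python, in case it is useful. The `mmproj-<quant>.gguf` pattern is just the naming suggested above; the helper names and the example source file name are hypothetical, and the default dry run touches nothing on disk.

```python
from pathlib import Path


def lmstudio_mmproj_name(quant: str) -> str:
    """Build the adaptor file name suggested above, e.g. mmproj-Q4_0.gguf."""
    return f"mmproj-{quant}.gguf"


def rename_adaptor(src: Path, quant: str, dry_run: bool = True) -> Path:
    """Return the renamed path for one adaptor file, renaming it if dry_run is False.

    The file stays in its original directory; only the file name changes.
    """
    dst = src.with_name(lmstudio_mmproj_name(quant))
    if not dry_run:
        if dst.exists():
            # Refuse to overwrite an adaptor that already has the target name.
            raise FileExistsError(dst)
        src.rename(dst)
    return dst


if __name__ == "__main__":
    # Hypothetical source file name, used only for illustration.
    print(rename_adaptor(Path("llava-v1.5-7b-mmproj-Q4_0.gguf"), "Q4_0"))
```

Pass `dry_run=False` once the printed target looks right.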

Thanks for this version.

PsiPi

Dec 3, 2023

edited Dec 3, 2023

In addition, if you were to provide the file llava.preset.json as shown here

in the repo like this

that would also preload the LMstudio template. Hope it helps

jartine

mozilla org Dec 5, 2023

Hey! I heard about your project for the first time a few days ago. It's pretty cool.

If you need to quantize the clip mmproj files, you can use the llamafile-llava-quantize-0.2.1 program from https://github.com/Mozilla-Ocho/llamafile/releases/tag/0.2.1

All the code that's needed for doing this is in the llama.cpp project upstream, but no one wrote a main() function for it until I came along. I hope you find it useful. I'm very much looking forward to using LLaVA 13B when you post it, since I honestly have no idea how to quantize the other file!

jartine changed discussion status to closed Dec 5, 2023

PsiPi

Dec 7, 2023

Thanks very much. I'm also on the verge of releasing NOUS. Again, I will eventually be using the encoding quantiser when I get it working; in the interim I'll just post what I have.