Got it working in llama.cpp! Thanks!
#1 by ubergarm - opened
Thanks for this, and looking forward to your commit getting PR'd.
For now, I just got it working in llama.cpp like so:
$ cd llama.cpp
$ git pull
$ git remote add iamlemec git@github.com:iamlemec/llama.cpp.git
$ git fetch iamlemec
$ git cherry-pick 6515e787d10095d439228f2
$ git log --pretty=oneline | head -n 5
7c9f8d3c3775c38cb014285752ea88319d5275f8 mistral nemo inference support
69c487f4ed57bb4d4514a1b7ff12608d5a8e7ef0 CUDA: MMQ code deduplication + iquant support (#8495)
07283b1a90e1320aae4762c7e03c879043910252 gguf : handle null name during init (#8587)
940362224d20e35f13aa5fd34a0d937ae57bdf7d llama : add support for Tekken pre-tokenizer (#8579)
69b9945b44c3057ec17cb556994cd36060455d44 llama.swiftui: fix end of generation bug (#8268)
$ make clean && time GGML_CUDA=1 make -j$(nproc)
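If the build succeeds, a quick sanity check looks something like this, where the GGUF filename is just a placeholder for whatever quant you grabbed (-ngl 99 offloads all layers to the GPU, -c sets the context size):
$ ./llama-cli -m Mistral-Nemo-Instruct-Q8_0.gguf -ngl 99 -c 4096 -p "Hello"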
:gucci:
EDIT: Found the GitHub PR #8604
The PR is merged and working with this and other quants on HF now! Cheers and thanks!
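For anyone landing here later, grabbing a quant from HF can be done with huggingface-cli; the repo and file names below are placeholders, not the actual ones:
$ pip install -U "huggingface_hub[cli]"
$ huggingface-cli download <user>/<model>-GGUF <model>-Q8_0.gguf --local-dir .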
ubergarm changed discussion status to closed