Split/shard support
Will it be possible to support the model sharding recently introduced in llama.cpp?
+1
Heya! @phymbert - definitely yes, do you mind pointing me to the relevant snippet?
We're currently just quantizing and uploading to the Hub: https://huggingface.co/spaces/ggml-org/gguf-my-repo/blob/main/app.py#L63
Happy for suggestions!
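For context, the core of the current flow looks roughly like this (a minimal sketch, not the actual app.py code; binary paths, file names, and the repo id are placeholders):

```python
import subprocess
from huggingface_hub import HfApi

# Quantize the fp16 GGUF with llama.cpp's quantize tool
# (placeholder paths and quant type).
subprocess.run(
    ["./llama.cpp/quantize", "model-f16.gguf", "model-q4_k_m.gguf", "Q4_K_M"],
    check=True,
)

# Upload the single quantized file to the Hub
# (assumes the target repo already exists).
HfApi().upload_file(
    path_or_fileobj="model-q4_k_m.gguf",
    path_in_repo="model-q4_k_m.gguf",
    repo_id="username/model-GGUF",  # placeholder
)
```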
Hi @reach-vb,
I wrote a tutorial here: https://github.com/ggerganov/llama.cpp/discussions/6404
The --split-max-size option has been fixed recently.
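For example, something like this should split a quantized GGUF into shards of at most 2 GB (a rough sketch; file names are placeholders, and gguf-split --help has the authoritative options):

```python
import subprocess

# Split the quantized GGUF into shards of at most 2 GB each.
# gguf-split derives the shard names from the output prefix,
# e.g. model-q4_k_m-00001-of-0000N.gguf
subprocess.run(
    [
        "./llama.cpp/gguf-split",
        "--split-max-size", "2G",
        "model-q4_k_m.gguf",  # input GGUF
        "model-q4_k_m",       # output path prefix
    ],
    check=True,
)
```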
Please ping if you need additional explanations.
Thanks
Hello! I have implemented this and will open a PR :) I tried to keep the additions minimal to avoid cluttering the interface, so let me know if there are any changes I should make to the layout or anything else before merging, and I'll gladly do so.
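In case it helps review, the UI addition is roughly of this shape (a hypothetical sketch, not the actual PR code; the component names are made up):

```python
import gradio as gr

with gr.Blocks() as demo:
    # Hypothetical controls: one checkbox to enable splitting, plus a
    # max-shard-size textbox that only appears when splitting is enabled.
    split_model = gr.Checkbox(label="Split model", value=False)
    split_max_size = gr.Textbox(label="Max shard size (e.g. 2G)", visible=False)

    # Show or hide the size field when the checkbox is toggled.
    split_model.change(
        lambda enabled: gr.update(visible=enabled),
        inputs=split_model,
        outputs=split_max_size,
    )

demo.launch()
```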