Could you make all your models compatible with Ollama and rename all parts to an Ollama-compatible naming scheme?
For example, rename part1of8 to 00001-of-00008.gguf.
Source: https://huggingface.co/docs/hub/en/ollama
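Purely to illustrate the naming scheme meant here (a rough sketch with $model, $quant and the part count as placeholders, and, as the reply below explains, renaming alone is not enough for these particular splits):
total=8
for i in $(seq 1 $total); do
  # hypothetical mapping from partNofM to the zero-padded shard naming
  mv "$model.$quant.gguf.part${i}of${total}" "$model.$quant-$(printf '%05d' $i)-of-$(printf '%05d' $total).gguf"
done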
He uses a different approach to splitting models, so it is not possible to load them that way even if you rename them.
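One rough way to see the difference (my own check, not something stated in this thread): with llama.cpp's gguf-split format every shard starts with the GGUF magic bytes, while with raw byte splits only the first part does, so inspecting a later part tells you which kind you have:
head -c 4 /$path/$model.$quant.gguf.part2of3 | xxd   # gguf-split shard shows "GGUF"; a raw split part shows arbitrary bytes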
Just concatenate them using the following command before loading them:
cat $(ls /$path/$model.$quant.gguf.* | sort -V) > /$path/$model.$quant.gguf
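Once concatenated, you can point Ollama at the resulting file the usual way; a sketch assuming the standard Modelfile workflow and a placeholder model name:
echo "FROM /$path/$model.$quant.gguf" > Modelfile
ollama create "$model-$quant" -f Modelfile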
If you really want to load the model without concatenating it first, you can use the following command to get a mountpoint for a FUSE-concatenated model:
cfconcat $(ls /$path/$model.$quant.gguf.* | sort -V)
You can use the output directly in command line tools like this:
CUDA_VISIBLE_DEVICES=0 llama.cpp/llama-perplexity -m $(cfconcat $(ls /$path/$model.$quant.gguf.* | sort -V)) --multiple-choice --multiple-choice-tasks 2000 -f mmlu-validation.bin -c 1024 -ngl 0 > ./evaluation/$model.$quant.mmlu.txt
My personal favorite option is to concatenate the models while downloading, like this, but only do so if you have a stable internet connection, as a single download issue will silently corrupt the entire download:
curl -L https://huggingface.co/mradermacher/$model-GGUF/resolve/main/$model.$squant.gguf.part[1-3]of3 > /upool/$model.$squant.gguf
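If the connection is flaky, a more defensive sketch (same placeholders as above) is to download each part separately with retries and resume support, then concatenate:
for part in part1of3 part2of3 part3of3; do
  curl -fL -C - --retry 5 -o /upool/$model.$squant.gguf.$part https://huggingface.co/mradermacher/$model-GGUF/resolve/main/$model.$squant.gguf.$part
done
cat $(ls /upool/$model.$squant.gguf.part* | sort -V) > /upool/$model.$squant.gguf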
Arguably, it's also a bug in ollama's documentation to claim it works with all repos, when it works with, well, more like half :) None of TheBloke's split models work, nor does anything else that is older.