Make all your models compatible with Ollama by renaming all their parts, for example

#377
opened by joaquinito2073

Rename part1of8, for example, to 00001-of-00008.gguf.
Source: https://huggingface.co/docs/hub/en/ollama
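For reference, the linked documentation runs a compatible GGUF repository directly like this (placeholders as in the docs; the quant tag is optional):

ollama run hf.co/{username}/{repository}
ollama run hf.co/{username}/{repository}:Q8_0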

He is using a different approach to split his models, so it is not possible to load them that way even if you rename them: Ollama-style shards (as produced by llama.cpp's gguf-split) are each valid GGUF files carrying split metadata, whereas these parts are plain byte-level splits of one large GGUF file. Just concatenate them using the following command before loading them:

cat $(ls /$path/$model.$quant.gguf.* | sort -V) > /$path/$model.$quant.gguf
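For example, with a hypothetical Llama-3-70B Q4_K_M quant split into three parts under /models, the glob expands (in part order, thanks to sort -V) to:

cat /models/Llama-3-70B.Q4_K_M.gguf.part1of3 /models/Llama-3-70B.Q4_K_M.gguf.part2of3 /models/Llama-3-70B.Q4_K_M.gguf.part3of3 > /models/Llama-3-70B.Q4_K_M.gguf

sort -V matters once there are ten or more parts: plain lexicographic sorting would put part10of12 before part2of12 and silently produce a corrupt file.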

If you really want to load the model without concatenating it first, you can use the following command to get a mountpoint to a FUSE-concatenated model:

cfconcat $(ls /$path/$model.$quant.gguf.* | sort -V)

You can use the output directly in command line tools like this:

CUDA_VISIBLE_DEVICES=0 llama.cpp/llama-perplexity -m $(cfconcat $(ls /$path/$model.$quant.gguf.* | sort -V)) --multiple-choice --multiple-choice-tasks 2000 -f mmlu-validation.bin -c 1024 -ngl 0 > ./evaluation/$model.$quant.mmlu.txt
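If you specifically need Ollama-style shards rather than a single file, one option (a sketch, not from this thread; it assumes a llama.cpp build that ships the llama-gguf-split tool, so check llama-gguf-split --help for the exact flags) is to re-split the concatenated file, which yields the -00001-of-0000N.gguf naming that Ollama understands:

llama.cpp/llama-gguf-split --split --split-max-size 8G /$path/$model.$quant.gguf /$path/$model.$quant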

My personal favorite option is to concatenate the models while downloading, like this. Only do so if you have a stable internet connection, as a single download issue will silently corrupt the entire download:

curl -L https://huggingface.co/mradermacher/$model-GGUF/resolve/main/$model.$squant.gguf.part[1-3]of3 > /upool/$model.$squant.gguf
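To catch such silent corruption, a sanity check you can run afterwards (a minimal sketch: it assumes three parts, GNU stat, and the same placeholder variables as above) is to compare the local file size against the sum of the parts' Content-Length headers:

expected=0
for i in 1 2 3; do
  # HEAD request, following the CDN redirect; keep the last content-length header
  len=$(curl -sIL "https://huggingface.co/mradermacher/$model-GGUF/resolve/main/$model.$squant.gguf.part${i}of3" | grep -i '^content-length:' | tail -1 | tr -d '[:space:]' | cut -d: -f2)
  expected=$((expected + len))
done
actual=$(stat -c%s /upool/$model.$squant.gguf)  # GNU stat; use stat -f%z on macOS/BSD
[ "$expected" = "$actual" ] && echo "size matches" || echo "size mismatch, redownload"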

Arguably, it's also a bug in ollama's documentation to claim it works with all repos, when it works with, well, more like half :) Neither TheBloke's split models nor anything else split the older way works.

mradermacher changed discussion status to closed
