How can I run 8bit models?

#2
by kadirnar - opened

I want to run 8bit files using the LlamaCpp-Python library. Should I upload 2 files at the same time? Can you share sample code?

Since Hugging Face limits a single file to a maximum of 50GB, we have to upload the weights as separate files.

You should download both files, AkaLlama-llama3-70b-v0.1.Q8_0.00001-of-00002.gguf and AkaLlama-llama3-70b-v0.1.Q8_0.00002-of-00002.gguf, into the same directory.

Then concatenate the two files using the following command:

Linux:
cat AkaLlama-llama3-70b-v0.1.Q8_0.00001-of-00002.gguf AkaLlama-llama3-70b-v0.1.Q8_0.00002-of-00002.gguf > AkaLlama-llama3-70b-v0.1.Q8_0.gguf && rm AkaLlama-llama3-70b-v0.1.Q8_0.0000*-of-00002.gguf

Windows:
COPY /B AkaLlama-llama3-70b-v0.1.Q8_0.00001-of-00002.gguf + AkaLlama-llama3-70b-v0.1.Q8_0.00002-of-00002.gguf AkaLlama-llama3-70b-v0.1.Q8_0.gguf
del AkaLlama-llama3-70b-v0.1.Q8_0.00001-of-00002.gguf AkaLlama-llama3-70b-v0.1.Q8_0.00002-of-00002.gguf
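If you prefer to stay in Python (and avoid platform-specific commands), a minimal cross-platform sketch of the same concatenation — `merge_gguf` is a hypothetical helper name, and the part files must be passed in order:

```python
import shutil
from pathlib import Path

def merge_gguf(parts, merged):
    """Concatenate split GGUF part files, in order, into a single file."""
    with open(merged, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)  # streams in chunks, so RAM use stays low
    # Sanity check: the merged size must equal the sum of the part sizes.
    assert Path(merged).stat().st_size == sum(Path(p).stat().st_size for p in parts)

# For this model:
# merge_gguf(
#     ["AkaLlama-llama3-70b-v0.1.Q8_0.00001-of-00002.gguf",
#      "AkaLlama-llama3-70b-v0.1.Q8_0.00002-of-00002.gguf"],
#     "AkaLlama-llama3-70b-v0.1.Q8_0.gguf",
# )
```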

Now you have a single GGUF weight file, which you can run via llama-cpp-python.
