Could not load Llama model from path
I'm getting this error while loading the model:
Could not load Llama model from path: /root/.cache/huggingface/hub/models--TheBloke--Llama-2-13B-chat-GGML/snapshots/47d28ef5de4f3de523c421f325a2e4e039035bab/llama-2-13b-chat.ggmlv3.q5_1.bin. Received error fileno (type=value_error)
How are you trying to load it? Using what client/library?
I'm loading the model with this code:
llm = LlamaCpp(
    model_path=model_path,
    max_tokens=256,
    n_gpu_layers=n_gpu_layers,
    n_batch=n_batch,
    callback_manager=callback_manager,
    n_ctx=1024,
    verbose=False,
)
I'm trying to pass a PDF and query it with your model, using this notebook: https://github.com/MuhammadMoinFaisal/LargeLanguageModelsProjects/blob/main/QA%20Book%20PDF%20LangChain%20Llama%202/Final_Llama_CPP_Ask_Question_from_book_PDF_Llama.ipynb
I am facing the same error. How can I resolve it? Please help me :-(
Could not load Llama model from path: /root/.cache/huggingface/hub/models--TheBloke--Llama-2-13B-chat-GGML/snapshots/47d28ef5de4f3de523c421f325a2e4e039035bab/llama-2-13b-chat.ggmlv3.q5_1.bin. Received error fileno (type=value_error)
Please report this to whatever client provides that code. Nothing has changed with my models.
Thanks a lot TheBloke for your immense work. I am working with the Llama 2 7B/13B q8 models, both successfully in koboldcpp, but I can't get either of them to work with LlamaCpp. I am getting a value error and an assertion error. Do you have any suggestions I can try? Thanks.
@shodhi llama.cpp no longer supports GGML models as of August 21st. GGML has been replaced by a new format called GGUF.
I will soon be providing GGUF models for all my existing GGML repos, but I'm waiting until they fix a bug with GGUF models. I will also soon update the READMEs on all my GGML models to mention this.
For now, please downgrade llama.cpp to commit dadbed99e65252d79f81101a392d0d6497b86caa
and rebuild it, and it will work fine with these and all other GGML files. If you're using llama-cpp-python, please use version v0.1.78 or earlier.
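A minimal sketch of both options, assuming a Colab-style shell (the commit hash is the one mentioned above):
# Option 1: build llama.cpp at the last GGML-compatible commit
!git clone https://github.com/ggerganov/llama.cpp
!cd llama.cpp && git checkout dadbed99e65252d79f81101a392d0d6497b86caa && make
# Option 2: if using llama-cpp-python, pin the last release that still reads GGML files
!pip install --force-reinstall --no-cache-dir llama-cpp-python==0.1.78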
Thank you so much for the prompt response. I will do as suggested and update it here. I was also thinking of trying the model with cTransformers as well as llama.cpp. Will update those results here too. Regards
Unfortunately it doesn't work with llama-cpp-python v0.1.78/0.1.77/0.1.76.
Ideally I want it to work with LangChain's LlamaCpp.
No luck with cTransformers either.
Any recommendations? Thanks
Use this Colab code as a starting point.
At this time I don't know whether some parameters in LlamaCpp() are ignored or whether they need to be in some sort of metafile as input to the conversion, but at least the model should work.
!pip install -qq langchain wget
!pip install gguf #https://github.com/ggerganov/llama.cpp/tree/master/gguf-py
!git clone https://github.com/ggerganov/llama.cpp
!pip -qq install git+https://github.com/huggingface/transformers
#Assuming you are using a GPU
!CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip -qq install --upgrade --force-reinstall llama-cpp-python --no-cache-dir
from langchain.llms import LlamaCpp
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
# Callbacks support token-wise streaming
callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])
# Verbose is required to pass to the callback manager
from huggingface_hub import hf_hub_download
repo_id="TheBloke/Llama-2-13B-GGML"; filename="llama-2-13b.ggmlv3.q5_1.bin"
hf_hub_download(
    repo_id=repo_id, filename=filename,
    local_dir="/content"
)
!python /content/llama.cpp/convert-llama-ggmlv3-to-gguf.py --input `ls -t /content/*ggmlv3*.bin | head -1` --output `ls -t /content/*ggmlv3*.bin | head -1`.gguf
filename=filename+".gguf"
n_gpu_layers = 32
n_batch = 512
n_threads=4
llm = LlamaCpp(
    model_path="/content/"+filename,
    n_threads=n_threads,
    n_gpu_layers=n_gpu_layers,
    n_batch=n_batch,
    callback_manager=callback_manager,
    n_ctx=2048,
    temperature=0.8,
    repeat_penalty=1.18,
    top_p=1,
    top_k=3,
    max_tokens=256,
    streaming=True,
    #verbose=True,
)
Make sure the quotes on the conversion line are backticks (`).
In your code it looks like you removed them.
So far my work with this is showing dropped words. I'm not sure if it's due to the conversion or the beta status of the new llama.cpp.
I will be going back to v0.1.78 but will keep an eye on the cutting edge to see how this works out.
!CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip -qq install --upgrade --force-reinstall llama-cpp-python==0.1.78 --no-cache-dir
More information for the conversion script
Looks like -c 4096 and --eps 1e-5 should be used for Llama2
!python /content/llama.cpp/convert-llama-ggmlv3-to-gguf.py -c 4096 --eps 1e-5 --input `ls -tr /content/*ggmlv3*.bin | head -1` --output `ls -tr /content/*ggmlv3*.bin | head -1`.gguf
Convert GGMLv3 models to GGUF:
--input, -i: input GGMLv3 filename
--output, -o: output GGUF filename
--name: set model name
--desc: set model description
--gqa (default 1): grouped-query attention factor (use 8 for LLaMA2 70B)
--eps (default 5.0e-06): RMS norm eps (use 1e-6 for LLaMA1 and OpenLLaMA, 1e-5 for LLaMA2)
--context-length, -c (default 2048): max context length (LLaMA1 is typically 2048, LLaMA2 is typically 4096)
--model-metadata-dir, -m: load HuggingFace/.pth vocab and metadata from the specified directory
--vocab-dir: directory containing tokenizer.model, if separate from the model file (only meaningful with --model-metadata-dir)
--vocabtype ["spm", "bpe"] (default spm): vocab format (only meaningful with --model-metadata-dir and/or --vocab-dir)
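For example, converting a Llama-2 70B GGML would also need --gqa 8 on top of the Llama-2 settings above (the paths here are illustrative, and the script name matches the notebook above):
!python /content/llama.cpp/convert-llama-ggmlv3-to-gguf.py -c 4096 --eps 1e-5 --gqa 8 --input /content/llama-2-70b.ggmlv3.q4_1.bin --output /content/llama-2-70b.ggmlv3.q4_1.bin.gguf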
Fix for "Could not load Llama model from path":
Download GGUF model from this link:
https://huggingface.co/TheBloke/CodeLlama-13B-Python-GGUF
Code Example:
from huggingface_hub import hf_hub_download

model_name_or_path = "TheBloke/CodeLlama-13B-Python-GGUF"
model_basename = "codellama-13b-python.Q5_K_M.gguf"
model_path = hf_hub_download(repo_id=model_name_or_path, filename=model_basename)
Then change verbose=False to verbose=True, as in the following code:
llm = LlamaCpp(
    model_path=model_path,
    max_tokens=256,
    n_gpu_layers=n_gpu_layers,
    n_batch=n_batch,
    callback_manager=callback_manager,
    n_ctx=1024,
    verbose=True,
)
I used to get the same error; then I included these lines and it worked!!
!pip install gguf #https://github.com/ggerganov/llama.cpp/tree/master/gguf-py
!git clone https://github.com/ggerganov/llama.cpp
model_name_or_path = "TheBloke/CodeLlama-13B-Python-GGUF"
model_basename = "codellama-13b-python.Q5_K_M.gguf"
thanks @AbdelrahmanAhmed and @actionpace
@TheBloke thanks for the great work:)
Any inputs on 70B?
I initially loaded the GGML version by mistake instead of GGUF and discovered that llama.cpp doesn't support GGML. I then converted it to GGUF using the llama.cpp repository, but I'm still encountering the same error (70B versions).
Still getting the same errors:
modelq2gguf='/media/iiit/Karvalo/zuhair/llama/llama70b_q2/llama-2-70b.gguf.q2_K.bin'
llm = LlamaCpp(
    model_path=modelq2gguf,
    temperature=0.75,
    max_tokens=2000,
    top_p=1,
    callback_manager=callback_manager,
    verbose=True
)
ValidationError: 1 validation error for LlamaCpp
root
[Could not load Llama model from path: /media/iiit/Karvalo/zuhair/llama/llama70b_q2/llama-2-70b.gguf.q2_K.bin. Received error (type=value_error)]
(ValidationError: 1 validation error for LlamaCpp
root
Could not load Llama model from path: /media/iiit/Karvalo/zuhair/llama/llama-2-70b.ggmlv3.q4_1.bin. Received error Model path does not exist: /media/iiit/Karvalo/zuhair/llama/llama-2-70b.ggmlv3.q4_1.bin )
For those wanting a fully-baked way on macOS that works as of Sept 12, 2023:
# requirements.txt
huggingface-hub==0.17.1
llama-cpp-python==0.1.85
Please make sure your Python 3.11 supports amd64 per this:
python -m venv venv
source venv/bin/activate
CMAKE_ARGS="-DLLAMA_METAL=on" FORCE_CMAKE=1 python -m pip install -r requirements.txt --no-cache-dir
Then this Python code:
import pathlib
from huggingface_hub import hf_hub_download
from llama_cpp import Llama
HF_REPO_NAME = "TheBloke/Llama-2-13B-chat-GGUF"
HF_MODEL_NAME = "llama-2-13b-chat.Q4_K_S.gguf"
REPO_MODELS_FOLDER = pathlib.Path(__file__).parent / "models"
REPO_MODELS_FOLDER.mkdir(exist_ok=True)
model_path = hf_hub_download(
    repo_id=HF_REPO_NAME, filename=HF_MODEL_NAME, local_dir=REPO_MODELS_FOLDER
)
llm = Llama(model_path=model_path, n_gpu_layers=1) # n_gpu_layers uses macOS Metal GPU
llm("What is the capital of China?")
Can anyone explain gpu_layers to me?
I have 3 NVIDIA Tesla V100 cards with 32 GB each; how many GPU layers do I have to pass as the attribute:
llm = AutoModelForCausalLM.from_pretrained('/media/iiit/Karvalo/zuhair/llama/llama70b_q2/genz-70b.Q2_K.gguf', model_type='llama', gpu_layers=gpu_layers)
It accepts anything from [0, infinite); I've checked it with 100000 as the value, and it still accepts it.
@zuhashaik I think your question is outside this discussion's scope, but check these links for info:
- https://python.langchain.com/docs/integrations/llms/llamacpp#gpu
- https://github.com/h2oai/h2ogpt/blob/f2d71b3ec553c9da6b4753c6d873c8cb7b70be86/src/gen.py#L343
If you have further questions, I think it's worth a dedicated discussion thread somewhere else.
Setting verbose=True worked for me.
Thanks
Thank you all!
But I'm still confused about the value of gpu_layers. What does the value mean in llama.cpp?
It works when you use gpu_layers = 0 / 10 / 100000000000.
What does the value indicate? A percentage?
@dorike
Hi, I've tried your code, but I'm still facing the same error (Could not load Llama model from path: TheBloke/CodeLlama-13B-Python-GGUF/codellama-13b-python.Q5_K_M.gguf. Received error Model path does not exist: TheBloke/CodeLlama-13B-Python-GGUF/codellama-13b-python.Q5_K_M.gguf (type=value_error)).
Do you have any solutions for this? Thanks.
GPU layers offload the model to the GPU; I think around 50 to 70 should be enough.
Also, the reason it's not working is that you have to replace that path with your own local model path.
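In other words, gpu_layers / n_gpu_layers is a count of transformer layers to offload to VRAM, not a percentage; any value larger than the model's layer count simply offloads everything, which is why 100000 is accepted. A minimal LlamaCpp sketch, reusing the converted GGUF path from the Colab example above (layer counts are approximate: roughly 40 for a 13B model, 80 for a 70B):
from langchain.llms import LlamaCpp

# A 13B Llama-2 model has about 40 transformer layers; requesting more than
# that simply offloads all of them, so very large values behave like "all".
llm = LlamaCpp(
    model_path="/content/llama-2-13b.ggmlv3.q5_1.bin.gguf",  # path from the Colab example above
    n_gpu_layers=40,  # number of layers to offload to the GPU; 0 keeps everything on the CPU
    n_batch=512,
    n_ctx=2048,
)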
To solve it, I copied the cached file into /content and pointed model_path at it:
!ls /root/.cache/huggingface/hub/models--TheBloke--Llama-2-13B-chat-GGML/snapshots/3140827b4dfcb6b562cd87ee3d7f07109b014dd0/llama-2-13b-chat.ggmlv3.q5_1.bin
!cp /root/.cache/huggingface/hub/models--TheBloke--Llama-2-13B-chat-GGML/snapshots/3140827b4dfcb6b562cd87ee3d7f07109b014dd0/llama-2-13b-chat.ggmlv3.q5_1.bin /content/
model_path = "/content/llama-2-13b-chat.ggmlv3.q5_1.bin"
Can I do the same process for reading tables from a PDF?
If someone is still trying the @actionpace starter notebook given above and getting the same error, try looking at the paths. For example, I couldn't locate the conversion script at the path in the following command, or at least the name wasn't correct:
!python /content/llama.cpp/convert-llama-ggmlv3-to-gguf.py --input `ls -t /content/*ggmlv3*.bin | head -1` --output `ls -t /content/*ggmlv3*.bin | head -1`.gguf
Go to the llama.cpp folder (or the directory you passed in the previous code line), find the conversion script manually, and copy-paste its path into the command above. For me the changed command was:
!python /content/llama.cpp/convert-llama-ggml-to-gguf.py --input `ls -t /content/*ggmlv3*.bin | head -1` --output `ls -t /content/*ggmlv3*.bin | head -1`.gguf
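If you're not sure what the script is called in your checkout, you can list the conversion scripts first instead of guessing (assuming the Colab layout used above):
!ls /content/llama.cpp/convert*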
I tried to convert llama-2-13b.ggmlv3.q5_1.bin into a GGUF file with the above code, like:
!python /content/llama.cpp/convert-llama-ggmlv3-to-gguf.py --input `ls -t /content/*ggmlv3*.bin | head -1` --output `ls -t /content/*ggmlv3*.bin | head -1`.gguf
Unfortunately, it doesn't work for me. The error message is
raise ValueError(f"Quantized tensor bytes per row ({shape[-1]}) is not a multiple of {quant_type.name} type size ({type_size})")
AttributeError: 'int' object has no attribute 'name'
How could I fix this issue if I want to convert a GGML file to GGUF?