Context length
The model card states:
max_seq_length = 32 768
Why am I getting this, then?
Number of tokens (1442) exceeded maximum context length (512).
Part of the code:
llm = AutoModelForCausalLM.from_pretrained(
    "ariel-ml/PULI-LlumiX-32K-instruct-GGUF",
    model_file="PULI-LlumiX-32K-instruct-Q5_K_S.gguf",
    model_type="llama",
    gpu_layers=50,
)
print(llm("""<|im_start|>system
Context information is below. Given the context information and not prior knowledge, answer the query.
\n---------------------\npage_label: 7\nfile_name: test.pdf
.
.
<|im_end|>
<|im_start|>user
Mivel foglalkozik az adott cég?<|im_end|><|im_start|>assistant<|im_end|>"""))  # the Hungarian query means "What does the given company do?"
It looks like you are using the ctransformers library, which falls back to a 512-token context window by default (hence the error message above). You can override the context_length and max_new_tokens parameters when loading the model.
Documentation:
https://github.com/marella/ctransformers
from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained(
    "ariel-ml/PULI-LlumiX-32K-instruct-GGUF",
    model_file="PULI-LlumiX-32K-instruct-Q5_K_S.gguf",
    model_type="llama",
    max_new_tokens=2048,    # upper bound on the number of generated tokens
    context_length=2048,    # context window; replaces the 512-token default
    gpu_layers=50,
)
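
If you want to verify that a prompt fits before sending it, you can count its tokens with the loaded model's own tokenizer. A minimal sketch, assuming the llm object from above and that, as in llama.cpp, the prompt and the generated tokens share the same context window (tokenize() is documented in the ctransformers README):

prompt = """<|im_start|>system
...context goes here...
<|im_end|>
<|im_start|>user
Mivel foglalkozik az adott cég?<|im_end|>
<|im_start|>assistant
"""

n_prompt = len(llm.tokenize(prompt))  # token count as the model sees it
remaining = 2048 - n_prompt           # room left for generation within context_length
print(f"{n_prompt} prompt tokens, {remaining} tokens left for the answer")

Since the model card advertises a 32 768-token context, you can also raise context_length well beyond 2048 if your hardware has the memory for it.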