GGUF Q8 output quality
#1 by opendev - opened
I'm running a quantized version of aya-35B locally at the highest quantization level, Q8 (48GB VRAM), but the local output quality is much lower than that of the model in this Space. The difference is very noticeable. Why? I haven't seen such a large gap with other models at their highest quantization levels.
Prompt format I'm using locally:
<|START_OF_TURN_TOKEN|><|SYSTEM_TOKEN|>
{system_prompt}
<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|USER_TOKEN|>
{prompt}
<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>
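A minimal sketch of how the template above could be filled in Python, mainly to inspect the exact string (including line breaks) that ends up being sent to the local model. The names `TEMPLATE` and `build_prompt` are hypothetical and not part of any library. The line breaks are copied exactly as posted; whether they match the formatting used by the Space is an open question, not something this sketch confirms.

```python
# Template copied from the post above; line breaks are kept exactly as posted.
# TEMPLATE and build_prompt are illustrative names, not a library API.
TEMPLATE = (
    "<|START_OF_TURN_TOKEN|><|SYSTEM_TOKEN|>\n"
    "{system_prompt}\n"
    "<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|USER_TOKEN|>\n"
    "{prompt}\n"
    "<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>"
)


def build_prompt(system_prompt: str, prompt: str) -> str:
    """Fill the posted template with a system prompt and a user prompt."""
    return TEMPLATE.format(system_prompt=system_prompt, prompt=prompt)


if __name__ == "__main__":
    text = build_prompt("You are a helpful assistant.", "Hello!")
    # repr() makes whitespace visible, so stray newlines are easy to spot.
    print(repr(text))
```

Printing the `repr` of the result makes every newline visible, which helps when comparing the local prompt character by character against whatever reference formatting is available.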