Chat template
The "</s>
" (EOS) in the current chat template seems to stop the model from generating tokens. Removing the "</s>
" works for me.
FWIW the upstream chat template in the config json is super weird and hard to read...
Huh, something does seem a bit odd. The model card suggests the normal-looking Mistral prompt format (including the goofy whitespace):
<s>[INST] {prompt}[/INST] </s>
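For a multi-turn exchange, that format is usually chained like the line below (my reading of the common Mistral convention, with made-up messages; exact spacing varies), which makes a useful comparison against the ChatML chat_example further down:
<s>[INST] Hello[/INST] Hi there</s>[INST] How are you?[/INST]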
The model loading debug log in llama.cpp seems to match the expected Mistral tokens:
tokenizer.ggml.tokens arr[str,32768] = ["<unk>", "<s>", "</s>", "[INST]", "[...
However, the model loading debug log shows what looks like a ChatML prompt format chat_example:
chat_example="<|im_start|>system\nYou are a helpful assistant<|im_end|>\n<|im_start|>user\nHello<|im_end|>\n<|im_start|>assistant\nHi there<|im_end|>\n<|im_start|>user\nHow are you?<|im_end|>\n<|im_start|>assistant\n"
In practice, when I use the Mistral format, the response always begins with that tell-tale extra whitespace. When I use a ChatML-format prompt, it doesn't start with the extra whitespace. Using ChatML also tends not to stop inference prematurely in my limited testing.
I'm not 100% sure which one to actually use, but it likely matters, especially for how you do the system prompt. I'm leaning towards ChatML, given that it does not produce the extra whitespace and tends to keep generating without ending prematurely.
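To make the system prompt difference concrete (this is just my reading of the two conventions, not something from the model card): ChatML gets a dedicated system turn, while Mistral-style [INST] templates typically fold the system text into the first user turn, something like:
ChatML: <|im_start|>system\nYou are a helpful assistant<|im_end|>\n<|im_start|>user\nHello<|im_end|>\n<|im_start|>assistant\n
Mistral-style: <s>[INST] You are a helpful assistant\n\nHello[/INST]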
Yeah, that's my experience as well; the ChatML prompt also works.
I can't really tell the difference between the outputs of the [INST] and ChatML prompts.
If you use llama.cpp, add --chat-template llama2 to your commands.
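For example, with the llama.cpp server (binary name and model path are placeholders and depend on your build, so treat this as a sketch); the built-in llama2 template produces the [INST]-style prompt:
./llama-server -m ./model.gguf --chat-template llama2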