
Differences in chat template compared to Zephyr

#8 · opened by stefanritterhoff

The chat template appears very similar to the one used in H4 Zephyr (and later also in StableLM Zephyr), but with minor differences (mostly around the eos_token).
Are those differences intentional?

Just to make the differences easier to see:

Zephyr

{% for message in messages %}
    {% if message['role'] == 'user' %}
        {{ '<|user|> ' + message['content'] + eos_token }}
    {% elif message['role'] == 'system' %}
        {{ '<|system|> ' + message['content'] + eos_token }}
    {% elif message['role'] == 'assistant' %}
        {{ '<|assistant|> ' + message['content'] + eos_token }}
    {% endif %}
    
    {% if loop.last and add_generation_prompt %}
        {{ '<|assistant|>' }}
    {% endif %}
{% endfor %}

OLMoE

{{ bos_token }}
{% for message in messages %}
    {% if message['role'] == 'system' %}
        {{ '<|system|> ' + message['content'] }}
    {% elif message['role'] == 'user' %}
        {{ '<|user|> ' + message['content'] }}
    {% elif message['role'] == 'assistant' %}
        {{ '<|assistant|> ' + message['content'] + eos_token }}
    {% endif %}
    
    {% if loop.last and add_generation_prompt %}
        {{ '<|assistant|>' }}
    {% endif %}
{% endfor %}
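To make the behavioral difference concrete, below is a minimal plain-Python sketch that mimics what the two templates produce (this is not the actual Jinja rendering; the token strings `</s>` and `<s>` are illustrative placeholders for each model's real special tokens, and the extra whitespace the pretty-printed Jinja above would emit is ignored):

```python
def render_zephyr(messages, eos_token="</s>", add_generation_prompt=True):
    # Zephyr appends eos_token after *every* turn (system, user, assistant).
    out = []
    for i, m in enumerate(messages):
        if m["role"] in ("system", "user", "assistant"):
            out.append(f"<|{m['role']}|> {m['content']}{eos_token}")
        if i == len(messages) - 1 and add_generation_prompt:
            out.append("<|assistant|>")
    return "".join(out)


def render_olmoe(messages, bos_token="<s>", eos_token="</s>",
                 add_generation_prompt=True):
    # OLMoE prepends bos_token once and appends eos_token only after
    # assistant turns; system and user turns get no terminator.
    out = [bos_token]
    for i, m in enumerate(messages):
        if m["role"] in ("system", "user", "assistant"):
            suffix = eos_token if m["role"] == "assistant" else ""
            out.append(f"<|{m['role']}|> {m['content']}{suffix}")
        if i == len(messages) - 1 and add_generation_prompt:
            out.append("<|assistant|>")
    return "".join(out)


msgs = [{"role": "user", "content": "Hi"}]
print(render_zephyr(msgs))  # <|user|> Hi</s><|assistant|>
print(render_olmoe(msgs))   # <s><|user|> Hi<|assistant|>
```

So for the same single-turn prompt, Zephyr terminates the user turn with eos_token while OLMoE leaves it open and only closes assistant turns.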

Yes, the difference is intentional. The OLMoE chat template uses the tulu chat template (https://arxiv.org/abs/2306.04751), which was concurrent work with Zephyr.

Btw, in this paper we ran an ablation comparing the two chat templates and found the tulu one to perform better, as the zephyr one often leads to shorter generations: https://arxiv.org/abs/2402.09906
