
Differences in chat template compared to Zephyr

#8 · opened by stefanritterhoff

The chat template appears very similar to the one used in H4 Zephyr (and later also in StableLM Zephyr), but with minor differences (mostly around the eos_token).
Are those differences intentional?

Just to make the differences easier to see:

Zephyr

{% for message in messages %}
    {% if message['role'] == 'user' %}
        {{ '<|user|> ' + message['content'] + eos_token }}
    {% elif message['role'] == 'system' %}
        {{ '<|system|> ' + message['content'] + eos_token }}
    {% elif message['role'] == 'assistant' %}
        {{ '<|assistant|> ' + message['content'] + eos_token }}
    {% endif %}
    
    {% if loop.last and add_generation_prompt %}
        {{ '<|assistant|>' }}
    {% endif %}
{% endfor %}

OLMoE

{{ bos_token }}
{% for message in messages %}
    {% if message['role'] == 'system' %}
        {{ '<|system|> ' + message['content'] }}
    {% elif message['role'] == 'user' %}
        {{ '<|user|> ' + message['content'] }}
    {% elif message['role'] == 'assistant' %}
        {{ '<|assistant|> ' + message['content'] + eos_token }}
    {% endif %}
    
    {% if loop.last and add_generation_prompt %}
        {{ '<|assistant|>' }}
    {% endif %}
{% endfor %}
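To make the behavioral difference concrete, below is a minimal plain-Python sketch that mimics what the two templates produce (this is not the actual Jinja rendering; the token strings `</s>` and `<s>` are illustrative placeholders for each model's real special tokens, and the extra whitespace the pretty-printed Jinja above would emit is ignored):

```python
def render_zephyr(messages, eos_token="</s>", add_generation_prompt=True):
    # Zephyr appends eos_token after *every* turn (system, user, assistant).
    out = []
    for i, m in enumerate(messages):
        if m["role"] in ("system", "user", "assistant"):
            out.append(f"<|{m['role']}|> {m['content']}{eos_token}")
        if i == len(messages) - 1 and add_generation_prompt:
            out.append("<|assistant|>")
    return "".join(out)


def render_olmoe(messages, bos_token="<s>", eos_token="</s>",
                 add_generation_prompt=True):
    # OLMoE prepends bos_token once and appends eos_token only after
    # assistant turns; system and user turns get no terminator.
    out = [bos_token]
    for i, m in enumerate(messages):
        if m["role"] in ("system", "user", "assistant"):
            suffix = eos_token if m["role"] == "assistant" else ""
            out.append(f"<|{m['role']}|> {m['content']}{suffix}")
        if i == len(messages) - 1 and add_generation_prompt:
            out.append("<|assistant|>")
    return "".join(out)


msgs = [{"role": "user", "content": "Hi"}]
print(render_zephyr(msgs))  # <|user|> Hi</s><|assistant|>
print(render_olmoe(msgs))   # <s><|user|> Hi<|assistant|>
```

So for the same single-turn prompt, Zephyr terminates the user turn with eos_token while OLMoE leaves it open and only closes assistant turns.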

Yes, the difference is intentional. The OLMoE chat template uses the tulu chat template (https://arxiv.org/abs/2306.04751), which was concurrent work with Zephyr.

Btw, in this paper we ran an ablation comparing the two chat templates and found the tulu one to perform better, as the zephyr one often leads to shorter generations: https://arxiv.org/abs/2402.09906
