Add an optional `add_generation_prompt` parameter to control the final generation prompt in the `chat_template`
Problem
When training models on data prepared with the current chat template, the model often gets stuck in an infinite generation loop. This happens because the template appends an assistant prompt at the end of every conversation, even in training data. Consequently, the model learns to always expect another prompt to complete, which causes it to keep generating responses indefinitely. An illustration of the problem follows.
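For illustration, a single training conversation rendered with the current template ends up looking roughly like this, with a dangling assistant header at the end even though no further response follows (the special tokens shown are the Llama 3 ones and are only illustrative):

```
<|begin_of_text|><|start_header_id|>user<|end_header_id|>

What is the capital of France?<|eot_id|><|start_header_id|>assistant<|end_header_id|>

The capital of France is Paris.<|eot_id|><|start_header_id|>assistant<|end_header_id|>

```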
Proposed Solution
Add an optional `add_generation_prompt` parameter to `tokenizer.apply_chat_template`. When `add_generation_prompt` is set to `False`, the template should exclude the final assistant prompt, preventing the model from learning this unnecessary completion pattern during training and thereby from developing an infinite-generation habit.
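For context, `transformers` already exposes an `add_generation_prompt` argument on `apply_chat_template` and passes it into the template context; the request here is for the model's chat template to actually honor it. A minimal usage sketch, assuming the template has been updated as proposed (the model ID and messages below are placeholders):

```python
from transformers import AutoTokenizer

# Placeholder model ID; substitute the repo this issue is filed against.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")

messages = [
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."},
]

# Training data: no trailing assistant prompt, so the rendered example
# ends cleanly after the final <|eot_id|>.
train_text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=False
)

# Inference: drop the gold answer and append the assistant header so the
# model knows it should generate a response next.
prompt_text = tokenizer.apply_chat_template(
    messages[:-1], tokenize=False, add_generation_prompt=True
)

print(train_text)
print(prompt_text)
```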
Suggested Template Modification: Here is an example of how the current chat template could be modified:
{% set loop_messages = messages %}
{% for message in loop_messages %}
{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n' + message['content'] | trim + '<|eot_id|>' %}
{% if loop.index0 == 0 %}
{% set content = bos_token + content %}
{% endif %}
{{ content }}
{% endfor %}
{% if add_generation_prompt %}
{{ '<|start_header_id|>assistant<|end_header_id|>\n\n' }}
{% endif %}
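As a quick sanity check, the proposed template can be rendered directly with `jinja2` to confirm that the trailing assistant header only appears when `add_generation_prompt` is true. This is a minimal sketch: in the actual `tokenizer_config.json` the template is stored as a single line, so it is written that way below, and the token strings mirror the Llama 3 format used above:

```python
from jinja2 import Template

# The proposed template, collapsed to a single line as it would be stored
# in tokenizer_config.json.
chat_template = (
    "{% set loop_messages = messages %}"
    "{% for message in loop_messages %}"
    "{% set content = '<|start_header_id|>' + message['role'] "
    "+ '<|end_header_id|>\\n\\n' + message['content'] | trim + '<|eot_id|>' %}"
    "{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}"
    "{{ content }}"
    "{% endfor %}"
    "{% if add_generation_prompt %}"
    "{{ '<|start_header_id|>assistant<|end_header_id|>\\n\\n' }}"
    "{% endif %}"
)

messages = [
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "Hi there, how can I help?"},
]

# Render once with and once without the generation prompt to compare.
for flag in (True, False):
    rendered = Template(chat_template).render(
        messages=messages,
        bos_token="<|begin_of_text|>",
        add_generation_prompt=flag,
    )
    print(f"--- add_generation_prompt={flag} ---\n{rendered}\n")
```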
Desired Outcomes
- Prevents Infinite Generation: By setting `add_generation_prompt` to `False` during training data preparation, the model does not learn to expect a trailing assistant prompt, avoiding infinite generation loops at inference time.
- Enhances Training Flexibility: Gives finer control over the conversation format, ensuring the model learns appropriate conversational stopping points.
Thank you for considering this feature enhancement!
Thank you for sharing such a great model. I am currently using a tokenizer that I modified to hard-code the chat template above.
Please refer to the link below:
https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct/commit/7b8a80c1b2c60f06e51c17728215e266fe24bec9