Fix add_generation_prompt in tokenizer_config.json
We should only add the generation prompt after the last message.
Code
```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('mosaicml/mpt-30b-chat', revision='refs/pr/23')
chat = [
    {
        'role': 'user',
        'content': 'Please summarize the goals in this text:\n\nGoing outside has benefits include reducing stress and triggering the relaxation response, which can help us not only feel better mentally, but even heal faster from physical ailments.',
    },
    {
        'role': 'assistant',
        'content': 'You should go outside and touch grass.',
    },
    {
        'role': 'user',
        'content': 'What else can I do?',
    },
]
print('\nBEFORE!')
print(tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True))
tokenizer = AutoTokenizer.from_pretrained('mosaicml/mpt-30b-chat', revision='refs/pr/24')
print('\nAFTER!')
print(tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True))
```
Output
```
BEFORE!
<|im_start|>system
A conversation between a user and an LLM-based AI assistant. The assistant gives helpful and honest answers.
<|im_start|>user
Please summarize the goals in this text:
Going outside has benefits include reducing stress and triggering the relaxation response, which can help us not only feel better mentally, but even heal faster from physical ailments.<|im_end|>
<|im_start|>assistant
<|im_start|>assistant
You should go outside and touch grass.<|im_end|>
<|im_start|>assistant
<|im_start|>user
What else can I do?<|im_end|>
<|im_start|>assistant
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
AFTER!
<|im_start|>system
A conversation between a user and an LLM-based AI assistant. The assistant gives helpful and honest answers.
<|im_start|>user
Please summarize the goals in this text:
Going outside has benefits include reducing stress and triggering the relaxation response, which can help us not only feel better mentally, but even heal faster from physical ailments.<|im_end|>
<|im_start|>assistant
You should go outside and touch grass.<|im_end|>
<|im_start|>user
What else can I do?<|im_end|>
<|im_start|>assistant
```
tokenizer_config.json +1 -1
```
@@ -6,5 +6,5 @@
   "model_max_length": 8192,
   "tokenizer_class": "GPTNeoXTokenizer",
   "unk_token": "<|endoftext|>",
-  "chat_template": "{% if messages[0]['role'] == 'system' %}{% set loop_messages = messages[1:] %}{% set system_message = messages[0]['content'] %}{% elif not 'system' in messages[0]['role'] %}{% set loop_messages = messages %}{% set system_message = 'A conversation between a user and an LLM-based AI assistant. The assistant gives helpful and honest answers.' %}{% else %}{% set loop_messages = messages %}{% set system_message = false %}{% endif %}{% for message in loop_messages %}{% if loop.index0 == 0 %}{% if system_message != false %}{{ '<|im_start|>system\n' + system_message.strip() + '\n'}}{% endif %}{{ '<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' }}{% else %}{{ '\n' + '<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' }}{% endif %}{% if (add_generation_prompt == true) %}{{ '\n' + '<|im_start|>' + 'assistant' + '\n' }}{% elif (message['role'] == 'assistant') %}{% endif %}{% endfor %}"
+  "chat_template": "{% if messages[0]['role'] == 'system' %}{% set loop_messages = messages[1:] %}{% set system_message = messages[0]['content'] %}{% elif not 'system' in messages[0]['role'] %}{% set loop_messages = messages %}{% set system_message = 'A conversation between a user and an LLM-based AI assistant. The assistant gives helpful and honest answers.' %}{% else %}{% set loop_messages = messages %}{% set system_message = false %}{% endif %}{% for message in loop_messages %}{% if loop.index0 == 0 %}{% if system_message != false %}{{ '<|im_start|>system\n' + system_message.strip() + '\n'}}{% endif %}{{ '<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' }}{% else %}{{ '\n' + '<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' }}{% endif %}{% if (add_generation_prompt == true and loop.last) %}{{ '\n' + '<|im_start|>' + 'assistant' + '\n' }}{% elif (message['role'] == 'assistant') %}{% endif %}{% endfor %}"
 }
```
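The fix boils down to guarding the generation prompt with `loop.last` so it is emitted once, after the final message, instead of on every loop iteration. A minimal sketch in plain Python of the before/after behavior (this mimics the template's message loop; `render` and its `fixed` flag are made up for illustration, not part of the actual Jinja template):

```python
def render(messages, add_generation_prompt, fixed=True):
    """Mimic the chat template's message loop.

    With fixed=False, the generation prompt is appended on every iteration
    (the old template's bug); with fixed=True, only after the last message
    (the loop.last guard added in this PR).
    """
    out = []
    for i, m in enumerate(messages):
        out.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>")
        is_last = i == len(messages) - 1
        if add_generation_prompt and (is_last or not fixed):
            out.append("<|im_start|>assistant\n")
    return "\n".join(out)


chat = [
    {"role": "user", "content": "hi"},
    {"role": "assistant", "content": "hello"},
    {"role": "user", "content": "what else?"},
]

broken = render(chat, add_generation_prompt=True, fixed=False)
repaired = render(chat, add_generation_prompt=True, fixed=True)

# Old behavior: one real assistant turn plus a spurious prompt per message.
print(broken.count("<|im_start|>assistant"))    # 4
# New behavior: one real assistant turn plus a single trailing prompt.
print(repaired.count("<|im_start|>assistant"))  # 2
```

This matches the BEFORE/AFTER output above: the old template interleaved `<|im_start|>assistant` after every message, while the fixed one leaves it only at the end of the conversation.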