How was fine-tuning done?
I'm wondering why this model is CC-BY-NC-SA-4.0 (non-commercial use only)?
I suppose it's because of the Alpaca and HC3 datasets used for fine-tuning.
What was the thinking behind that? Were the datasets used for mpt-7b-instruct insufficient for being able to achieve chat-type fine-tuning?
Is there information available on how the fine-tuning was done and on the prompt format? Were any multi-turn formats used (i.e. user, then assistant, then user, then assistant)? Thanks
We have explained the license; see for example https://huggingface.co/mosaicml/mpt-7b-chat/discussions/15
"Were the datasets used for mpt-7b-instruct insufficient for being able to achieve chat-type fine tuning?" β See the LIMA paper, Instruct could probably be adapted to chat without too many samples! We decided to build 2 models to do 2 different things: Instruct is meant to be immediately valuable for commercial use, chat is meant to be the highest quality model we can build regardless of license.
The prompt format used to be public, but we took the Spaces down; you can still see it in the inference script, and we will have to document it again. The format was ChatML; n-turn conversations were turned into n samples, so each response is loss-generating exactly once.
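For concreteness, here is a minimal sketch of that splitting (illustrative only, not the actual preprocessing code; the function name and the prompt/response sample format are made up for this example):

```python
def conversation_to_samples(turns: list[tuple[str, str]]) -> list[dict]:
    """Split an n-turn ChatML conversation into n training samples.

    `turns` is a list of (user_msg, assistant_msg) pairs. Sample i keeps
    all earlier turns as loss-free context; only the final assistant
    response in that sample generates loss.
    """
    context = ('<|im_start|>system\nA conversation between a user and an '
               'LLM-based AI assistant. The assistant gives helpful and '
               'honest answers.<|im_end|>\n')
    samples = []
    for user_msg, assistant_msg in turns:
        prompt = context + f'<|im_start|>user\n{user_msg}<|im_end|>\n<|im_start|>assistant\n'
        response = f'{assistant_msg}<|im_end|>\n'
        samples.append({'prompt': prompt, 'response': response})
        # Earlier assistant responses become plain conditioning context
        # for later samples; they are only loss-generating once.
        context = prompt + response
    return samples
```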
The training details are in the readme, https://huggingface.co/mosaicml/mpt-7b-chat#training-configuration
@sam-mosaic many thanks!
To be sure I understand... you're using <|im_start|> and <|im_end|> to wrap each system/user/assistant message individually, but I assume you're then using BOS and EOS tokens to wrap each full turn (i.e. system + user + assistant) as well:
```python
class ChatFormatter:  # enclosing class from the inference script
    def __init__(self, system: str, user: str, assistant: str) -> None:
        # Default ChatML templates are used when no override is passed in.
        self.system = system if system else '<|im_start|>system\nA conversation between a user and an LLM-based AI assistant. The assistant gives helpful and honest answers.<|im_end|>\n'
        self.user = user if user else '<|im_start|>user\n{}<|im_end|>\n'
        self.assistant = assistant if assistant else '<|im_start|>assistant\n{}<|im_end|>\n'
        # Everything before the '{}' placeholder, i.e. '<|im_start|>assistant\n'
        self.response_prefix = self.assistant.split('{}')[0]
```
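For reference, here's how I'd expect those templates to compose into a single prompt (my sketch, assuming the enclosing class is named ChatFormatter as above):

```python
fmt = ChatFormatter(system='', user='', assistant='')  # empty -> defaults
prompt = fmt.system + fmt.user.format('What is MPT-7B?') + fmt.response_prefix
# prompt ends with '<|im_start|>assistant\n'; the model then generates the
# assistant message and stops when it emits '<|im_end|>'.
```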