---
library_name: transformers
datasets:
- julep-ai/samantha_finetune_dataset_03
language:
- en
---

# Samantha

## Technical notes

This model is trained on a specialized dataset and uses sentinel tokens to demarcate the sections of a conversation.

**Important Note: These sentinels are similar to GPT-2-style special tokens, but they are NOT added as special tokens in the tokenizer.**

### Usage

For a usage example, refer to the [`chat.py`](https://huggingface.co/julep-ai/samantha-33b/blob/main/chat.py) file in this repo.

### Concepts

- Each conversation consists of n "sections"
- Each section can be one of:
  + `me`: The model
  + `person`: The speaker
  + `situation`: Relevant background information that sets the context of the conversation
  + `thought`: Thoughts generated by the model, for example to work through intermediate steps
  + `information`: External information added into the context by the system running the model
- The model and speaker sections can optionally include a name, like `me (Samantha)` or `person (Dmitry)`

### Sentinel Tokens

- The `<|section|>` token marks the start of a "section"
- The `<|endsection|>` token marks the end of a "section"

## Example

```
<|section|>situation
I am talking to Diwank. I want to ask him about his food preferences.<|endsection|>
<|section|>person (Diwank)
Hey Samantha! What do you want to talk about?<|endsection|>
<|section|>me (Samantha)
```
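The section layout described above can be sketched in Python as follows. This is a minimal, hypothetical helper and not the code in `chat.py`: the names `format_header`, `format_section`, `build_prompt`, and `extract_reply` are made up for illustration. It assumes that a section header and its content are separated by a newline, that sections are joined by newlines, and that the prompt ends with an open `me` header for the model to complete. Because the sentinels are not registered as special tokens, the sketch also assumes the caller cuts the generated text at `<|endsection|>` by plain string matching.

```python
from typing import Optional, Sequence, Tuple

SECTION_START = "<|section|>"
SECTION_END = "<|endsection|>"

# Section kinds listed under "Concepts".
VALID_ROLES = {"me", "person", "situation", "thought", "information"}


def format_header(role: str, name: Optional[str] = None) -> str:
    """Return the opening of a section, e.g. '<|section|>person (Diwank)'."""
    if role not in VALID_ROLES:
        raise ValueError(f"unknown section role: {role!r}")
    if name is not None and role not in ("me", "person"):
        raise ValueError("only 'me' and 'person' sections take a name")
    suffix = f" ({name})" if name else ""
    return f"{SECTION_START}{role}{suffix}"


def format_section(role: str, content: str, name: Optional[str] = None) -> str:
    """Return one closed section (assumed layout: header, newline, content)."""
    return f"{format_header(role, name)}\n{content}{SECTION_END}"


def build_prompt(sections: Sequence[Tuple], reply_name: Optional[str] = None) -> str:
    """Join closed sections and leave a 'me' section open for the model.

    Each item is (role, content) or (role, content, name).
    """
    parts = [format_section(*section) for section in sections]
    parts.append(format_header("me", reply_name))
    return "\n".join(parts)


def extract_reply(generated: str) -> str:
    """Keep only the text before the first end-of-section sentinel."""
    return generated.split(SECTION_END, 1)[0].strip()


if __name__ == "__main__":
    prompt = build_prompt(
        [
            ("situation", "I am talking to Diwank. I want to ask him about his food preferences."),
            ("person", "Hey Samantha! What do you want to talk about?", "Diwank"),
        ],
        reply_name="Samantha",
    )
    print(prompt)
```

The resulting string can be passed to the tokenizer as ordinary text. Whether the open `me` header should be followed by a trailing newline is not stated here, so check `chat.py` before relying on this layout.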