---
library_name: transformers
datasets:
- julep-ai/samantha_finetune_dataset_03
language:
- en
---
# Samantha
## Technical notes
This model is trained on the `julep-ai/samantha_finetune_dataset_03` dataset and uses sentinel tokens to demarcate the sections of a conversation.
**Important Note: These sentinels resemble GPT-2-style special tokens, but they are <u>NOT</u> added as special tokens in the tokenizer.**
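
Because the sentinels are ordinary text rather than registered special tokens, decoded output will presumably still contain `<|endsection|>` and anything generated after it. A minimal sketch of trimming a decoded reply at the string level follows; the helper name `truncate_at_end_sentinel` is hypothetical and is not taken from `chat.py`.

```python
END_SENTINEL = "<|endsection|>"


def truncate_at_end_sentinel(generated: str) -> str:
    """Return the text before the first end sentinel (hypothetical helper).

    If the sentinel never appears, the whole text is returned.
    """
    reply, _sep, _rest = generated.partition(END_SENTINEL)
    return reply.strip()


# Illustrative decoded output that ran past the end of the model's section.
raw = "I'd love to hear about your favourite food!<|endsection|>\n<|section|>person (Diwank)\n"
print(truncate_at_end_sentinel(raw))
```

The same string check could also serve as a stopping condition during generation, so that tokens past the end of the section are never produced.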
### Usage
See the [`chat.py`](https://huggingface.co/julep-ai/samantha-33b/blob/main/chat.py) file in this repo for a usage example.
### Concepts
- Each conversation consists of n "sections"
- Each section can be one of:
  + `me`: The model
  + `person`: The speaker
  + `situation`: Relevant background information to set the context of the conversation
  + `thought`: Thoughts generated by the model for parsing intermediate steps etc.
  + `information`: External information added into the context by the system running the model
- The model and speaker sections can optionally include a name like `me (Samantha)` or `person (Dmitry)`
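
The section types above can be modelled as a small data structure. This is only a sketch: the `Section` class and its field names are hypothetical, and rejecting a name on sections other than `me` and `person` is an assumption, since the card only says those two may carry one.

```python
from dataclasses import dataclass
from typing import Optional

VALID_ROLES = {"me", "person", "situation", "thought", "information"}
# Assumption: only the model and speaker sections take a name.
NAMED_ROLES = {"me", "person"}


@dataclass
class Section:
    """One section of a conversation (hypothetical helper type)."""

    role: str
    content: str
    name: Optional[str] = None

    def __post_init__(self) -> None:
        if self.role not in VALID_ROLES:
            raise ValueError(f"unknown section type: {self.role!r}")
        if self.name is not None and self.role not in NAMED_ROLES:
            raise ValueError(f"{self.role!r} sections do not take a name")

    @property
    def header(self) -> str:
        """Section header, e.g. 'me (Samantha)' or 'situation'."""
        return f"{self.role} ({self.name})" if self.name else self.role


print(Section("person", "Hey Samantha!", name="Dmitry").header)
```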
### Sentinel Tokens
- The `<|section|>` token marks the start of a "section".
- The `<|endsection|>` token marks the end of a "section".
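
A prompt can be assembled by wrapping each section in the two sentinels and leaving a final `me` section open for the model to complete. The sketch below assumes the layout used in this card's example (header on the same line as `<|section|>`, content on the following line, sections separated by a newline); the function names are hypothetical, and the newline after the open header is an assumption.

```python
from typing import List, Tuple

SECTION_START = "<|section|>"
SECTION_END = "<|endsection|>"


def render_section(header: str, content: str) -> str:
    """Wrap one section's header and content in the sentinel tokens."""
    return f"{SECTION_START}{header}\n{content}{SECTION_END}"


def build_prompt(sections: List[Tuple[str, str]], next_header: str) -> str:
    """Join closed sections, then open a new section for the model to fill.

    Assumption: the open section ends with a newline after its header.
    """
    closed = [render_section(header, content) for header, content in sections]
    return "\n".join(closed + [f"{SECTION_START}{next_header}\n"])


prompt = build_prompt(
    [
        ("situation", "I am talking to Diwank."),
        ("person (Diwank)", "Hey Samantha! What do you want to talk about?"),
    ],
    next_header="me (Samantha)",
)
print(prompt)
```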
## Example
```
<|section|>situation
I am talking to Diwank. I want to ask him about his food preferences.<|endsection|>
<|section|>person (Diwank)
Hey Samantha! What do you want to talk about?<|endsection|>
<|section|>me (Samantha)
```
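
Going the other way, a transcript in this format can be split back into its sections. This is a rough sketch using regular expressions; `parse_sections` is a hypothetical helper, and it deliberately skips a trailing section that has no `<|endsection|>` yet, such as the open `me (Samantha)` section in the example.

```python
import re
from typing import List, Optional, Tuple

# A closed section: start sentinel, header line, content, end sentinel.
SECTION_RE = re.compile(r"<\|section\|>([^\n]*)\n(.*?)<\|endsection\|>", re.DOTALL)
# A header is a type, optionally followed by a name in parentheses.
HEADER_RE = re.compile(r"^(\w+)(?: \((.+)\))?$")


def parse_sections(transcript: str) -> List[Tuple[str, Optional[str], str]]:
    """Return (type, name, content) for every closed section."""
    parsed = []
    for header, content in SECTION_RE.findall(transcript):
        header = header.strip()
        match = HEADER_RE.match(header)
        if match:
            role, name = match.group(1), match.group(2)
        else:
            role, name = header, None  # Keep unrecognised headers as-is.
        parsed.append((role, name, content))
    return parsed


transcript = (
    "<|section|>situation\n"
    "I am talking to Diwank. I want to ask him about his food preferences.<|endsection|>\n"
    "<|section|>person (Diwank)\n"
    "Hey Samantha! What do you want to talk about?<|endsection|>\n"
    "<|section|>me (Samantha)\n"
)
for role, name, content in parse_sections(transcript):
    print(role, name, content)
```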