julep-ai/samantha-33b

diwank committed on Jun 22, 2023

Commit 9334595 (1 parent: ad5c574): Create README.md

Files changed (1): README.md (added, +40 -0)

---
library_name: transformers
datasets:
- julep-ai/samantha_finetune_dataset_03
language:
- en
---
## Samantha

### Technical notes

This model is trained on a specialized dataset and uses special sentinel tokens to demarcate the sections of a conversation.

**Important note: these sentinels look similar to GPT-2-style special tokens, but they are <u>NOT</u> added as special tokens in the tokenizer.**

**Concepts**:

- Each conversation consists of *n* "sections".
- Each section can be one of:
  + `me`: the model
  + `person`: the speaker
  + `situation`: relevant background information that sets the context of the conversation
  + `thought`: thoughts generated by the model for parsing intermediate steps, etc.
  + `information`: external information added to the context by the system running the model
- The model and speaker sections can optionally include a name, like `me (Samantha)` or `person (Dmitry)`.
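
The section types above map naturally onto a small data type. The following is a minimal Python sketch, not code from this repository: `Section`, `SECTION_TYPES` and `NAMED_TYPES` are hypothetical names, and rejecting a name on anything other than `me` and `person` is an assumption drawn from the bullet about optional names.

```python
from dataclasses import dataclass
from typing import Optional

# The five section types listed in this card.
SECTION_TYPES = {"me", "person", "situation", "thought", "information"}
# Assumption: only the model and speaker sections carry an optional name.
NAMED_TYPES = {"me", "person"}


@dataclass
class Section:
    """One conversation section (hypothetical helper, not part of any library)."""

    kind: str
    content: str
    name: Optional[str] = None

    def __post_init__(self) -> None:
        # Reject section types that are not listed in the card.
        if self.kind not in SECTION_TYPES:
            raise ValueError(f"unknown section type: {self.kind!r}")
        # Reject names on section types that are not assumed to take one.
        if self.name is not None and self.kind not in NAMED_TYPES:
            raise ValueError(f"section type {self.kind!r} does not take a name")

    @property
    def header(self) -> str:
        """Text that follows <|section|>, e.g. 'person (Dmitry)' or 'situation'."""
        return f"{self.kind} ({self.name})" if self.name else self.kind


print(Section("person", "Hi!", name="Dmitry").header)  # person (Dmitry)
print(Section("situation", "Background.").header)  # situation
```

Validating in `__post_init__` catches a misspelled section type when the conversation is built, rather than silently sending the model a header format it never saw.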

**Tokens**:

- The `<|section|>` token marks the start of a section.
- The `<|endsection|>` token marks the end of a section. It is also set as the default `EOS` token in the tokenizer.
- Both are "special" in the sense that the tokenizer does not split them up.
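
Because the sentinels appear as plain strings in the prompt text, a transcript can be split back into sections with ordinary string matching. The following is a minimal sketch, assuming the layout of this card's example (the header on the same line as `<|section|>`, the content starting on the next line); `parse_sections` and `truncate_at_end` are hypothetical helpers.

```python
import re
from typing import List, Optional, Tuple

SECTION_START = "<|section|>"
SECTION_END = "<|endsection|>"

# A closed section: start sentinel, header line, newline, body, end sentinel.
# re.escape is required because "|" is a regex metacharacter.
_SECTION_RE = re.compile(
    re.escape(SECTION_START) + r"([^\n]*)\n(.*?)" + re.escape(SECTION_END),
    re.DOTALL,
)
# A header such as "me (Samantha)" -> kind "me", name "Samantha".
_HEADER_RE = re.compile(r"(\w+)(?: \((.*)\))?")


def parse_sections(text: str) -> List[Tuple[str, Optional[str], str]]:
    """Split a transcript into (kind, name, content) tuples.

    Only closed sections are returned; a trailing section that has not been
    closed with <|endsection|> yet is ignored.
    """
    sections = []
    for header, body in _SECTION_RE.findall(text):
        match = _HEADER_RE.fullmatch(header.strip())
        if match is None:
            raise ValueError(f"malformed section header: {header!r}")
        sections.append((match.group(1), match.group(2), body))
    return sections


def truncate_at_end(completion: str) -> str:
    """Keep generated text up to the first <|endsection|> (the default EOS)."""
    return completion.split(SECTION_END, 1)[0]
```

`truncate_at_end` is only a fallback for output that was decoded past the end marker; with `<|endsection|>` as the EOS token, generation would normally stop there on its own.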

### Example

```
<|section|>situation
I am talking to Diwank. I want to ask him about his food preferences.<|endsection|>
<|section|>person (Diwank)
Hey Samantha! What do you want to talk about?<|endsection|>
<|section|>me (Samantha)
```
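
The example prompt can also be assembled programmatically. The following is a minimal sketch with a hypothetical `build_prompt` helper: the newline after each `<|endsection|>` follows the example above, while the newline after the final, open `me (Samantha)` header is an assumption, since the example does not show whether one follows.

```python
from typing import Iterable, Optional, Tuple

SECTION_START = "<|section|>"
SECTION_END = "<|endsection|>"


def header(kind: str, name: Optional[str] = None) -> str:
    """'me' + 'Samantha' -> 'me (Samantha)'; without a name, just the kind."""
    return f"{kind} ({name})" if name else kind


def build_prompt(
    history: Iterable[Tuple[str, Optional[str], str]],
    reply_as: Optional[str] = "Samantha",
) -> str:
    """Render closed sections, then open a `me` section for the model to fill."""
    parts = []
    for kind, name, content in history:
        # Each section: start sentinel, header line, content, end sentinel.
        parts.append(f"{SECTION_START}{header(kind, name)}\n{content}{SECTION_END}\n")
    # Leave the final section open (assumed trailing newline after the header).
    parts.append(f"{SECTION_START}{header('me', reply_as)}\n")
    return "".join(parts)


prompt = build_prompt(
    [
        ("situation", None,
         "I am talking to Diwank. I want to ask him about his food preferences."),
        ("person", "Diwank", "Hey Samantha! What do you want to talk about?"),
    ]
)
print(prompt)
```

The model is expected to complete Samantha's section and then emit `<|endsection|>`, which, as the default EOS token, ends generation.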