Update README.md
README.md
CHANGED
````diff
@@ -6,15 +6,18 @@ language:
 - en
 ---
 
-
+# Samantha
 
-
+## Technical notes
 
 This model is trained on a specialized dataset and uses special sentinel tokens to demarcate conversations.
 
 **Important Note: These sentinels are similar to gpt2-style special tokens but they are <u>NOT</u> added as special tokens in the tokenizer.**
 
-
+### Usage
+For usage, you can refer to the `chat.py` file in this repo for an example.
+
+### Concepts
 - Each conversation consists of n "sections"
 - Each section can be one of:
   + `me`: The model
@@ -24,12 +27,11 @@ This model is trained on a specialized dataset and uses special sentinel tokens
   + `information`: External information added into the context by the system running the model
 - The model and speaker sections can optionally include a name like `me (Samantha)` or `person (Dmitry)`
 
-
+### Sentinel Tokens
 - `<|section|>` token marks the start of a "section"
-- `<|endsection|>` token marks the end of a "section".
-- these are both "special" tokens and are not split up by the tokenizer
+- `<|endsection|>` token marks the end of a "section".
 
-
+## Example
 
 ```
 <|section|>situation
````
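The section layout described in the README text above can be sketched in a few lines of Python. This is only an illustration: the helper names `format_section` and `build_prompt` are hypothetical, and the exact whitespace (a newline after the section header, the end token directly after the body, sections joined by a newline) is an assumption, since the diff shows only the first line of the example. The `chat.py` file mentioned in the README is the place to confirm the layout the model actually expects.

```python
from typing import Iterable, Optional, Tuple

# Sentinel strings as named in the README. Per its note, they are plain text
# and are NOT registered as special tokens in the tokenizer.
SECTION_START = "<|section|>"
SECTION_END = "<|endsection|>"


def format_section(kind: str, text: str, name: Optional[str] = None) -> str:
    """Wrap one section in sentinels (hypothetical helper).

    Assumed layout: start token, section kind, optional "(name)",
    a newline, the body, then the end token.
    """
    header = f"{kind} ({name})" if name else kind
    return f"{SECTION_START}{header}\n{text}{SECTION_END}"


def build_prompt(sections: Iterable[Tuple[str, str, Optional[str]]]) -> str:
    """Join formatted sections with newlines (the separator is an assumption)."""
    return "\n".join(format_section(kind, text, name) for kind, text, name in sections)


if __name__ == "__main__":
    prompt = build_prompt([
        ("situation", "A casual chat between two friends.", None),
        ("person", "Hey, how are you?", "Dmitry"),
        ("me", "Doing well, thanks for asking!", "Samantha"),
    ])
    print(prompt)
```

Because the sentinels are ordinary strings rather than registered special tokens, building the prompt by string concatenation and passing it through the tokenizer unchanged seems like the natural approach here.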