---
title: Conversation
description: Conversation format for supervised fine-tuning.
order: 3
---

## sharegpt

conversations where `from` is `human`/`gpt` (optional: a first turn with `from` set to `system` overrides the default system prompt)

```{.json filename="data.jsonl"}
{"conversations": [{"from": "...", "value": "..."}]}
```
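
For example (values illustrative), a row with the optional leading `system` turn:

```{.json filename="data.jsonl"}
{"conversations": [{"from": "system", "value": "You are a helpful assistant."}, {"from": "human", "value": "What is 2 + 2?"}, {"from": "gpt", "value": "2 + 2 = 4."}]}
```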

Note: `type: sharegpt` enables additional config options (see the template and example below):
- `conversation`: enables conversions to many Conversation types. Refer to the 'name' [here](https://github.com/lm-sys/FastChat/blob/main/fastchat/conversation.py) for options.
- `roles`: allows you to specify the roles for input and output. This is useful for datasets with custom roles such as `tool`, etc., to support masking.
- `field_human`: specify the key to use instead of `human` in the conversation.
- `field_model`: specify the key to use instead of `gpt` in the conversation.

```yaml
datasets:
  - path: ...
    type: sharegpt

    conversation: # Options (see Conversation 'name'): https://github.com/lm-sys/FastChat/blob/main/fastchat/conversation.py
    field_human: # Optional[str]. Human key to use for conversation.
    field_model: # Optional[str]. Assistant key to use for conversation.
    # Add additional keys from your dataset as input or output roles
    roles:
      input: # Optional[List[str]]. These will be masked based on train_on_input
      output: # Optional[List[str]].
```
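
For example, a hypothetical dataset that stores human turns as `user`, assistant turns as `bot`, and also contains a `tool` role could be mapped like this (path and role names are placeholders):

```yaml
datasets:
  - path: my-org/my-tool-dataset # placeholder path
    type: sharegpt
    conversation: chatml
    field_human: user # turns with {"from": "user"} are treated as the human role
    field_model: bot  # turns with {"from": "bot"} are treated as the assistant role
    roles:
      input: ["tool"] # additional input role; masked based on train_on_input
```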

## pygmalion

```{.json filename="data.jsonl"}
{"conversations": [{"role": "...", "value": "..."}]}
```

## sharegpt.load_role

conversations where `role` is used instead of `from`

```{.json filename="data.jsonl"}
{"conversations": [{"role": "...", "value": "..."}]}
```
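
For example (assuming the same `human`/`gpt` role names as the base sharegpt format):

```{.json filename="data.jsonl"}
{"conversations": [{"role": "human", "value": "What is the capital of France?"}, {"role": "gpt", "value": "The capital of France is Paris."}]}
```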

## sharegpt.load_guanaco

conversations where `from` is `prompter`/`assistant` instead of the default sharegpt roles

```{.json filename="data.jsonl"}
{"conversations": [{"from": "...", "value": "..."}]}
```
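
For example:

```{.json filename="data.jsonl"}
{"conversations": [{"from": "prompter", "value": "Name a prime number greater than 10."}, {"from": "assistant", "value": "11 is prime and greater than 10."}]}
```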

## sharegpt_jokes

creates a chat where the bot is asked to tell a joke, then explain why the joke is funny

```{.json filename="data.jsonl"}
{"conversations": [{"title": "...", "text": "...", "explanation": "..."}]}
```
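
For example (values illustrative):

```{.json filename="data.jsonl"}
{"conversations": [{"title": "A pun", "text": "I used to be a banker, but I lost interest.", "explanation": "The joke relies on the double meaning of the word interest: bank interest and personal enthusiasm."}]}
```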