---
title: Pre-training
description: Data format for a pre-training completion task.
order: 1
---

For pre-training, there is no prompt template and there are no roles. The only required field is `text`:

```{.json filename="data.jsonl"}
{"text": "first row"}
{"text": "second row"}
...
```
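
As a minimal sketch of producing a file in this shape, the following standard-library Python writes one JSON object per line and checks that every row has a string `text` field. The helper names `write_pretraining_jsonl` and `check_pretraining_jsonl` are hypothetical and are not part of Axolotl.

```python
import json
from pathlib import Path


def write_pretraining_jsonl(texts, path):
    """Write one {"text": ...} JSON object per line (hypothetical helper)."""
    path = Path(path)
    with path.open("w", encoding="utf-8") as f:
        for text in texts:
            # ensure_ascii=False keeps non-ASCII characters readable in the file
            f.write(json.dumps({"text": text}, ensure_ascii=False) + "\n")
    return path


def check_pretraining_jsonl(path):
    """Return the row count; raise ValueError if a row lacks a string `text` field."""
    count = 0
    with Path(path).open(encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                continue  # ignore blank lines
            row = json.loads(line)
            if not isinstance(row, dict) or not isinstance(row.get("text"), str):
                raise ValueError(f"line {line_no}: missing string field 'text'")
            count += 1
    return count
```

Calling `write_pretraining_jsonl(["first row", "second row"], "data.jsonl")` produces the two rows shown above; running the check before training catches malformed rows early.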

:::{.callout-note}

### Streaming is recommended for large datasets

Axolotl usually loads the entire dataset into memory, which becomes impractical for large datasets. To stream the dataset instead, use the following config:

```{.yaml filename="config.yaml"}
pretraining_dataset: # hf path only
...
```

:::
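
To illustrate why streaming helps, here is a small standard-library sketch that reads a JSONL file one row at a time instead of loading it whole. It is not Axolotl's implementation, and the names `iter_texts` and `batched` are hypothetical. It only shows the general idea that memory use is bounded by the batch size rather than the file size.

```python
import json


def iter_texts(path):
    """Lazily yield the `text` field of each row, one line at a time."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():  # skip blank lines
                yield json.loads(line)["text"]


def batched(texts, batch_size):
    """Group a stream of texts into lists of at most `batch_size` items."""
    batch = []
    for text in texts:
        batch.append(text)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly shorter batch
        yield batch
```

For example, `for batch in batched(iter_texts("data.jsonl"), 1000): ...` holds at most 1000 rows in memory at any time.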