add dstack section (#1612) [skip ci]
* add dstack section
* chore: lint
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
README.md
CHANGED
@@ -34,6 +34,7 @@ Features:
 - [Mac](#mac)
 - [Google Colab](#google-colab)
 - [Launching on public clouds via SkyPilot](#launching-on-public-clouds-via-skypilot)
+- [Launching on public clouds via dstack](#launching-on-public-clouds-via-dstack)
 - [Dataset](#dataset)
 - [Config](#config)
 - [Train](#train)

@@ -292,6 +293,42 @@ HF_TOKEN=xx sky launch axolotl.yaml --env HF_TOKEN
 HF_TOKEN=xx BUCKET=<unique-name> sky spot launch axolotl-spot.yaml --env HF_TOKEN --env BUCKET
 ```

+#### Launching on public clouds via dstack
+To launch on GPU instances (both on-demand and spot instances) on public clouds (GCP, AWS, Azure, Lambda Labs, TensorDock, Vast.ai, and CUDO), you can use [dstack](https://dstack.ai/).
+
+Write a job description in YAML as below:
+
+```yaml
+# dstack.yaml
+type: task
+
+image: winglian/axolotl-cloud:main-20240429-py3.11-cu121-2.2.1
+
+env:
+  - HUGGING_FACE_HUB_TOKEN
+  - WANDB_API_KEY
+
+commands:
+  - accelerate launch -m axolotl.cli.train config.yaml
+
+ports:
+  - 6006
+
+resources:
+  gpu:
+    memory: 24GB..
+    count: 2
+```
+
+Then simply run the job with the `dstack run` command. Append the `--spot` option if you want a spot instance. `dstack run` will show you the instance with the cheapest price across multiple cloud services:
+
+```bash
+pip install dstack
+HUGGING_FACE_HUB_TOKEN=xxx WANDB_API_KEY=xxx dstack run . -f dstack.yaml  # --spot
+```
+
+For more fine-grained use cases, please refer to the official [dstack documentation](https://dstack.ai/docs/) and the detailed description of the [axolotl example](https://github.com/dstackai/dstack/tree/master/examples/fine-tuning/axolotl) in the official repository.
+
 ### Dataset
 
 Axolotl supports a variety of dataset formats. It is recommended to use a JSONL. The schema of the JSONL depends upon the task and the prompt template you wish to use. Instead of a JSONL, you can also use a HuggingFace dataset with columns for each JSONL field.