##
Below is an example YAML config for mixed-precision training using DeepSpeed ZeRO Stage-3 with CPU offloading on 8 GPUs.
<pre>
compute_environment: LOCAL_MACHINE
+deepspeed_config:
+  gradient_accumulation_steps: 1
+  gradient_clipping: 1.0
+  offload_optimizer_device: cpu
+  offload_param_device: cpu
+  zero3_init_flag: true
+  zero3_save_16bit_model: true
+  zero_stage: 3
+distributed_type: DEEPSPEED
downcast_bf16: 'no'
dynamo_backend: 'NO'
fsdp_config: {}
machine_rank: 0
main_training_function: main
megatron_lm_config: {}
mixed_precision: fp16
+num_machines: 1
+num_processes: 8
rdzv_backend: static
same_network: true
use_cpu: false
</pre>
##
Assume that `model` is created using the `transformers` library.
<pre>
from accelerate import Accelerator

def main():
    accelerator = Accelerator()
    model, optimizer, training_dataloader, scheduler = accelerator.prepare(
        model, optimizer, training_dataloader, scheduler
    )
    # Under ZeRO Stage-3 the parameters are sharded, so all ranks generate in lockstep.
    generated_tokens = accelerator.unwrap_model(model).generate(
        batch["input_ids"],
        attention_mask=batch["attention_mask"],
        **gen_kwargs,
+       synced_gpus=True
    )
    ...
    # Pass the state dict obtained from the accelerator when saving.
    accelerator.unwrap_model(model).save_pretrained(
        args.output_dir,
        is_main_process=accelerator.is_main_process,
        save_function=accelerator.save,
+       state_dict=accelerator.get_state_dict(model)
    )
    ...
</pre>
##
If the YAML was generated through the `accelerate config` command:
```
accelerate launch {script_name.py} {--arg1} {--arg2} ...
```
If the YAML is saved to a `~/config.yaml` file:
```
accelerate launch --config_file ~/config.yaml {script_name.py} {--arg1} {--arg2} ...
```
Or you can pass the configuration parameters directly to `accelerate launch`, without a `config.yaml` file:
```
accelerate launch \
--use_deepspeed \
--num_processes=8 \
--mixed_precision=fp16 \
--zero_stage=3 \
--gradient_accumulation_steps=1 \
--gradient_clipping=1 \
--zero3_init_flag=True \
--zero3_save_16bit_model=True \
--offload_optimizer_device=cpu \
--offload_param_device=cpu \
{script_name.py} {--arg1} {--arg2} ...
```
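The flags in the command above mirror the keys of the YAML example one to one. A minimal sketch of that mapping follows; `build_launch_command`, `train.py`, and `--epochs` are hypothetical names used only for illustration, and the helper is not part of Accelerate.

```python
import shlex

# Values copied from the YAML example in this document.
config = {
    "mixed_precision": "fp16",
    "num_processes": 8,
    "deepspeed_config": {
        "gradient_accumulation_steps": 1,
        "gradient_clipping": 1.0,
        "offload_optimizer_device": "cpu",
        "offload_param_device": "cpu",
        "zero3_init_flag": True,
        "zero3_save_16bit_model": True,
        "zero_stage": 3,
    },
}


def build_launch_command(config, script, script_args=()):
    """Hypothetical helper: flatten the config into `accelerate launch` flags."""
    flags = [
        "--use_deepspeed",
        f"--num_processes={config['num_processes']}",
        f"--mixed_precision={config['mixed_precision']}",
    ]
    # Each key under `deepspeed_config` becomes a flag of the same name.
    for key, value in config["deepspeed_config"].items():
        flags.append(f"--{key}={value}")
    return ["accelerate", "launch", *flags, script, *script_args]


# `train.py` and `--epochs 3` stand in for your own script and arguments.
command = build_launch_command(config, "train.py", ["--epochs", "3"])
print(shlex.join(command))
```

The returned list could be handed to `subprocess.run` as is; printing it first makes it easy to compare against the YAML before launching.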
##
For core DeepSpeed features (ZeRO stages 1 and 2), Accelerate requires no code changes. For ZeRO Stage-3, the model parameters are sharded across the GPUs, so `transformers`' `generate` function requires `synced_gpus=True` and `save_pretrained` requires the `state_dict` parameter.
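To see why sharding makes synchronized generation matter, here is a toy model in plain Python. It is not how `transformers` implements `synced_gpus`; it only counts forward passes per rank, assuming that every forward pass under ZeRO Stage-3 involves a collective operation that all ranks must join. The step counts are made up.

```python
def forward_calls(steps_needed, synced_gpus):
    """Return how many forward passes each rank performs during generation."""
    if synced_gpus:
        # Every rank keeps stepping until the slowest rank has finished.
        longest = max(steps_needed)
        return [longest for _ in steps_needed]
    # Each rank stops as soon as its own sequences are complete.
    return list(steps_needed)


def would_hang(calls_per_rank):
    """Unequal call counts leave some rank waiting in a collective forever."""
    return len(set(calls_per_rank)) > 1


# Hypothetical number of decoding steps each of 4 ranks needs to finish.
steps_needed = [12, 20, 7, 20]

unsynced = forward_calls(steps_needed, synced_gpus=False)
synced = forward_calls(steps_needed, synced_gpus=True)

print(unsynced, would_hang(unsynced))
print(synced, would_hang(synced))
```

In this model the unsynchronized run has ranks leaving the loop at different steps, while the synchronized run has every rank performing the same number of forward passes.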
You can also set most fields in the DeepSpeed config file to `auto`, and they will be filled in automatically when you run `accelerate launch`.
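As an illustration, the sketch below writes a small DeepSpeed config with several fields left as `auto` and lists them. The selection of fields and the file name `ds_config.json` are assumptions made for this example; which fields accept `auto` can depend on the Accelerate version, so check the DeepSpeed Config File guide before relying on any of them.

```python
import json
import tempfile
from pathlib import Path

# Assumed example config: ZeRO Stage-3 with CPU offloading, sizes left to Accelerate.
ds_config = {
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,
        "offload_optimizer": {"device": "cpu"},
        "offload_param": {"device": "cpu"},
        "stage3_gather_16bit_weights_on_model_save": "auto",
    },
    "gradient_accumulation_steps": "auto",
    "gradient_clipping": "auto",
    "train_batch_size": "auto",
    "train_micro_batch_size_per_gpu": "auto",
}


def auto_fields(node, prefix=""):
    """List the dotted path of every field whose value is the string "auto"."""
    found = []
    for key, value in node.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            found.extend(auto_fields(value, path + "."))
        elif value == "auto":
            found.append(path)
    return found


# Hypothetical file name, written to a temporary directory for the demo.
config_path = Path(tempfile.mkdtemp()) / "ds_config.json"
config_path.write_text(json.dumps(ds_config, indent=2))

for field in auto_fields(json.loads(config_path.read_text())):
    print(field)
```

Leaving values such as the batch sizes as `auto` avoids stating them twice, once in the training script and once in the DeepSpeed file.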
##
To learn more, check out the related documentation:
- <a href="https://huggingface.co/docs/accelerate/usage_guides/deepspeed" target="_blank">How to use DeepSpeed</a>
- <a href="https://huggingface.co/docs/accelerate/usage_guides/deepspeed#deepspeed-config-file" target="_blank">DeepSpeed Config File</a>
- <a href="https://huggingface.co/blog/accelerate-deepspeed" target="_blank">Accelerate Large Model Training using DeepSpeed</a>
- <a href="https://huggingface.co/docs/accelerate/package_reference/deepspeed" target="_blank">DeepSpeed Utilities</a>