## Configuration
Run `accelerate config` and answer the questionnaire accordingly.
Below is an example YAML for mixed-precision training using DeepSpeed ZeRO Stage-3 with CPU offloading on 8 GPUs.
<pre>
compute_environment: LOCAL_MACHINE
deepspeed_config:
  gradient_accumulation_steps: 1
  gradient_clipping: 1.0
  offload_optimizer_device: cpu
  offload_param_device: cpu
  zero3_init_flag: true
  zero3_save_16bit_model: true
  zero_stage: 3
distributed_type: DEEPSPEED
downcast_bf16: 'no'
dynamo_backend: 'NO'
fsdp_config: {}
machine_rank: 0
main_training_function: main
megatron_lm_config: {}
mixed_precision: fp16
num_machines: 1
num_processes: 8
rdzv_backend: static
same_network: true
use_cpu: false
</pre>
## Code changes and launching
<pre>
  from accelerate import Accelerator

+ def main():
      accelerator = Accelerator()

      model, optimizer, training_dataloader, scheduler = accelerator.prepare(
          model, optimizer, training_dataloader, scheduler
      )

      for batch in training_dataloader:
          optimizer.zero_grad()
          inputs, targets = batch
          outputs = model(inputs)
          loss = loss_function(outputs, targets)
          accelerator.backward(loss)
          optimizer.step()
          scheduler.step()

      ...

      generated_tokens = accelerator.unwrap_model(model).generate(
          batch["input_ids"],
          attention_mask=batch["attention_mask"],
          **gen_kwargs,
+         synced_gpus=True,  # required for ZeRO Stage 3
      )

      ...

      accelerator.unwrap_model(model).save_pretrained(
          args.output_dir,
          is_main_process=accelerator.is_main_process,
          save_function=accelerator.save,
+         state_dict=accelerator.get_state_dict(model),  # required for ZeRO Stage 3
      )

      ...

+ if __name__ == "__main__":
+     main()
</pre>
Launching a script using the default accelerate config file looks like the following:
``` | |
accelerate launch {script_name.py} {--arg1} {--arg2} ... | |
``` | |
Alternatively, you can use `accelerate launch` with the right config params for multi-GPU training, as shown below:
``` | |
accelerate launch \ | |
--use_deepspeed \ | |
--num_processes=8 \ | |
--mixed_precision=fp16 \ | |
--zero_stage=3 \ | |
--gradient_accumulation_steps=1 \ | |
--gradient_clipping=1 \ | |
--zero3_init_flag=True \ | |
--zero3_save_16bit_model=True \ | |
--offload_optimizer_device=cpu \ | |
--offload_param_device=cpu \ | |
{script_name.py} {--arg1} {--arg2} ... | |
``` | |
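If you saved your answers to a custom location with `accelerate config --config_file`, you can also point the launcher at that file explicitly. A minimal sketch, where `ds_zero3_cpu_offload_config.yaml` is an illustrative path:
```
accelerate launch --config_file ds_zero3_cpu_offload_config.yaml \
{script_name.py} {--arg1} {--arg2} ...
```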
## Notes
For the core DeepSpeed features supported via the accelerate config file, no code changes are required for ZeRO Stages 1 and 2. For ZeRO Stage-3, transformers' `generate` function requires `synced_gpus=True`, and `save_pretrained` requires the `state_dict` param, because the model parameters are sharded across the GPUs.
For advanced users who want granular control via a DeepSpeed config file, this is supported: pass its location when running the `accelerate config` command. You can also set most of the fields in the DeepSpeed config file to `auto`; they are filled in automatically from the arguments of the `accelerate launch` command and the `accelerator.prepare` call, which keeps things simple for users. Please refer to the docs on <a href="https://huggingface.co/docs/accelerate/usage_guides/deepspeed#deepspeed-config-file" target="_blank">DeepSpeed Config File</a>.
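As a rough illustration, a DeepSpeed config file relying on `auto` values might look like the sketch below. It is not an exhaustive config; the field selection here is an assumption for illustration, and the `auto` values are resolved at launch time as described above.
<pre>
{
    "fp16": {
        "enabled": "auto"
    },
    "zero_optimization": {
        "stage": "auto",
        "offload_optimizer": {
            "device": "auto"
        },
        "offload_param": {
            "device": "auto"
        },
        "stage3_gather_16bit_weights_on_model_save": "auto"
    },
    "gradient_accumulation_steps": "auto",
    "gradient_clipping": "auto",
    "train_batch_size": "auto",
    "train_micro_batch_size_per_gpu": "auto"
}
</pre>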
## Additional resources
To learn more, check out the related documentation:
- <a href="https://huggingface.co/docs/accelerate/usage_guides/deepspeed" target="_blank">How to use DeepSpeed</a> | |
- <a href="https://huggingface.co/blog/accelerate-deepspeed" target="_blank">Accelerate Large Model Training using DeepSpeed</a> | |
- <a href="https://huggingface.co/docs/accelerate/package_reference/deepspeed" target="_blank">DeepSpeed Utilities</a> |