## Configure Accelerate

Run `accelerate config` and answer the questionnaire accordingly. Below is an example YAML config for multi-GPU training with 4 GPUs:
```yaml
compute_environment: LOCAL_MACHINE
deepspeed_config: {}
distributed_type: MULTI_GPU
downcast_bf16: 'no'
dynamo_backend: 'NO'
fsdp_config: {}
gpu_ids: all
machine_rank: 0
main_training_function: main
megatron_lm_config: {}
mixed_precision: 'no'
num_machines: 1
num_processes: 4
rdzv_backend: static
same_network: true
use_cpu: false
```
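If you prefer not to run the interactive questionnaire (for example on a headless machine), recent versions of Accelerate expose a small helper that writes a default config file for you. The snippet below is a sketch; the exact keyword arguments may vary across versions, so check your installed release:

```python
# Sketch: write a default Accelerate config without the interactive questionnaire.
# `write_basic_config` auto-detects the available GPUs; the argument shown here
# mirrors the `mixed_precision: 'no'` entry in the YAML above.
from accelerate.utils import write_basic_config

write_basic_config(mixed_precision="no")
```

Either way, the resulting default config file is what `accelerate launch` picks up when no config is passed explicitly.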
## Prepare the training script

The example below shows the standard Accelerate training loop; the `+` lines mark the additions that wrap it in a `main()` function guarded by `if __name__ == "__main__"`, matching the `main_training_function: main` entry in the config above.
```diff
  from accelerate import Accelerator

+ def main():
      accelerator = Accelerator()

      model, optimizer, training_dataloader, scheduler = accelerator.prepare(
          model, optimizer, training_dataloader, scheduler
      )

      for batch in training_dataloader:
          optimizer.zero_grad()
          inputs, targets = batch
          outputs = model(inputs)
          loss = loss_function(outputs, targets)
          accelerator.backward(loss)
          optimizer.step()
          scheduler.step()

+ if __name__ == "__main__":
+     main()
```
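For reference, here is a self-contained version of the same loop that actually runs. The tiny linear model, random dataset, optimizer, scheduler, and loss function are illustrative placeholders standing in for the objects the snippet above leaves undefined:

```python
# Self-contained sketch of the training loop above. The model, data, optimizer,
# scheduler, and loss below are toy placeholders, not part of the original example.
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator


def main():
    accelerator = Accelerator()

    # Toy regression data and model stand in for your real dataset and model.
    dataset = TensorDataset(torch.randn(1024, 16), torch.randn(1024, 1))
    training_dataloader = DataLoader(dataset, batch_size=32, shuffle=True)
    model = nn.Linear(16, 1)
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10)
    loss_function = nn.MSELoss()

    # prepare() moves everything to the right device and wraps the model/dataloader
    # for distributed execution.
    model, optimizer, training_dataloader, scheduler = accelerator.prepare(
        model, optimizer, training_dataloader, scheduler
    )

    for batch in training_dataloader:
        optimizer.zero_grad()
        inputs, targets = batch
        outputs = model(inputs)
        loss = loss_function(outputs, targets)
        accelerator.backward(loss)
        optimizer.step()
        scheduler.step()


if __name__ == "__main__":
    main()
```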
## Launch the training

Launching a script with the default Accelerate config file looks like the following:

```
accelerate launch {script_name.py} {--arg1} {--arg2} ...
```

Alternatively, you can pass the right config parameters for multi-GPU training directly to `accelerate launch`, as shown below:

```
accelerate launch --multi_gpu --num_processes=4 {script_name.py} {--arg1} {--arg2} ...
```

Using this feature involves no changes to the code apart from the ones mentioned in the tab `Simplify your code and improve efficiency`.

To learn more, check out the related documentation:

- Launching distributed code
- The Accelerate CLI
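Finally, a small sketch of how you might confirm from inside the training script that the launch actually spawned one process per GPU. It only relies on `Accelerator` attributes (`num_processes`, `process_index`, `device`); the printed messages are illustrative:

```python
# Sketch: verify the distributed setup from inside the training script.
from accelerate import Accelerator

accelerator = Accelerator()

# Printed once per process, so with the config above you should see 4 lines.
print(f"process {accelerator.process_index} of {accelerator.num_processes} on {accelerator.device}")

# accelerator.print only prints on the main process.
accelerator.print("all processes launched")
```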