##
<pre>
from accelerate import Accelerator
accelerator = Accelerator(
+    gradient_accumulation_steps=2,
)
dataloader, model, optimizer, scheduler = accelerator.prepare(
    dataloader, model, optimizer, scheduler
)

for batch in dataloader:
+  with accelerator.accumulate(model):
      optimizer.zero_grad()
      inputs, targets = batch
      outputs = model(inputs)
      loss = loss_function(outputs, targets)
      accelerator.backward(loss)
      optimizer.step()
      scheduler.step()</pre>

##
When performing gradient accumulation in a distributed setup, it is easy to make mistakes that hurt efficiency,
such as synchronizing gradients across processes on every batch instead of only when the optimizer actually steps.
`Accelerator` provides a context manager that handles these details and ensures that the model trains correctly.
Pass `gradient_accumulation_steps` when creating the `Accelerator`, then wrap the body of the training loop in the
`Accelerator.accumulate` context manager, passing in the model you are training. Gradients will then accumulate
and synchronize automatically when needed.
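
To make the idea concrete, here is a minimal conceptual sketch of gradient accumulation in plain Python. It does not use Accelerate or PyTorch and is not how `Accelerator.accumulate` is implemented; the one-parameter model and the helper names `grad_mse` and `train_accumulated` are hypothetical, chosen only for illustration. It shows that scaling and summing the gradients of two micro-batches before a single update gives the same result as one update on the combined batch.

```python
# Conceptual sketch only: plain Python, no Accelerate or PyTorch involved.
# The model is y = w * x with a mean squared error loss (an assumption
# made for illustration).

def grad_mse(w, batch):
    """Gradient of the mean squared error with respect to w for one batch."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def train_accumulated(w, micro_batches, lr, accumulation_steps):
    """Apply one parameter update per `accumulation_steps` micro-batches."""
    accumulated = 0.0
    for step, batch in enumerate(micro_batches, start=1):
        # Scale each micro-batch gradient so the running sum is an average.
        accumulated += grad_mse(w, batch) / accumulation_steps
        if step % accumulation_steps == 0:
            w -= lr * accumulated  # plays the role of optimizer.step()
            accumulated = 0.0      # plays the role of optimizer.zero_grad()
    return w


data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
micro_batches = [data[:2], data[2:]]  # two micro-batches of equal size

# One update after accumulating over two micro-batches...
w_accumulated = train_accumulated(0.0, micro_batches, lr=0.01, accumulation_steps=2)
# ...compared with one update computed on the full batch at once.
w_full_batch = 0.0 - 0.01 * grad_mse(0.0, data)

print(w_accumulated, w_full_batch)
```

The two values agree up to floating-point rounding because the micro-batches have equal size. In a distributed run there is the additional concern, handled by the context manager, of synchronizing gradients between processes only on the batches where an update actually happens.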

##
To learn more, check out the related documentation:
- <a href="https://huggingface.co/docs/accelerate/usage_guides/gradient_accumulation" target="_blank">Performing gradient accumulation</a>
- <a href="https://huggingface.co/docs/accelerate/package_reference/accelerator#accelerate.Accelerator.accumulate" target="_blank">API reference</a>
- <a href="https://github.com/huggingface/accelerate/blob/main/examples/by_feature/gradient_accumulation.py" target="_blank">Example script</a>
- <a href="https://github.com/huggingface/accelerate/blob/main/examples/by_feature/automatic_gradient_accumulation.py" target="_blank">Performing automatic gradient accumulation example script</a>