## Configuration
Run `accelerate config` and answer the questionnaire accordingly.
Below is an example YAML config for BF16 mixed-precision training using PyTorch FSDP with CPU offloading on 8 GPUs.
```
compute_environment: LOCAL_MACHINE
deepspeed_config: {}
distributed_type: FSDP
downcast_bf16: 'no'
dynamo_backend: 'NO'
fsdp_config:
  fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
  fsdp_backward_prefetch_policy: BACKWARD_PRE
  fsdp_offload_params: true
  fsdp_sharding_strategy: 1
  fsdp_state_dict_type: FULL_STATE_DICT
  fsdp_transformer_layer_cls_to_wrap: T5Block
machine_rank: 0
main_training_function: main
megatron_lm_config: {}
mixed_precision: bf16
num_machines: 1
num_processes: 8
rdzv_backend: static
same_network: true
use_cpu: false
```
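By default, `accelerate config` writes this file to Accelerate's cache directory and `accelerate launch` picks it up automatically. If you instead keep a copy alongside your project (the filename `fsdp_config.yaml` below is only an example), you can point the launcher at it explicitly; launching is covered in more detail below:
```
accelerate launch --config_file fsdp_config.yaml {script_name.py} {--arg1} {--arg2} ...
```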
## Code changes and launching
```
from accelerate import Accelerator

def main():
    accelerator = Accelerator()

    # With FSDP, prepare the model first so that the optimizer is created
    # over the sharded parameters (see the caveats below).
    model = accelerator.prepare(model)

    optimizer, training_dataloader, scheduler = accelerator.prepare(
        optimizer, training_dataloader, scheduler
    )

    for batch in training_dataloader:
        optimizer.zero_grad()
        inputs, targets = batch
        outputs = model(inputs)
        loss = loss_function(outputs, targets)
        accelerator.backward(loss)
        optimizer.step()
        scheduler.step()
    ...

if __name__ == "__main__":
    main()
```
Launching a script using the default Accelerate config file looks like the following:
```
accelerate launch {script_name.py} {--arg1} {--arg2} ...
```
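If you are not sure which configuration will be picked up by default, `accelerate env` prints your environment details along with the contents of the default config file:
```
accelerate env
```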
Alternatively, you can pass the relevant FSDP config parameters to `accelerate launch` directly for multi-GPU training, as shown below:
```
accelerate launch \
    --use_fsdp \
    --num_processes=8 \
    --mixed_precision=bf16 \
    --fsdp_sharding_strategy=1 \
    --fsdp_auto_wrap_policy=TRANSFORMER_BASED_WRAP \
    --fsdp_transformer_layer_cls_to_wrap=T5Block \
    --fsdp_offload_params=true \
    {script_name.py} {--arg1} {--arg2} ...
```
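The FSDP options can also be set programmatically by passing a plugin to the `Accelerator`. Below is a minimal sketch that only enables CPU offloading (mirroring `fsdp_offload_params: true` above); adapt the remaining options to your setup:
```
from accelerate import Accelerator, FullyShardedDataParallelPlugin
from torch.distributed.fsdp.fully_sharded_data_parallel import CPUOffload

# Offload parameters to CPU, equivalent to `fsdp_offload_params: true` in the yaml config.
fsdp_plugin = FullyShardedDataParallelPlugin(cpu_offload=CPUOffload(offload_params=True))
accelerator = Accelerator(fsdp_plugin=fsdp_plugin)
```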
## Caveats
For PyTorch FSDP, you need to prepare the model before preparing the optimizer, since FSDP shards the parameters in-place and this breaks any previously initialized optimizer. This ordering is reflected in the code snippet above. For transformer models, use the `TRANSFORMER_BASED_WRAP` auto wrap policy as shown in the config above.
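As a concrete illustration of this ordering, here is a minimal sketch (the model name, optimizer, and learning rate are placeholders, and the script must be launched with an FSDP config for sharding to actually happen):
```
from accelerate import Accelerator
from torch.optim import AdamW
from transformers import AutoModelForSeq2SeqLM

accelerator = Accelerator()
model = AutoModelForSeq2SeqLM.from_pretrained("t5-base")

# 1) Prepare (shard) the model first; FSDP flattens and shards the parameters in-place.
model = accelerator.prepare(model)

# 2) Create and prepare the optimizer only now, so it references the sharded parameters.
optimizer = AdamW(model.parameters(), lr=1e-4)
optimizer = accelerator.prepare(optimizer)
```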
## Resources
To learn more, check out the related documentation:
- <a href="https://huggingface.co/docs/accelerate/usage_guides/fsdp" target="_blank">How to use FSDP</a>
- <a href="https://huggingface.co/blog/pytorch-fsdp" target="_blank">Accelerate Large Model Training using PyTorch Fully Sharded Data Parallel</a>