## Configuration
Run `accelerate config` and answer the questionnaire accordingly.
Below is an example YAML for mixed-precision training using DeepSpeed ZeRO Stage-3 with CPU offloading on 8 GPUs.
<pre>
compute_environment: LOCAL_MACHINE
deepspeed_config:
  gradient_accumulation_steps: 1
  gradient_clipping: 1.0
  offload_optimizer_device: cpu
  offload_param_device: cpu
  zero3_init_flag: true
  zero3_save_16bit_model: true
  zero_stage: 3
distributed_type: DEEPSPEED
downcast_bf16: 'no'
dynamo_backend: 'NO'
fsdp_config: {}
machine_rank: 0
main_training_function: main
megatron_lm_config: {}
mixed_precision: fp16
num_machines: 1
num_processes: 8
rdzv_backend: static
same_network: true
use_cpu: false
</pre>
## Code changes and launching
<pre>
  from accelerate import Accelerator

+ def main():
      accelerator = Accelerator()

      model, optimizer, training_dataloader, scheduler = accelerator.prepare(
          model, optimizer, training_dataloader, scheduler
      )

      for batch in training_dataloader:
          optimizer.zero_grad()
          inputs, targets = batch
          outputs = model(inputs)
          loss = loss_function(outputs, targets)
          accelerator.backward(loss)
          optimizer.step()
          scheduler.step()

      ...

      generated_tokens = accelerator.unwrap_model(model).generate(
          batch["input_ids"],
          attention_mask=batch["attention_mask"],
          **gen_kwargs,
+         synced_gpus=True,  # required for ZeRO Stage 3
      )

      ...

      accelerator.unwrap_model(model).save_pretrained(
          args.output_dir,
          is_main_process=accelerator.is_main_process,
          save_function=accelerator.save,
+         state_dict=accelerator.get_state_dict(model),  # required for ZeRO Stage 3
      )

      ...

+ if __name__ == "__main__":
+     main()
</pre>
Launching a script using the default accelerate config file looks like the following:
``` | |
accelerate launch {script_name.py} {--arg1} {--arg2} ... | |
``` | |
Alternatively, you can use `accelerate launch` with the right config params for multi-GPU training, as shown below:
``` | |
accelerate launch \ | |
--use_deepspeed \ | |
--num_processes=8 \ | |
--mixed_precision=fp16 \ | |
--zero_stage=3 \ | |
--gradient_accumulation_steps=1 \ | |
--gradient_clipping=1 \ | |
--zero3_init_flag=True \ | |
--zero3_save_16bit_model=True \ | |
--offload_optimizer_device=cpu \ | |
--offload_param_device=cpu \ | |
{script_name.py} {--arg1} {--arg2} ... | |
``` | |
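If you saved your answers to a custom location with `accelerate config --config_file`, you can also point the launcher at that file explicitly. A minimal sketch, where `ds_zero3_cpu_offload_config.yaml` is an illustrative path:
```
accelerate launch --config_file ds_zero3_cpu_offload_config.yaml \
{script_name.py} {--arg1} {--arg2} ...
```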
## Notes
For the core DeepSpeed features supported via the accelerate config file, no code changes are required for ZeRO Stages 1 and 2. For ZeRO Stage-3, transformers' `generate` function requires `synced_gpus=True`, and `save_pretrained` requires the `state_dict` param, because the model parameters are sharded across the GPUs.
For advanced users who want granular control via a DeepSpeed config file, this is supported: pass its location when running the `accelerate config` command. You can also set most of the fields in the DeepSpeed config file to `auto`; they are filled in automatically from the arguments of the `accelerate launch` command and the `accelerator.prepare` call, which keeps things simple for users. Please refer to the docs on <a href="https://huggingface.co/docs/accelerate/usage_guides/deepspeed#deepspeed-config-file" target="_blank">DeepSpeed Config File</a>.
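As a rough illustration, a DeepSpeed config file relying on `auto` values might look like the sketch below. It is not an exhaustive config; the field selection here is an assumption for illustration, and the `auto` values are resolved at launch time as described above.
<pre>
{
    "fp16": {
        "enabled": "auto"
    },
    "zero_optimization": {
        "stage": "auto",
        "offload_optimizer": {
            "device": "auto"
        },
        "offload_param": {
            "device": "auto"
        },
        "stage3_gather_16bit_weights_on_model_save": "auto"
    },
    "gradient_accumulation_steps": "auto",
    "gradient_clipping": "auto",
    "train_batch_size": "auto",
    "train_micro_batch_size_per_gpu": "auto"
}
</pre>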
## Additional resources
To learn more, check out the related documentation:
- <a href="https://huggingface.co/docs/accelerate/usage_guides/deepspeed" target="_blank">How to use DeepSpeed</a> | |
- <a href="https://huggingface.co/blog/accelerate-deepspeed" target="_blank">Accelerate Large Model Training using DeepSpeed</a> | |
- <a href="https://huggingface.co/docs/accelerate/package_reference/deepspeed" target="_blank">DeepSpeed Utilities</a> |