## Configuring Accelerate
Run `accelerate config` and answer the questionnaire accordingly. Below is an example YAML for running code remotely on AWS SageMaker. Replace the `xxxxx` placeholders with appropriate values.
<pre>
base_job_name: accelerate-sagemaker-1
compute_environment: AMAZON_SAGEMAKER
distributed_type: 'NO'
dynamo_backend: 'NO'
ec2_instance_type: ml.p3.2xlarge
gpu_ids: all
iam_role_name: xxxxx
mixed_precision: 'no'
num_machines: 1
profile: xxxxx
py_version: py38
pytorch_version: 1.10.2
region: us-east-1
transformers_version: 4.17.0
use_cpu: false
</pre>
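Since the config is a flat set of `key: value` pairs, a quick stdlib-only sketch (not part of the Accelerate API; the `config_text` below is an abbreviated copy of the example above) can flag which `xxxxx` placeholders still need real values before launching:

```python
# Minimal sketch: parse the flat key: value config and list the
# fields still set to the xxxxx placeholder. Stdlib only, no PyYAML.
config_text = """\
base_job_name: accelerate-sagemaker-1
compute_environment: AMAZON_SAGEMAKER
distributed_type: 'NO'
ec2_instance_type: ml.p3.2xlarge
iam_role_name: xxxxx
profile: xxxxx
region: us-east-1
"""

config = {}
for line in config_text.splitlines():
    key, _, value = line.partition(":")
    config[key.strip()] = value.strip().strip("'")

placeholders = [k for k, v in config.items() if v == "xxxxx"]
print(placeholders)  # ['iam_role_name', 'profile']
```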
## Preparing the training script
<pre>
import argparse

from accelerate import Accelerator

def parse_args():
    parser = argparse.ArgumentParser(description="sample task")
    parser.add_argument(
        "--pad_to_max_length",
-       action="store_true",
+       type=bool,
+       default=False,
        help="If passed, pad all samples to `max_length`. Otherwise, dynamic padding is used.",
    )
    ...

+ def main():
    accelerator = Accelerator()
    model, optimizer, training_dataloader, scheduler = accelerator.prepare(
        model, optimizer, training_dataloader, scheduler
    )
    for batch in training_dataloader:
        optimizer.zero_grad()
        inputs, targets = batch
        outputs = model(inputs)
        loss = loss_function(outputs, targets)
        accelerator.backward(loss)
        optimizer.step()
        scheduler.step()
-   torch.save(model.state_dict(), '/opt/ml/model/model.pt')
+   accelerator.save(model.state_dict(), '/opt/ml/model/model.pt')

+ if __name__ == "__main__":
+     main()
</pre>
Launching a script using the default Accelerate config file looks like the following:
```
accelerate launch {script_name.py} {--arg1} {--arg2} ...
```
## SageMaker caveats
SageMaker doesn't support argparse actions. If you want to use, for example, boolean hyperparameters, you need to specify the type as bool in your script and provide an explicit True or False value for the hyperparameter, as shown above for the `pad_to_max_length` argument. Another important point is to save all output artifacts to `/opt/ml/model` or use `os.environ["SM_MODEL_DIR"]` as your save directory; after training, artifacts in this directory are uploaded to S3, as shown in the code snippet above.
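The `type=bool` workaround can be sketched as follows. This is a stdlib-only illustration (the argument name mirrors the snippet above; the hand-built argument list stands in for what SageMaker would pass as hyperparameter strings):

```python
import argparse
import os

parser = argparse.ArgumentParser(description="sample task")
# SageMaker passes every hyperparameter as a string, so argparse actions
# such as store_true never fire; declare an explicit type instead.
parser.add_argument("--pad_to_max_length", type=bool, default=False)

# SageMaker supplies the value on the command line, e.g. "True".
args = parser.parse_args(["--pad_to_max_length", "True"])
print(args.pad_to_max_length)  # True

# Caveat: bool() is truthy for any non-empty string (including "False"),
# so only pass a value when the flag should actually be enabled.

# Save artifacts where SageMaker expects them; SM_MODEL_DIR points at
# /opt/ml/model inside the training container (fallback for local runs).
save_dir = os.environ.get("SM_MODEL_DIR", "/opt/ml/model")
```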
As part of the advanced features, you can provide a custom Docker image, input channels pointing to S3 data locations, and SageMaker metrics logging. Please refer to <a href="https://github.com/huggingface/notebooks/tree/main/sagemaker/22_accelerate_sagemaker_examples" target="_blank">Examples showcasing AWS SageMaker integration of 🤗 Accelerate</a>.
## Additional resources
To learn more, check out the related documentation:
- <a href="https://huggingface.co/docs/accelerate/usage_guides/sagemaker" target="_blank">How to use 🤗 Accelerate with SageMaker</a>
- <a href="https://github.com/huggingface/notebooks/tree/main/sagemaker/22_accelerate_sagemaker_examples" target="_blank">Examples showcasing AWS SageMaker integration of 🤗 Accelerate</a>
- <a href="https://huggingface.co/docs/accelerate/main/en/package_reference/cli" target="_blank">The Accelerate CLI</a>