## Configuring Accelerate
Run `accelerate config` and answer the questionnaire accordingly. Below is an example YAML for running code remotely on AWS SageMaker. Replace the `xxxxx` placeholders with appropriate values.
<pre>
base_job_name: accelerate-sagemaker-1
compute_environment: AMAZON_SAGEMAKER
distributed_type: 'NO'
dynamo_backend: 'NO'
ec2_instance_type: ml.p3.2xlarge
gpu_ids: all
iam_role_name: xxxxx
mixed_precision: 'no'
num_machines: 1
profile: xxxxx
py_version: py38
pytorch_version: 1.10.2
region: us-east-1
transformers_version: 4.17.0
use_cpu: false
</pre>
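Since the config is a flat set of `key: value` pairs, a quick stdlib-only sketch (not part of the Accelerate API; the `config_text` below is an abbreviated copy of the example above) can flag which `xxxxx` placeholders still need real values before launching:

```python
# Minimal sketch: parse the flat key: value config and list the
# fields still set to the xxxxx placeholder. Stdlib only, no PyYAML.
config_text = """\
base_job_name: accelerate-sagemaker-1
compute_environment: AMAZON_SAGEMAKER
distributed_type: 'NO'
ec2_instance_type: ml.p3.2xlarge
iam_role_name: xxxxx
profile: xxxxx
region: us-east-1
"""

config = {}
for line in config_text.splitlines():
    key, _, value = line.partition(":")
    config[key.strip()] = value.strip().strip("'")

placeholders = [k for k, v in config.items() if v == "xxxxx"]
print(placeholders)  # ['iam_role_name', 'profile']
```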
## Preparing the training script
<pre>
import argparse

from accelerate import Accelerator

def parse_args():
    parser = argparse.ArgumentParser(description="sample task")
    parser.add_argument(
        "--pad_to_max_length",
-       action="store_true",
+       type=bool,
+       default=False,
        help="If passed, pad all samples to `max_length`. Otherwise, dynamic padding is used.",
    )
    ...

+ def main():
    accelerator = Accelerator()
    model, optimizer, training_dataloader, scheduler = accelerator.prepare(
        model, optimizer, training_dataloader, scheduler
    )
    for batch in training_dataloader:
        optimizer.zero_grad()
        inputs, targets = batch
        outputs = model(inputs)
        loss = loss_function(outputs, targets)
        accelerator.backward(loss)
        optimizer.step()
        scheduler.step()
-   torch.save(model.state_dict(), '/opt/ml/model/model.pt')
+   accelerator.save(model.state_dict(), '/opt/ml/model/model.pt')

+ if __name__ == "__main__":
+     main()
</pre>
Launching a script using the default Accelerate config file looks like the following:
```
accelerate launch {script_name.py} {--arg1} {--arg2} ...
```
## SageMaker caveats
SageMaker doesn't support argparse actions. If you want to use, for example, boolean hyperparameters, you need to specify the type as bool in your script and provide an explicit True or False value for the hyperparameter, as shown above for the `pad_to_max_length` argument. Another important point is to save all output artifacts to `/opt/ml/model` or use `os.environ["SM_MODEL_DIR"]` as your save directory; after training, artifacts in this directory are uploaded to S3, as shown in the code snippet above.
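The `type=bool` workaround can be sketched as follows. This is a stdlib-only illustration (the argument name mirrors the snippet above; the hand-built argument list stands in for what SageMaker would pass as hyperparameter strings):

```python
import argparse
import os

parser = argparse.ArgumentParser(description="sample task")
# SageMaker passes every hyperparameter as a string, so argparse actions
# such as store_true never fire; declare an explicit type instead.
parser.add_argument("--pad_to_max_length", type=bool, default=False)

# SageMaker supplies the value on the command line, e.g. "True".
args = parser.parse_args(["--pad_to_max_length", "True"])
print(args.pad_to_max_length)  # True

# Caveat: bool() is truthy for any non-empty string (including "False"),
# so only pass a value when the flag should actually be enabled.

# Save artifacts where SageMaker expects them; SM_MODEL_DIR points at
# /opt/ml/model inside the training container (fallback for local runs).
save_dir = os.environ.get("SM_MODEL_DIR", "/opt/ml/model")
```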
As part of the advanced features, you can provide a custom Docker image, input channels pointing to S3 data locations, and SageMaker metrics logging. Please refer to <a href="https://github.com/huggingface/notebooks/tree/main/sagemaker/22_accelerate_sagemaker_examples" target="_blank">Examples showcasing AWS SageMaker integration of 🤗 Accelerate</a>.
## Additional resources
To learn more, check out the related documentation:
- <a href="https://huggingface.co/docs/accelerate/usage_guides/sagemaker" target="_blank">How to use 🤗 Accelerate with SageMaker</a>
- <a href="https://github.com/huggingface/notebooks/tree/main/sagemaker/22_accelerate_sagemaker_examples" target="_blank">Examples showcasing AWS SageMaker integration of 🤗 Accelerate</a>
- <a href="https://huggingface.co/docs/accelerate/main/en/package_reference/cli" target="_blank">The Accelerate CLI</a>