SageMaker Deployment Failing on ml.g5.2xlarge Instance
I am getting the error below in CloudWatch. We are trying to deploy the model on an ml.g5.2xlarge instance. Is there a resolution for this, or do we need to deploy it on a bigger instance?
```
torch.cuda.OutOfMemoryError: Allocation on device 0 would exceed allowed memory. (out of memory)
Currently allocated : 20.61 GiB
Requested : 172.00 MiB
Device limit : 22.20 GiB
Free (according to CUDA): 15.12 MiB
PyTorch limit (set by user-supplied memory fraction): 22.20 GiB

The above exception was the direct cause of the following exception:
```
The model can be deployed on g5.xlarge with `torch.bfloat16`.
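For reference, here is a minimal sketch of what loading the model in `torch.bfloat16` looks like with the plain `transformers` API (outside SageMaker); the prompt is just a placeholder:

```python
# Minimal sketch: load NumbersStation/nsql-llama-2-7B in bfloat16 with plain
# transformers. Assumes a single GPU with enough VRAM for the bf16 weights
# (roughly half of what float32 needs).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "NumbersStation/nsql-llama-2-7B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half-precision weights instead of float32
).to("cuda")

inputs = tokenizer("SELECT", return_tensors="pt").to("cuda")  # placeholder prompt
output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```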
Thanks @senwu. Can you please tell me how to set the `torch.bfloat16` configuration in the deployment script? Sorry, I am new to this and don't know many of these configs. Below is the deployment script I am using:
```python
import json
import sagemaker
import boto3
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

try:
    role = sagemaker.get_execution_role()
except ValueError:
    iam = boto3.client('iam')
    role = iam.get_role(RoleName='AmazonSageMaker-ExecutionRole-20230723T133694')['Role']['Arn']

# Hub Model configuration. https://huggingface.co/models
hub = {
    'HF_MODEL_ID': 'NumbersStation/nsql-llama-2-7B',
    'SM_NUM_GPUS': json.dumps(1),
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    image_uri=get_huggingface_llm_image_uri("huggingface", version="0.9.3"),
    env=hub,
    role=role,
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
    container_startup_health_check_timeout=300,
)

predictor.predict({
    "inputs": "Can you please let us know more details about your ",
})
```
Hi @rishisaraf11,
We haven't used SageMaker to deploy the model, and from the docs it doesn't seem like there is much flexibility. The model prefers `torch.bfloat16`, but you can still use other dtypes.
Hi @senwu, I tried different variations of passing `SM_FRAMEWORK_PARAMS` into `env` for the `HuggingFaceModel` class in the script shared by @rishisaraf11, but no luck:
```python
hub = {
    'HF_MODEL_ID': 'NumbersStation/nsql-llama-2-7B',
    'SM_NUM_GPUS': json.dumps(1),
    'SM_FRAMEWORK_PARAMS': "{'torch_dtype': 'bfloat16'}",
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    image_uri=get_huggingface_llm_image_uri("huggingface", version="0.9.3"),
    env=hub,
    role=role,
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
    container_startup_health_check_timeout=300,
)
```
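As a side note, I'm not sure the TGI-based LLM container reads `SM_FRAMEWORK_PARAMS` at all. Newer versions of text-generation-inference accept a `--dtype` launcher option which, as far as I know, can also be set through a `DTYPE` environment variable; I haven't verified that the 0.9.3 image supports it, so treat this as an untested sketch:

```python
# Untested sketch: pass the dtype to the TGI launcher via an environment
# variable. Assumption: the container version forwards DTYPE to --dtype.
hub = {
    'HF_MODEL_ID': 'NumbersStation/nsql-llama-2-7B',
    'SM_NUM_GPUS': json.dumps(1),
    'DTYPE': 'bfloat16',  # assumption: honored by this container version
}
```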
It seems like SageMaker doesn't have full Transformers support yet. You can use the default config for the model as well. You can also use a g5.2xlarge machine, or `low_cpu_mem_usage=True` from https://huggingface.co/docs/transformers/main_classes/model to reduce the RAM usage when loading the model.
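If you load the model with plain `transformers`, that flag goes straight into `from_pretrained`; a minimal sketch combining it with the bfloat16 suggestion above:

```python
# Minimal sketch: low_cpu_mem_usage=True keeps peak host RAM close to one copy
# of the weights while loading, instead of materializing the model twice.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "NumbersStation/nsql-llama-2-7B",
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
)
```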
Thank you for the reply @senwu. The problem seems to be an overflow of the GPU VRAM limit, which is ~22.2 GiB for ml.g5.2xlarge (NVIDIA A10G with 24 GB of GPU memory).
Error: SageMaker deployment failed due to memory error

```
torch.cuda.OutOfMemoryError: Allocation on device 0 would exceed allowed memory. (out of memory)
Currently allocated : 20.61 GiB
Requested : 172.00 MiB
Device limit : 22.20 GiB
Free (according to CUDA): 15.12 MiB
PyTorch limit (set by user-supplied memory fraction): 22.20 GiB
```
The `torch.float32` version of the model requires around 26 GB of VRAM. We will adjust the default model dtype this week.
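For a rough sanity check on those numbers (back-of-envelope only, ignoring activations, the KV cache, and CUDA overhead):

```python
# Back-of-envelope weight memory for a ~7B-parameter model.
params = 7e9
print(f"float32 : {params * 4 / 2**30:.1f} GiB")  # ~26.1 GiB -> over the 22.2 GiB limit
print(f"bfloat16: {params * 2 / 2**30:.1f} GiB")  # ~13.0 GiB -> fits on an A10G
```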