Cannot access gated repo for url https://huggingface.co/meta-llama/Llama-3.2-90B-Vision/resolve/main/config.json.

#13
by shivdesh - opened

I am trying to deploy this model to AWS SageMaker using the following code:

import sagemaker
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri


# Define the IAM role with SageMaker permissions
role = '<aws_role>'

# Specify the Hugging Face model
# model_name = "huggingface/llama-3.2-90b"

hub = {
    'HF_MODEL_ID':'meta-llama/Llama-3.2-90B-Vision',
    'HF_TASK': 'image-text-to-text',
    'HF_AUTH_TOKEN': '<hf_token>'
}

# Instantiate the Hugging Face model
huggingface_model = HuggingFaceModel(
    image_uri=get_huggingface_llm_image_uri("huggingface", version="2.2.0"),
    env=hub,
    role=role,
    transformers_version="4.37",
    pytorch_version="2.1.0",
    py_version="py310"
)

# Deploy the model to an endpoint
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.p3.2xlarge",
    container_startup_health_check_timeout=300
)

# Test the deployed endpoint
data = {
    "inputs": "testing ccpgi llm model"
}

result = predictor.predict(data)
print(result)

# Clean up resources
# predictor.delete_endpoint()

But I am getting the following error. Any idea how I can resolve this?

2024-11-06T18:18:34.031Z
Traceback (most recent call last):
  File "/opt/conda/bin/text-generation-server", line 8, in <module>
    sys.exit(app())
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/cli.py", line 198, in download_weights
    config = hf_hub_download(
  File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1221, in hf_hub_download
    return _hf_hub_download_to_cache_dir(
  File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1325, in _hf_hub_download_to_cache_dir
    _raise_on_head_call_error(head_call_error, force_download, local_files_only)
  File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1823, in _raise_on_head_call_error
    raise head_call_error
  File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1722, in _get_metadata_or_catch_error
    metadata = get_hf_file_metadata(url=url, proxies=proxies, timeout=etag_timeout, headers=headers)
  File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1645, in get_hf_file_metadata
    r = _request_wrapper(
  File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 372, in _request_wrapper
    response = _request_wrapper(
  File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 396, in _request_wrapper
    hf_raise_for_status(response)
  File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_errors.py", line 321, in hf_raise_for_status
    raise GatedRepoError(message, response) from e
huggingface_hub.utils._errors.GatedRepoError: 401 Client Error. (Request ID: Root=1-672bb2f9-5eae8dbf17c06d574ff55c0d;86c6da72-f60e-45fa-9a04-b6fe639a57c3)
Cannot access gated repo for url https://huggingface.co/meta-llama/Llama-3.2-90B-Vision/resolve/main/config.json.
Access to model meta-llama/Llama-3.2-90B-Vision is restricted. You must have access to it and be authenticated to access it. Please log in.

2024-11-06T18:18:37.538Z
Error: DownloadError
2024-11-06T18:18:37.343440Z INFO text_generation_launcher: Args { model_id: "meta-llama/Llama-3.2-90B-Vision", revision: None, validation_workers: 2, sharded: None, num_shard: None, quantize: None, speculate: None, dtype: None, trust_remote_code: false, max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 4, max_top_n_tokens: 5, max_input_tokens: None, max_input_length: None, max_total_tokens: None, waiting_served_ratio: 0.3, max_batch_prefill_tokens: None, max_batch_total_tokens: None, max_waiting_tokens: 20, max_batch_size: None, cuda_graphs: None, hostname: "container-0.local", port: 8080, shard_uds_path: "/tmp/text-generation-server", master_addr: "localhost", master_port: 29500, huggingface_hub_cache: Some("/tmp"), weights_cache_override: None, disable_custom_kernels: false, cuda_memory_fraction: 1.0, rope_scaling: None, rope_factor: None, json_output: false, otlp_endpoint: None, otlp_service_name: "text-generation-inference.router", cors_allow_origin: [], watermark_gamma: None, watermark_delta: None, ngrok: false, ngrok_authtoken: None, ngrok_edge: None, tokenizer_config_path: None, disable_grammar_support: false, env: false, max_client_batch_size: 4, lora_adapters: None, disable_usage_stats: false, disable_crash_reports: false }

I have access to this model and can also log in to Hugging Face using the token specified in the code.
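
For reference, this is how I verify locally that the same token can reach the gated repo (a minimal sketch using the huggingface_hub client; <hf_token> is the same placeholder as in the deployment code):

from huggingface_hub import hf_hub_download, whoami

token = '<hf_token>'  # same placeholder token as in the SageMaker code above

# Confirm the token authenticates against the Hub
print(whoami(token=token))

# Confirm the token can fetch the exact file the container failed to download
path = hf_hub_download(
    repo_id='meta-llama/Llama-3.2-90B-Vision',
    filename='config.json',
    token=token,
)
print(path)

Since I have been granted access and can log in, I expect both calls to succeed locally, which makes me suspect the token never reaches the container. As far as I can tell, the TGI launcher reads the Hub token from the HUGGING_FACE_HUB_TOKEN environment variable (or HF_TOKEN in newer versions) rather than HF_AUTH_TOKEN, so the fix may be as simple as renaming that key in the hub dictionary (untested sketch):

hub = {
    'HF_MODEL_ID': 'meta-llama/Llama-3.2-90B-Vision',
    'HF_TASK': 'image-text-to-text',
    'HUGGING_FACE_HUB_TOKEN': '<hf_token>'
}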
