KeyError: 'llama'
When I try to deploy it on AWS SageMaker, the deployment succeeds, but prediction fails:
ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from primary with message "{
"code": 400,
"type": "InternalServerException",
"message": "\u0027llama\u0027"
}
and I got this from CloudWatch:
com.amazonaws.ml.mms.wlm.WorkerLifeCycle - KeyError: 'llama'
Is it because the version of Transformers is not correct?
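For reference, a quick way to check whether an installed transformers version knows the llama model type is to load a Llama config locally. This is just a sketch; the checkpoint name is only an example of a public model whose config declares model_type "llama":

```python
# Minimal local check (a sketch, not from the thread). On transformers
# releases that predate Llama support (added in 4.28), the "llama"
# model_type is missing from the auto-class mappings, so this raises
# KeyError: 'llama' -- the same message the endpoint returns.
from transformers import AutoConfig

# openlm-research/open_llama_7b is just an example of a public checkpoint
# whose config declares model_type == "llama".
config = AutoConfig.from_pretrained("openlm-research/open_llama_7b")
print(config.model_type)  # prints "llama" on transformers >= 4.28
```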
Sorry, I have no experience with SageMaker at all. All I can suggest is to google for examples of other people running Llama models on SageMaker.
This example helped me deploy vicuna-13b-HF on SageMaker: https://github.com/aws/amazon-sagemaker-examples/blob/main/inference/generativeai/llm-workshop/lab10-open-llama/open-llama-7b/open_llama_7b.ipynb
Have you solved this issue, @hbfe? I get the same error when deploying the "meta-llama/Llama-2-7b-chat-hf" model using an "ml.c5.2xlarge" instance and the DLC image "huggingface-pytorch-inference:1.13.1-transformers4.26.0-cpu-py39-ubuntu20.04" (the one without GPU). The GPU versions work fine, but I want to test whether it is possible to run inference on the model using only CPU instances.
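If the problem is that transformers 4.26.0 in that DLC predates Llama support, one possible workaround is to pick an image whose transformers version is at least 4.28 via the SageMaker Python SDK. This is a sketch, not verified against this exact setup; the version numbers and instance type are assumptions to check against the DLC combinations actually available:

```python
# Hedged sketch: deploy with a Hugging Face DLC whose transformers version
# already includes the "llama" model type (>= 4.28). The version and
# instance choices below are assumptions -- check which DLC combinations
# are available in your region and adjust accordingly.
import sagemaker
from sagemaker.huggingface import HuggingFaceModel

role = sagemaker.get_execution_role()

huggingface_model = HuggingFaceModel(
    env={
        "HF_MODEL_ID": "meta-llama/Llama-2-7b-chat-hf",  # gated repo: a Hub access token is also needed
        "HF_TASK": "text-generation",
    },
    role=role,
    transformers_version="4.28",  # first release line with Llama support
    pytorch_version="2.0",
    py_version="py310",
)

predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.c5.2xlarge",  # CPU instance, as in the post above
)

print(predictor.predict({"inputs": "Hello, my name is"}))
```

Note that a 7B model in float32 needs roughly 28 GB of RAM, so a CPU instance with more memory than ml.c5.2xlarge (16 GiB) may be needed regardless of the transformers version.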
Same issue here with llama-2-7b-hf.