
KeyError: 'llama'

#10
by hbfe - opened

When I try to deploy it on AWS SageMaker, the deployment succeeds, but prediction fails:

ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from primary with message "{
"code": 400,
"type": "InternalServerException",
"message": "\u0027llama\u0027"
}

and I got this from CloudWatch:

com.amazonaws.ml.mms.wlm.WorkerLifeCycle - KeyError: 'llama'

Is it because the Transformers version is not correct?
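
For context, my deployment looks roughly like this (a minimal sketch, not my exact code; the model id, instance type, and version pins are assumptions):

```python
import sagemaker
from sagemaker.huggingface import HuggingFaceModel

role = sagemaker.get_execution_role()

# Hypothetical setup matching the error above: a DLC pinned to
# Transformers 4.26, which predates Llama support.
model = HuggingFaceModel(
    env={
        "HF_MODEL_ID": "meta-llama/Llama-2-7b-chat-hf",  # assumed model id, for illustration
        "HF_TASK": "text-generation",
    },
    role=role,
    transformers_version="4.26",
    pytorch_version="1.13",
    py_version="py39",
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.c5.2xlarge",
)

# deploy() succeeds, but this call returns the 400 ModelError shown above
predictor.predict({"inputs": "Hello"})
```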

Sorry, I have no experience with SageMaker at all. All I can suggest is to search for examples of other people running Llama models on SageMaker.

Have you solved this issue @hbfe? I get the same error when deploying the "meta-llama/Llama-2-7b-chat-hf" model on an "ml.c5.2xlarge" instance with the DLC image "huggingface-pytorch-inference:1.13.1-transformers4.26.0-cpu-py39-ubuntu20.04" (the one without GPU). The GPU versions work fine, but I want to test whether it is possible to run inference on the model using CPU-only instances.
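
If it is a version issue: Llama support landed in Transformers 4.28, so a 4.26.0 image cannot resolve model_type "llama" from the model's config.json and raises exactly this KeyError. A sketch of what I would try instead, assuming a Transformers 4.28+ / PyTorch 2.0 CPU image is available in your region (the version pins below are assumptions, so check the supported DLC combinations):

```python
import sagemaker
from sagemaker.huggingface import HuggingFaceModel

role = sagemaker.get_execution_role()

model = HuggingFaceModel(
    env={
        "HF_MODEL_ID": "meta-llama/Llama-2-7b-chat-hf",
        "HF_TASK": "text-generation",
        # gated repo: you may also need "HF_API_TOKEN": "<your token>"
    },
    role=role,
    transformers_version="4.28",  # first release with Llama support
    pytorch_version="2.0",
    py_version="py310",
)

# Same CPU instance as before; only the image versions changed.
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.c5.2xlarge",
)
```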

same issue here with llama-2-7b-hf
