KeyError: 'llama'
When I try to deploy it on AWS SageMaker, the deployment succeeds, but prediction fails:
ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from primary with message "{
"code": 400,
"type": "InternalServerException",
"message": "\u0027llama\u0027"
}
and I got this from CloudWatch:
com.amazonaws.ml.mms.wlm.WorkerLifeCycle - KeyError: 'llama'
Is it because the version of Transformers is not correct?
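For reference, a quick way to check whether an installed transformers version knows the llama model type is to load a Llama config locally. This is just a sketch; the checkpoint name is only an example of a public model whose config declares model_type "llama":

```python
# Minimal local check (a sketch, not from the thread). On transformers
# releases that predate Llama support (added in 4.28), the "llama"
# model_type is missing from the auto-class mappings, so this raises
# KeyError: 'llama' -- the same message the endpoint returns.
from transformers import AutoConfig

# openlm-research/open_llama_7b is just an example of a public checkpoint
# whose config declares model_type == "llama".
config = AutoConfig.from_pretrained("openlm-research/open_llama_7b")
print(config.model_type)  # prints "llama" on transformers >= 4.28
```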
Sorry, I have no experience with SageMaker at all. All I can suggest is to google for examples of other people running Llama models on SageMaker.
This example helped me deploy vicuna-13b-HF on SageMaker: https://github.com/aws/amazon-sagemaker-examples/blob/main/inference/generativeai/llm-workshop/lab10-open-llama/open-llama-7b/open_llama_7b.ipynb
Have you solved this issue, @hbfe? I get the same error when deploying the "meta-llama/Llama-2-7b-chat-hf" model using an "ml.c5.2xlarge" instance and the DLC image "huggingface-pytorch-inference:1.13.1-transformers4.26.0-cpu-py39-ubuntu20.04" (the one without GPU). The GPU versions work fine, but I want to test whether it is possible to run inference on the model using only CPU instances.
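If the problem is that transformers 4.26.0 in that DLC predates Llama support, one possible workaround is to pick an image whose transformers version is at least 4.28 via the SageMaker Python SDK. This is a sketch, not verified against this exact setup; the version numbers and instance type are assumptions to check against the DLC combinations actually available:

```python
# Hedged sketch: deploy with a Hugging Face DLC whose transformers version
# already includes the "llama" model type (>= 4.28). The version and
# instance choices below are assumptions -- check which DLC combinations
# are available in your region and adjust accordingly.
import sagemaker
from sagemaker.huggingface import HuggingFaceModel

role = sagemaker.get_execution_role()

huggingface_model = HuggingFaceModel(
    env={
        "HF_MODEL_ID": "meta-llama/Llama-2-7b-chat-hf",  # gated repo: a Hub access token is also needed
        "HF_TASK": "text-generation",
    },
    role=role,
    transformers_version="4.28",  # first release line with Llama support
    pytorch_version="2.0",
    py_version="py310",
)

predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.c5.2xlarge",  # CPU instance, as in the post above
)

print(predictor.predict({"inputs": "Hello, my name is"}))
```

Note that a 7B model in float32 needs roughly 28 GB of RAM, so a CPU instance with more memory than ml.c5.2xlarge (16 GiB) may be needed regardless of the transformers version.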
Same issue here with llama-2-7b-hf.