Model Card for AWS-Neuron-llama-3-Korean-Bllossom-8B
This model is an AWS Neuron compiled version of the Korean fine-tuned model MLP-KTLim/llama-3-Korean-Bllossom-8B, available at https://huggingface.co/MLP-KTLim/llama-3-Korean-Bllossom-8B. It is intended for deployment on Amazon EC2 Inferentia2 instances and Amazon SageMaker. For detailed information about the model and its license, refer to the original MLP-KTLim/llama-3-Korean-Bllossom-8B model page.
Model Details
This model was compiled with neuronx-cc version 2.13.66.0. It can be deployed with the neuronx-tgi Docker image, ghcr.io/huggingface/neuronx-tgi:0.0.23.
How to Get Started with the Model
Pull the Docker image ghcr.io/huggingface/neuronx-tgi:0.0.23, download this model, and run a command like the following:
docker run \
    -p 8080:80 \
    -v $(pwd)/data:/data \
    --privileged \
    ghcr.io/huggingface/neuronx-tgi:0.0.23 \
    --model-id /data/AWS-Neuron-llama-3-Korean-Bllossom-8B
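The container may need some time to load the compiled model before it accepts requests. The following is a minimal sketch for waiting until the server is ready. It assumes the image exposes the standard TGI GET /health route on the host port mapped above (8080). The function name wait_for_tgi and its retry defaults are illustrative and not part of the image.

```shell
# Hypothetical helper: poll the TGI health route until it responds.
# Optional arguments: URL, maximum attempts, seconds between attempts.
# Written for POSIX sh, so the variables below are not function-local.
wait_for_tgi() {
  url="${1:-http://127.0.0.1:8080/health}"
  attempts="${2:-60}"
  delay="${3:-10}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # --fail makes curl exit non-zero on HTTP errors (for example 503),
    # --max-time bounds each probe so a hung connection cannot block the loop.
    if curl --silent --fail --max-time 5 --output /dev/null "$url" 2>/dev/null; then
      echo "server ready after $i attempt(s)"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "server not ready after $attempts attempt(s)" >&2
  return 1
}

# Usage: wait_for_tgi && echo "safe to send requests"
```

With the defaults the function gives up after roughly ten minutes. Adjust the attempt count and delay to suit the instance size.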
After deployment, you can run inference like this (the prompt is Korean for "What is deep learning?"):
curl 127.0.0.1:8080/generate \
    -X POST \
    -d '{"inputs":"딥러닝이 뭐야?","parameters":{"max_new_tokens":512}}' \
    -H 'Content-Type: application/json'
or
curl localhost:8080/v1/chat/completions \
    -X POST \
    -d '{
  "model": "tgi",
  "messages": [
    {
      "role": "system",
      "content": "당신은 인공지능 전문가 입니다."
    },
    {
      "role": "user",
      "content": "딥러닝이 무엇입니까?"
    }
  ],
  "stream": false,
  "max_tokens": 512
}' \
    -H 'Content-Type: application/json'
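Both requests above return JSON, so it is often convenient to print only the generated text. The following is a minimal sketch, assuming jq is installed on the client and that the server returns the usual TGI response shapes: a generated_text field for /generate and an OpenAI-style choices array for /v1/chat/completions. The helper names are illustrative.

```shell
# Hypothetical helpers: read a JSON response on stdin and print the text only.
# Both assume jq is available on the client machine.

# For /generate responses, e.g. {"generated_text": "..."}
extract_generated_text() {
  jq -r '.generated_text'
}

# For /v1/chat/completions responses (OpenAI-compatible shape)
extract_chat_reply() {
  jq -r '.choices[0].message.content'
}

# Usage: append the helper to either curl command, adding --silent so the
# progress meter does not appear in the terminal, for example:
#   curl --silent 127.0.0.1:8080/generate ... | extract_generated_text
```

These helpers only work with non-streaming responses ("stream": false), because a streamed reply arrives as a sequence of events rather than a single JSON document.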
This model can also be deployed to an Amazon SageMaker endpoint by following the guide "Deploy this model to SageMaker Endpoint".
For details on Neuron compilation and deployment, refer to "Amazon EC2 Inferentia2 기반 위에 한국어 파인 튜닝 모델을 서빙하기" (Serving a Korean fine-tuned model on Amazon EC2 Inferentia2).
Hardware
At minimum, you can use an Amazon EC2 inf2.xlarge instance. Larger instances in the same family, such as inf2.8xlarge, inf2.24xlarge, and inf2.48xlarge, can also be used. For details, see "Amazon EC2 Inf2 Instances".
Model Card Contact
Gonsoo Moon, gonsoomoon@gmail.com