FriendliAI/Llama-2-7b-chat-hf-fp8

Text Generation · text-generation-inference · 8-bit precision


yunmorning committed on Apr 18

Commit 9c2aa97 · 1 Parent(s): 6342838

Update docker run command

Files changed (1)

README.md +9 -18

README.md CHANGED

@@ -88,24 +88,15 @@ You should pass the container secret as an environment variable to run the conta
 Once you've prepared the image of Friendli Container, you can launch it to create a serving endpoint.
 ```sh
-export MODEL_DIR=$PWD/FriendliAI--Llama-2-7b-chat-hf-fp8
-export FRIENDLI_CONTAINER_SECRET="YOUR CONTAINER SECRET"
-export FRIENDLI_CONTAINER_IMAGE="registry.friendli.ai/trial"
-export GPU_ENUMERATION='"device=0"'
-huggingface-cli download FriendliAI/Llama-2-7b-chat-hf-fp8 \
-  --local-dir $MODEL_DIR \
-  --local-dir-use-symlinks False
 docker run \
-  --gpus $GPU_ENUMERATION --network=host --ipc=host \
-  -v $MODEL_DIR:/model \
-  -e FRIENDLI_CONTAINER_SECRET=$FRIENDLI_CONTAINER_SECRET \
-  $FRIENDLI_CONTAINER_IMAGE /bin/bash -c \
-  "/root/launcher \
-    --web-server-port 6000 \
-    --ckpt-path /model \
-    --ckpt-type hf_safetensors"
+  --gpus '"device=0"' \
+  -p 8000:8000 \
+  -v ~/.cache/huggingface:/root/.cache/huggingface \
+  -e FRIENDLI_CONTAINER_SECRET="YOUR CONTAINER SECRET" \
+  -e HF_TOKEN="YOUR HUGGING FACE TOKEN" \
+  registry.friendli.ai/trial \
+    --web-server-port 8000 \
+    --hf-model-name meta-llama/Llama-2-7b-chat-hf-fp8
 ```
 ---
@@ -145,7 +136,7 @@ Meta developed and publicly released the Llama 2 family of large language models
 **License** A custom commercial license is available at: [https://ai.meta.com/resources/models-and-libraries/llama-downloads/](https://ai.meta.com/resources/models-and-libraries/llama-downloads/)
-**Research Paper** ["Llama-2: Open Foundation and Fine-tuned Chat Models"](arxiv.org/abs/2307.09288)
+**Research Paper** ["Llama-2: Open Foundation and Fine-tuned Chat Models"](https://arxiv.org/abs/2307.09288)
 ## Intended Use
 **Intended Use Cases** Llama 2 is intended for commercial and research use in English. Tuned models are intended for assistant-like chat, whereas pretrained models can be adapted for a variety of natural language generation tasks.
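The updated `docker run` command writes the container secret and Hugging Face token inline as placeholder strings. The following is a minimal sketch of one alternative, assuming bash. The hypothetical functions `build_friendli_cmd` and `launch_friendli` assemble the same invocation as an array and pass both credentials with `-e NAME` (no value). With that form, Docker copies the value from the calling shell's environment, so the secret never appears on the command line. The variables `GPU_DEVICE`, `PORT`, and `MODEL_NAME` are illustrative assumptions. Every Docker flag, the image name, and both launcher options (including the default model name) are copied from the diff.

```sh
#!/usr/bin/env bash
# Sketch only: the function names and the GPU_DEVICE, PORT, and MODEL_NAME
# variables are hypothetical. Docker flags and launcher options mirror the diff.

# Fill the global array FRIENDLI_CMD with the updated `docker run` invocation.
build_friendli_cmd() {
  local gpu="${GPU_DEVICE:-0}"
  local port="${PORT:-8000}"
  local model="${MODEL_NAME:-meta-llama/Llama-2-7b-chat-hf-fp8}"

  # The --gpus value expands to "device=0" (quotes included), the same
  # argument as '"device=0"' in the diff.
  # "-e NAME" with no value makes Docker copy NAME from the caller's
  # environment, so neither credential is written on the command line.
  FRIENDLI_CMD=(
    docker run
    --gpus "\"device=${gpu}\""
    -p "${port}:${port}"
    -v "${HOME}/.cache/huggingface:/root/.cache/huggingface"
    -e FRIENDLI_CONTAINER_SECRET
    -e HF_TOKEN
    registry.friendli.ai/trial
    --web-server-port "${port}"
    --hf-model-name "${model}"
  )
}

# Check that both credentials are set, export them so the docker client
# can see them, then run the container.
launch_friendli() {
  if [[ -z "${FRIENDLI_CONTAINER_SECRET:-}" || -z "${HF_TOKEN:-}" ]]; then
    echo "launch_friendli: set FRIENDLI_CONTAINER_SECRET and HF_TOKEN first" >&2
    return 1
  fi
  export FRIENDLI_CONTAINER_SECRET HF_TOKEN
  build_friendli_cmd
  "${FRIENDLI_CMD[@]}"
}
```

After sourcing the file, export real values for `FRIENDLI_CONTAINER_SECRET` and `HF_TOKEN` and run `launch_friendli`, or `PORT=9000 launch_friendli` to serve on another port. Keeping the command in an array preserves each argument, including the quoted GPU selector, without wrapping it in a `bash -c` string as the removed version did.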