Commit 2532bba · JRosenkranz · Parent(s): 924f16b
updated docker run

README.md CHANGED
@@ -7,6 +7,7 @@ To try this out running in a production-like environment, please use the pre-bui
```bash
docker pull docker-eu-public.artifactory.swg-devops.com/res-zrl-snap-docker-local/tgis-os:spec.7
docker run -d --rm --gpus all \
    --name my-tgis-server \
+   -p 8033:8033 \
    -v /path/to/all/models:/models \
    -e MODEL_NAME=/models/model_weights/llama/13B-F \
    -e SPECULATOR_PATH=/models/speculator_weights/llama/13B-F \
@@ -15,8 +16,17 @@ docker pull docker-eu-public.artifactory.swg-devops.com/res-zrl-snap-docker-loca
    -e DTYPE_STR=float16 \
    docker-eu-public.artifactory.swg-devops.com/res-zrl-snap-docker-local/tgis-os:spec.7

+# check logs and wait for "gRPC server started on port 8033" and "HTTP server started on port 3000"
docker logs my-tgis-server -f
-
+
+# get the client sample (Note: The first prompt will take longer as there is a warmup time)
+conda create -n tgis-env python=3.11
+conda activate tgis-env
+git clone --branch speculative-decoding --single-branch https://github.com/tdoublep/text-generation-inference.git
+cd text-generation-inference/integration_tests
+make gen-client
+pip install . --no-cache-dir
+python sample_client.py
```
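The comment added in this commit says to wait for "gRPC server started on port 8033" and "HTTP server started on port 3000" before sending the first request. If you would rather script that wait than watch `docker logs -f` by hand, a minimal sketch reusing the `my-tgis-server` container name and the log lines above (the retry count and sleep interval are arbitrary):

```bash
# Poll the container logs until both readiness messages appear, or give up after ~5 minutes.
for attempt in $(seq 1 60); do
  if docker logs my-tgis-server 2>&1 | grep -q "gRPC server started on port 8033" \
     && docker logs my-tgis-server 2>&1 | grep -q "HTTP server started on port 3000"; then
    echo "TGIS server is ready"
    break
  fi
  sleep 5
done
```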

To try this out with the fms-native compiled model, please execute the following: