JRosenkranz committed
Commit 7a6c813 • 1 Parent(s): 5666056

Update README.md

README.md CHANGED
@@ -33,7 +33,7 @@ Training is light-weight and can be completed in only a few days depending on ba
 
 _Note: For all samples, your environment must have access to cuda_
 
-### Production
+### Use in IBM Production TGIS
 
 *To try this out running in a production-like environment, please use the pre-built docker image:*
 
@@ -101,6 +101,28 @@ python sample_client.py
 
 _Note: first prompt may be slower as there is a slight warmup time_
 
+### Use in Huggingface TGI
+
+#### start the server
+
+```bash
+model=ibm-fms/llama3-8b-accelerator
+volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run
+
+docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:latest --model-id $model
+```
+
+_note: for tensor parallel, add --num-shard_
+
+#### make a request
+
+```bash
+curl 127.0.0.1:8080/generate_stream \
+-X POST \
+-d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
+-H 'Content-Type: application/json'
+```
+
 ### Minimal Sample
 
 #### Install