Commit dcfaa25 (parent: aaf74ae)
add serving section in readme
README.md CHANGED
@@ -55,7 +55,34 @@ huggingface-cli download internlm/internlm2_5-7b-chat-gguf internlm2_5-7b-chat-f
 
 You can use `llama-cli` for conducting inference. For a detailed explanation of `llama-cli`, please refer to [this guide](https://github.com/ggerganov/llama.cpp/blob/master/examples/main/README.md)
 ```shell
-build/bin/llama-cli -m internlm2_5-7b-chat-fp16.gguf
+build/bin/llama-cli -m internlm2_5-7b-chat-fp16.gguf -ngl 32
 ```
 
-## Serving
+## Serving
+
+`llama.cpp` provides an OpenAI API compatible server - `llama-server`. You can deploy `internlm2_5-7b-chat-fp16.gguf` into a service like this:
+
+```shell
+./build/bin/llama-server -m ./internlm2_5-7b-chat-fp16.gguf -ngl 32
+```
+
+At the client side, you can access the service through OpenAI API:
+
+```python
+from openai import OpenAI
+client = OpenAI(
+    api_key='YOUR_API_KEY',
+    base_url='http://localhost:8080/v1'
+)
+model_name = client.models.list().data[0].id
+response = client.chat.completions.create(
+    model=model_name,
+    messages=[
+        {"role": "system", "content": "You are a helpful assistant."},
+        {"role": "user", "content": " provide three suggestions about time management"},
+    ],
+    temperature=0.8,
+    top_p=0.8
+)
+print(response)
+```
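For reference, the `client.chat.completions.create(...)` call added in this diff corresponds to a plain HTTP POST against the server's OpenAI-compatible `/v1/chat/completions` endpoint. Below is a minimal sketch of the equivalent JSON request body; the `"model"` value here is a placeholder, since in practice the served model id is fetched from `/v1/models` as the diff's example does:

```python
import json

# JSON body equivalent to the chat.completions.create(...) call in the diff.
# Send it as: POST http://localhost:8080/v1/chat/completions
# with headers:
#   Authorization: Bearer YOUR_API_KEY
#   Content-Type: application/json
payload = {
    "model": "internlm2_5-7b-chat",  # placeholder; query /v1/models for the real id
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "provide three suggestions about time management"},
    ],
    "temperature": 0.8,
    "top_p": 0.8,
}
print(json.dumps(payload, indent=2))
```

Seeing the raw payload can help when debugging the service with tools like `curl` instead of the OpenAI client library.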