bofenghuang committed
Commit e6b5ed2 · Parent(s): 0c6e02d

Update README.md

README.md CHANGED
@@ -28,18 +28,9 @@ All previous versions are accessible through branches.

- **V1.0**: Trained on 420K chat data.
- **V2.0**: Trained on 520K data. Check out our [release blog](https://github.com/bofenghuang/vigogne/blob/main/blogs/2023-08-17-vigogne-chat-v2_0.md) for more details.

## Quantized Models

The quantized versions of this model are generously provided by [TheBloke](https://huggingface.co/TheBloke)!

- AWQ: [TheBloke/Vigogne-2-7B-Chat-AWQ](https://huggingface.co/TheBloke/Vigogne-2-7B-Chat-AWQ)
- GPTQ: [TheBloke/Vigogne-2-7B-Chat-GPTQ](https://huggingface.co/TheBloke/Vigogne-2-7B-Chat-GPTQ)
- GGUF: [TheBloke/Vigogne-2-7B-Chat-GGUF](https://huggingface.co/TheBloke/Vigogne-2-7B-Chat-GGUF)

## Prompt Template

We utilized prefix tokens `<user

You can apply this formatting using the [chat template](https://huggingface.co/docs/transformers/main/chat_templating) through the `apply_chat_template()` method.

@@ -73,6 +64,18 @@ You will get

## Usage

```python
from typing import Dict, List, Optional
import torch
@@ -139,10 +142,51 @@ response, history = chat("Quand il peut dépasser le lapin ?", history=history)
response, history = chat("Écris une histoire imaginative qui met en scène une compétition de course entre un escargot et un lapin.", history=history)
```

You can also

<a href="https://colab.research.google.com/github/bofenghuang/vigogne/blob/main/notebooks/infer_chat.ipynb" target="_blank"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Limitations

Vigogne is still under development, and many limitations remain to be addressed. Please note that the model may generate harmful or biased content, incorrect information, or generally unhelpful answers.

- **V1.0**: Trained on 420K chat data.
- **V2.0**: Trained on 520K data. Check out our [release blog](https://github.com/bofenghuang/vigogne/blob/main/blogs/2023-08-17-vigogne-chat-v2_0.md) for more details.

## Prompt Template

We utilized prefix tokens `<user>:` and `<assistant>:` to distinguish between user and assistant utterances.

You can apply this formatting using the [chat template](https://huggingface.co/docs/transformers/main/chat_templating) through the `apply_chat_template()` method.
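As a rough illustration of the prefix-token format, the sketch below renders a list of role/content messages into a single prompt string. It assumes each turn sits on its own line as `<user>: …` or `<assistant>: …`, and that generation is cued by a trailing `<assistant>:`. The exact separators and spacing are assumptions, as is the absence of any system message or special tokens, so treat `apply_chat_template()` as the authoritative formatting. `format_conversation` is a hypothetical helper name.

```python
from typing import Dict, List

# Prefix tokens named in the README, keyed by chat role.
ROLE_PREFIXES = {"user": "<user>:", "assistant": "<assistant>:"}


def format_conversation(
    messages: List[Dict[str, str]], add_generation_prompt: bool = True
) -> str:
    """Render role/content messages as prefixed turns, one per line.

    Assumptions (not taken from the model's real chat template): turns are
    joined with newlines, a single space follows each prefix, and no system
    prompt or BOS/EOS tokens are added.
    """
    lines = []
    for message in messages:
        role = message["role"]
        if role not in ROLE_PREFIXES:
            raise ValueError(f"unsupported role: {role!r}")
        lines.append(f"{ROLE_PREFIXES[role]} {message['content'].strip()}")
    if add_generation_prompt:
        # Open an assistant turn so that generation continues as the assistant.
        lines.append(ROLE_PREFIXES["assistant"])
    return "\n".join(lines)


conversation = [
    {"role": "user", "content": "Bonjour !"},
    {"role": "assistant", "content": "Bonjour ! Comment puis-je vous aider ?"},
    {"role": "user", "content": "Parle-moi de toi-même."},
]
print(format_conversation(conversation))
```

Keeping the prefixes in one mapping makes it easy to compare this sketch against whatever the tokenizer's chat template actually produces.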

## Usage

### Inference using the quantized versions

The quantized versions of this model are generously provided by [TheBloke](https://huggingface.co/TheBloke)!

- AWQ for GPU inference: [TheBloke/Vigogne-2-7B-Chat-AWQ](https://huggingface.co/TheBloke/Vigogne-2-7B-Chat-AWQ)
- GPTQ for GPU inference: [TheBloke/Vigogne-2-7B-Chat-GPTQ](https://huggingface.co/TheBloke/Vigogne-2-7B-Chat-GPTQ)
- GGUF for CPU+GPU inference: [TheBloke/Vigogne-2-7B-Chat-GGUF](https://huggingface.co/TheBloke/Vigogne-2-7B-Chat-GGUF)

These versions facilitate testing and development with various popular frameworks, including [AutoAWQ](https://github.com/casper-hansen/AutoAWQ), [vLLM](https://github.com/vllm-project/vllm), [AutoGPTQ](https://github.com/PanQiWei/AutoGPTQ), [GPTQ-for-LLaMa](https://github.com/qwopqwop200/GPTQ-for-LLaMa), [llama.cpp](https://github.com/ggerganov/llama.cpp), [text-generation-webui](https://github.com/oobabooga/text-generation-webui), and more.
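The quantized list ties each format to the hardware it targets. The hypothetical helper below restates that mapping as a lookup. Defaulting to AWQ when a GPU is present is an arbitrary choice for illustration, not a recommendation from the model card. Detecting the GPU (for instance with `torch.cuda.is_available()`) is left to the caller.

```python
from typing import Dict

# The three repositories listed above, keyed by quantization format.
QUANTIZED_REPOS: Dict[str, str] = {
    "awq": "TheBloke/Vigogne-2-7B-Chat-AWQ",  # GPU inference
    "gptq": "TheBloke/Vigogne-2-7B-Chat-GPTQ",  # GPU inference
    "gguf": "TheBloke/Vigogne-2-7B-Chat-GGUF",  # CPU+GPU inference
}


def pick_quantized_repo(has_gpu: bool, gpu_format: str = "awq") -> str:
    """Hypothetical helper: the repo for `gpu_format` if a GPU is available, else GGUF."""
    if not has_gpu:
        # GGUF is the only listed format aimed at CPU inference.
        return QUANTIZED_REPOS["gguf"]
    if gpu_format not in QUANTIZED_REPOS:
        raise ValueError(f"unknown format: {gpu_format!r}")
    return QUANTIZED_REPOS[gpu_format]


print(pick_quantized_repo(has_gpu=False))
```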

### Inference using the unquantized model with 🤗 Transformers

```python
from typing import Dict, List, Optional
import torch
# ...
response, history = chat("Écris une histoire imaginative qui met en scène une compétition de course entre un escargot et un lapin.", history=history)
```

You can also use the Google Colab Notebook provided below.

<a href="https://colab.research.google.com/github/bofenghuang/vigogne/blob/main/notebooks/infer_chat.ipynb" target="_blank"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Inference using the unquantized model with vLLM

Set up an OpenAI-compatible server with the following command:

```bash
# Install vLLM
# This may take 5-10 minutes.
# pip install vllm

# Start server for Vigogne-Chat models
python -m vllm.entrypoints.openai.api_server --model bofenghuang/vigogne-2-7b-chat

# List models
# curl http://localhost:8000/v1/models
```
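If `curl` is not at hand, the same `/v1/models` check can be done with Python's standard library. This sketch makes two assumptions. First, the server started above is listening on `http://localhost:8000`. Second, the response follows the OpenAI list format (`{"data": [{"id": ...}, ...]}`), which the `models["data"][0]["id"]` lookup in the client code also relies on. `list_model_ids` and `extract_model_ids` are hypothetical helper names.

```python
import json
import urllib.request
from typing import Any, Dict, List


def extract_model_ids(payload: Dict[str, Any]) -> List[str]:
    """Pull model ids out of an OpenAI-style list response."""
    return [entry["id"] for entry in payload.get("data", [])]


def list_model_ids(
    base_url: str = "http://localhost:8000/v1", timeout: float = 10.0
) -> List[str]:
    """GET {base_url}/models from the running server and return the served model ids."""
    with urllib.request.urlopen(f"{base_url}/models", timeout=timeout) as response:
        payload = json.loads(response.read().decode("utf-8"))
    return extract_model_ids(payload)
```

Once the server is up, `list_model_ids()` should return a list containing the id passed to `--model` (here `bofenghuang/vigogne-2-7b-chat`). Parsing is split from fetching so the JSON handling can be checked without a running server.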

Query the model using the `openai` Python package.

```python
# Note: this snippet targets the legacy openai<1.0 client; openai>=1.0 removed
# module-level settings such as openai.api_base and classes such as openai.ChatCompletion.
import openai

# Modify OpenAI's API key and API base to use vLLM's API server.
openai.api_key = "EMPTY"
openai.api_base = "http://localhost:8000/v1"

# Use the first model served by the vLLM server
models = openai.Model.list()
model = models["data"][0]["id"]

# Chat completion API
chat_completion = openai.ChatCompletion.create(
    model=model,
    messages=[
        {"role": "user", "content": "Parle-moi de toi-même."},  # "Tell me about yourself."
    ],
    max_tokens=1024,
    temperature=0.7,
)
print("Chat completion results:", chat_completion)
```
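The `openai` snippet depends on the pre-1.0 client interface. As a dependency-free alternative, the sketch below posts directly to the OpenAI-compatible `/v1/chat/completions` route served by vLLM, using only the standard library. It also keeps a running message history for multi-turn chat. It assumes the default host and port from the server command and the OpenAI response shape (`choices[0].message.content`). `build_chat_payload`, `extract_reply`, `chat_completion` and `add_turn` are hypothetical names.

```python
import json
import urllib.request
from typing import Any, Dict, List

Message = Dict[str, str]


def build_chat_payload(
    model: str, messages: List[Message], max_tokens: int = 1024, temperature: float = 0.7
) -> Dict[str, Any]:
    """Assemble an OpenAI-style chat completion request body."""
    return {
        "model": model,
        "messages": messages,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


def extract_reply(body: Dict[str, Any]) -> str:
    """Return the assistant text from an OpenAI-style chat completion response."""
    return body["choices"][0]["message"]["content"]


def chat_completion(
    payload: Dict[str, Any], base_url: str = "http://localhost:8000/v1", timeout: float = 120.0
) -> str:
    """POST the payload to {base_url}/chat/completions and return the reply text."""
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    return extract_reply(body)


def add_turn(history: List[Message], role: str, content: str) -> List[Message]:
    """Return a new history with one more message appended (the input is not mutated)."""
    return history + [{"role": role, "content": content}]
```

With the server running, one exchange runs in three steps:

1. `history = add_turn([], "user", "Parle-moi de toi-même.")`
2. `reply = chat_completion(build_chat_payload("bofenghuang/vigogne-2-7b-chat", history))`
3. `history = add_turn(history, "assistant", reply)`, before appending the next user message.

Resending the full history on every call follows the same idea as threading `history` through successive `chat(...)` calls in the Transformers example.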

## Limitations

Vigogne is still under development, and many limitations remain to be addressed. Please note that the model may generate harmful or biased content, incorrect information, or generally unhelpful answers.