Update README.md
README.md
CHANGED
@@ -38,39 +38,21 @@ pipeline_tag: text-generation
 <img src="https://styles.redditmedia.com/t5_6075m3/styles/profileIcon_71syco7c5lt81.png?width=256&height=256&frame=1&auto=webp&crop=256:256,smart&s=24bd3c71dc11edc5d4f88d0cbc1da72ed7ae1969" alt="RunPod Logo" style="width:30px; vertical-align: middle; display: inline-block; margin-right: 5px; margin-left: 5px; margin-top: 0px; margin-bottom: 0px;"/>
 </p>
 
-<div style="background-color: white; padding: 0.7em; border-radius: 0.5em; color: black; display: flex; flex-direction: column; justify-content: center; text-align: center
+<div style="background-color: white; padding: 0.7em; border-radius: 0.5em; color: black; display: flex; flex-direction: column; justify-content: center; text-align: center">
 <a href="https://huggingface.co/openchat/openchat-3.5-0106" style="text-decoration: none; color: black;">
-<span style="font-size: 1.7em; font-family: 'Helvetica'; letter-spacing: 0.1em; font-weight: bold; color: black;">OPENCHAT</span><span style="font-size: 1.8em; font-family: 'Helvetica'; color: #3c72db; ">3.5</span>
-<span style="font-size: 1.0em; font-family: 'Helvetica'; color: white; background-color: #
+<span style="font-size: 1.7em; font-family: 'Helvetica'; letter-spacing: 0.1em; font-weight: bold; color: black;">Llama 3 Version: OPENCHAT</span><span style="font-size: 1.8em; font-family: 'Helvetica'; color: #3c72db; ">3.6</span>
+<span style="font-size: 1.0em; font-family: 'Helvetica'; color: white; background-color: #03045e; vertical-align: top; border-radius: 6em; padding: 0.066em 0.4em; letter-spacing: 0.1em; font-weight: bold;">20240522</span>
 <span style="font-size: 0.85em; font-family: 'Helvetica'; color: black;">
-<br> 🏆 The Overall Best Performing Open Source 7B Model 🏆
-
-<br> 🚀<span style="font-size: 1em; font-family: 'Helvetica'; color: black; font-weight: bold;">15</span>-point improvement in Coding over <span style="font-size: 0.9em;
-font-family: 'Helvetica'; color: black; font-weight: bold;">OpenChat-3.5🚀</span>
-<br><br><span style="font-size: 1em; font-family: 'Helvetica'; color: #3c72db; font-weight: bold;">New Features</span>
-<br> 💡 2 Modes: Coding + Generalist, Mathematical Reasoning 💡
-<br> 🧑‍⚖️ Experimental support for Evaluator and Feedback capabilities 🧑‍⚖️
+<br> 🏆 The Overall Best Performing Open Source 8B Model 🏆
+<br> 🚀 Outperforms Llama-3-8B-Instruct and open-source finetunes 🚀
 </span>
 </a>
 </div>
 
 <div style="display: flex; justify-content: center; align-items: center">
-<img src="
+<img src="" style="width: 100%; border-radius: 1em">
 </div>
 
-
-<div>
-<h3> Table of Contents</h3>
-</div>
-
-1. [Usage](#usage)
-2. [Benchmarks](#benchmarks)
-3. [Limitations](#limitations)
-4. [License](#license)
-6. [Citation](#citation)
-7. [Acknowledgements](#acknowledgements)
-
-
 <div align="center">
 <h2> Usage </h2>
 </div>
@@ -81,56 +63,35 @@ Once started, the server listens at `localhost:18888` for requests and is compatible
 
 If you want to deploy the server as an online service, you can use `--api-keys sk-KEY1 sk-KEY2 ...` to specify allowed API keys and `--disable-log-requests --disable-log-stats --log-file openchat.log` for logging only to a file. For security purposes, we recommend using an [HTTPS gateway](https://fastapi.tiangolo.com/es/deployment/concepts/#security-https) in front of the server.
 
-| Model | Size | Context | Weights | Serving |
-|-----------------------|------|---------|-------------------------------------------------------------------|---------------------------------------------------------------------------------|
-| OpenChat-3.5-0106 | 7B | 8192 | [Huggingface](https://huggingface.co/openchat/openchat-3.5-0106) | `python -m ochat.serving.openai_api_server --model openchat/openchat-3.5-0106` |
+| Model | Size | Context | Weights | Serving |
+|-----------------------|------|---------|-------------------------------------------------------------------------|---------------------------------------------------------------------------------------|
+| OpenChat-3.6-20240522 | 8B | 8192 | [Huggingface](https://huggingface.co/openchat/openchat-3.6-8b-20240522) | `python -m ochat.serving.openai_api_server --model openchat/openchat-3.6-8b-20240522` |
 
 <details>
 <summary>Example request (click to expand)</summary>
 
-💡 **Default Mode (GPT4 Correct)**: Best for coding, chat and general tasks
-
 ```bash
 curl http://localhost:18888/v1/chat/completions \
 -H "Content-Type: application/json" \
 -d '{
-"model": "openchat_3.5",
+"model": "openchat_3.6",
 "messages": [{"role": "user", "content": "You are a large language model named OpenChat. Write a poem to describe yourself"}]
 }'
 ```
 
-🧮 **Mathematical Reasoning Mode**: Tailored for solving math problems
-
-```bash
-curl http://localhost:18888/v1/chat/completions \
--H "Content-Type: application/json" \
--d '{
-"model": "openchat_3.5",
-"condition": "Math Correct",
-"messages": [{"role": "user", "content": "10.3 − 7988.8133 = "}]
-}'
-```
-
 </details>
 
 ### Conversation templates
 
-💡 **Default Mode (GPT4 Correct)**: Best for coding, chat and general tasks
+💡 **Default Mode**: Best for coding, chat and general tasks
 
 ```
 GPT4 Correct User: Hello<|end_of_turn|>GPT4 Correct Assistant: Hi<|end_of_turn|>GPT4 Correct User: How are you today?<|end_of_turn|>GPT4 Correct Assistant:
 ```
 
-🧮 **Mathematical Reasoning Mode**: Tailored for solving math problems
-
-```
-Math Correct User: 10.3 − 7988.8133=<|end_of_turn|>Math Correct Assistant:
-```
-
 ⚠️ **Notice:** Remember to set `<|end_of_turn|>` as end of generation token.
 
-The default (GPT4 Correct) template is also available as the integrated `tokenizer.chat_template`,
-which can be used instead of manually specifying the template:
+The default template is also available as the integrated `tokenizer.chat_template`, which can be used instead of manually specifying the template:
 
 ```python
 messages = [
@@ -139,98 +100,7 @@ messages = [
 {"role": "user", "content": "How are you today?"}
 ]
 tokens = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
-assert tokens == [1, 420, 6316, 28781, 3198, 3123, 1247, 28747, 22557, 32000, 420, 6316, 28781, 3198, 3123, 21631, 28747, 15359, 32000, 420, 6316, 28781, 3198, 3123, 1247, 28747, 1602, 460, 368, 3154, 28804, 32000, 420, 6316, 28781, 3198, 3123, 21631, 28747]
-```
-
-<div align="center">
-<h2> (Experimental) Evaluator / Feedback Capabilities </h2>
-</div>
-
-We've included evaluator capabilities in this release to advance open-source models as evaluators. You can use `Default Mode (GPT4 Correct)` with the following prompt (same as [Prometheus](https://huggingface.co/datasets/kaist-ai/Feedback-Collection)) to evaluate a response.
-
 ```
-###Task Description:
-An instruction (might include an Input inside it), a response to evaluate, a reference answer that gets a score of 5, and a score rubric representing a evaluation criteria are given.
-1. Write a detailed feedback that assess the quality of the response strictly based on the given score rubric, not evaluating in general.
-2. After writing a feedback, write a score that is an integer between 1 and 5. You should refer to the score rubric.
-3. The output format should look as follows: "Feedback: (write a feedback for criteria) [RESULT] (an integer number between 1 and 5)"
-4. Please do not generate any other opening, closing, and explanations.
-
-###The instruction to evaluate:
-{orig_instruction}
-
-###Response to evaluate:
-{orig_response}
-
-###Reference Answer (Score 5):
-{orig_reference_answer}
-
-###Score Rubrics:
-[{orig_criteria}]
-Score 1: {orig_score1_description}
-Score 2: {orig_score2_description}
-Score 3: {orig_score3_description}
-Score 4: {orig_score4_description}
-Score 5: {orig_score5_description}
-
-###Feedback:
-```
-<div align="center">
-<h2> Benchmarks </h2>
-</div>
-
-| Model | # Params | Average | MT-Bench | HumanEval | BBH MC | AGIEval | TruthfulQA | MMLU | GSM8K | BBH CoT |
-|-----------------------|----------|----------|----------|-----------|----------|----------|------------|----------|----------|----------|
-| **OpenChat-3.5-0106** | **7B** | **64.5** | 7.8 | **71.3** | **51.5** | **49.1** | 61.0 | 65.8 | **77.4** | 62.2 |
-| OpenChat-3.5-1210 | **7B** | 63.8 | 7.76 | 68.9 | 49.5 | 48.0 | **61.8** | 65.3 | 77.3 | 61.8 |
-| OpenChat-3.5 | **7B** | 61.6 | 7.81 | 55.5 | 47.6 | 47.4 | 59.1 | 64.3 | 77.3 | 63.5 |
-| ChatGPT (March)* | ???B | 61.5 | **7.94** | 48.1 | 47.6 | 47.1 | 57.7 | **67.3** | 74.9 | **70.1** |
-| | | | | | | | | | | |
-| OpenHermes 2.5 | 7B | 59.3 | 7.54 | 48.2 | 49.4 | 46.5 | 57.5 | 63.8 | 73.5 | 59.9 |
-| OpenOrca Mistral | 7B | 52.7 | 6.86 | 38.4 | 49.4 | 42.9 | 45.9 | 59.3 | 59.1 | 58.1 |
-| Zephyr-β^ | 7B | 34.6 | 7.34 | 22.0 | 40.6 | 39.0 | 40.8 | 39.8 | 5.1 | 16.0 |
-| Mistral | 7B | - | 6.84 | 30.5 | 39.0 | 38.0 | - | 60.1 | 52.2 | - |
-
-<details>
-<summary>Evaluation Details(click to expand)</summary>
-
-*: ChatGPT (March) results are from [GPT-4 Technical Report](https://arxiv.org/abs/2303.08774), [Chain-of-Thought Hub](https://github.com/FranxYao/chain-of-thought-hub), and our evaluation. Please note that ChatGPT is not a fixed baseline and evolves rapidly over time.
-
-^: Zephyr-β often fails to follow few-shot CoT instructions, likely because it was aligned with only chat data but not trained on few-shot data.
-
-**: Mistral and Open-source SOTA results are taken from reported results in instruction-tuned model papers and official repositories.
-
-All models are evaluated in chat mode (e.g. with the respective conversation template applied). All zero-shot benchmarks follow the same setting as in the AGIEval paper and Orca paper. CoT tasks use the same configuration as Chain-of-Thought Hub, HumanEval is evaluated with EvalPlus, and MT-bench is run using FastChat. To reproduce our results, follow the instructions in [our repository](https://github.com/imoneoi/openchat/#benchmarks).
-
-
-</details>
-<div>
-<h3>HumanEval+</h3>
-</div>
-
-| Model | Size | HumanEval+ pass@1 |
-|-----------------------------|--------|-------------------|
-| **OpenChat-3.5-0106** | **7B** | **65.9** |
-| ChatGPT (December 12, 2023) | ???B | 64.6 |
-| WizardCoder-Python-34B-V1.0 | 34B | 64.6 |
-| OpenChat 3.5 1210 | 7B | 63.4 |
-| OpenHermes 2.5 | 7B | 41.5 |
-
-<div>
-<h3>OpenChat-3.5 vs. Grok</h3>
-</div>
-
-🔥 OpenChat-3.5-0106 (7B) now outperforms Grok-0 (33B) on **all 4 benchmarks** and Grok-1 (???B) on average and **3/4 benchmarks**.
-
-| | License | # Param | Average | MMLU | HumanEval | MATH | GSM8k |
-|-----------------------|-------------|---------|----------|--------|-----------|----------|----------|
-| **OpenChat-3.5-0106** | Apache-2.0 | **7B** | **61.0** | 65.8 | **71.3** | **29.3** | **77.4** |
-| OpenChat-3.5-1210 | Apache-2.0 | **7B** | 60.1 | 65.3 | 68.9 | 28.9 | 77.3 |
-| OpenChat-3.5 | Apache-2.0 | **7B** | 56.4 | 64.3 | 55.5 | 28.6 | 77.3 |
-| Grok-0 | Proprietary | 33B | 44.5 | 65.7 | 39.7 | 15.7 | 56.8 |
-| Grok-1 | Proprietary | ???B | 55.8 | **73** | 63.2 | 23.9 | 62.9 |
-
-*: Grok results are reported by [X.AI](https://x.ai/).
 
 <div align="center">
 <h2> Limitations </h2>
@@ -250,10 +120,14 @@ OpenChat may sometimes generate information that does not exist or is not accurate
 OpenChat may sometimes generate harmful, hate speech, biased responses, or answer unsafe questions. It's crucial to apply additional AI safety measures in use cases that require safe and moderated responses.
 
 <div align="center">
-<h2> License </h2>
+<h2> 💌 Contact </h2>
 </div>
 
-
+We look forward to hearing you and collaborating on this exciting project!
+
+**Project Lead:**
+- Guan Wang [imonenext at gmail dot com]
+- [Alpay Ariyak](https://github.com/alpayariyak) [aariyak at wpi dot edu]
 
 <div align="center">
 <h2> Citation </h2>
@@ -266,14 +140,4 @@ Our OpenChat 3.5 code and models are distributed under the Apache License 2.0.
 journal={arXiv preprint arXiv:2309.11235},
 year={2023}
 }
-```
-
-<div align="center">
-<h2> 💌 Contact </h2>
-</div>
-
-We look forward to hearing you and collaborating on this exciting project!
-
-**Project Lead:**
-- Guan Wang [imonenext at gmail dot com]
-- [Alpay Ariyak](https://github.com/alpayariyak) [aariyak at wpi dot edu]
+```
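The Usage section above describes an OpenAI-compatible endpoint on `localhost:18888`. Below is a minimal sketch of querying it from Python, assuming the `openai` client package (v1+) and a server started with the serving command from the table; the placeholder API key only matters if the server was launched with `--api-keys`:

```python
# Sketch: call the OpenAI-compatible server described in the Usage section.
from openai import OpenAI

# The api_key is a placeholder; the server only checks it when started with --api-keys.
client = OpenAI(base_url="http://localhost:18888/v1", api_key="sk-placeholder")

response = client.chat.completions.create(
    model="openchat_3.6",
    messages=[{"role": "user", "content": "You are a large language model named OpenChat. Write a poem to describe yourself"}],
)
print(response.choices[0].message.content)
```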