RichardErkhov commited on
Commit
013a7fc
·
verified ·
1 Parent(s): d65b92d

uploaded readme

Browse files
Files changed (1) hide show
  1. README.md +389 -0
README.md ADDED
@@ -0,0 +1,389 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ Quantization made by Richard Erkhov.
2
+
3
+ [Github](https://github.com/RichardErkhov)
4
+
5
+ [Discord](https://discord.gg/pvy7H8DZMG)
6
+
7
+ [Request more models](https://github.com/RichardErkhov/quant_request)
8
+
9
+
10
+ laser-dolphin-mixtral-2x7b-dpo - bnb 8bits
11
+ - Model creator: https://huggingface.co/macadeliccc/
12
+ - Original model: https://huggingface.co/macadeliccc/laser-dolphin-mixtral-2x7b-dpo/
13
+
14
+
15
+
16
+
17
+ Original model description:
18
+ ---
19
+ license: apache-2.0
20
+ library_name: transformers
21
+ model-index:
22
+ - name: laser-dolphin-mixtral-2x7b-dpo
23
+ results:
24
+ - task:
25
+ type: text-generation
26
+ name: Text Generation
27
+ dataset:
28
+ name: AI2 Reasoning Challenge (25-Shot)
29
+ type: ai2_arc
30
+ config: ARC-Challenge
31
+ split: test
32
+ args:
33
+ num_few_shot: 25
34
+ metrics:
35
+ - type: acc_norm
36
+ value: 65.96
37
+ name: normalized accuracy
38
+ source:
39
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=macadeliccc/laser-dolphin-mixtral-2x7b-dpo
40
+ name: Open LLM Leaderboard
41
+ - task:
42
+ type: text-generation
43
+ name: Text Generation
44
+ dataset:
45
+ name: HellaSwag (10-Shot)
46
+ type: hellaswag
47
+ split: validation
48
+ args:
49
+ num_few_shot: 10
50
+ metrics:
51
+ - type: acc_norm
52
+ value: 85.8
53
+ name: normalized accuracy
54
+ source:
55
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=macadeliccc/laser-dolphin-mixtral-2x7b-dpo
56
+ name: Open LLM Leaderboard
57
+ - task:
58
+ type: text-generation
59
+ name: Text Generation
60
+ dataset:
61
+ name: MMLU (5-Shot)
62
+ type: cais/mmlu
63
+ config: all
64
+ split: test
65
+ args:
66
+ num_few_shot: 5
67
+ metrics:
68
+ - type: acc
69
+ value: 63.17
70
+ name: accuracy
71
+ source:
72
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=macadeliccc/laser-dolphin-mixtral-2x7b-dpo
73
+ name: Open LLM Leaderboard
74
+ - task:
75
+ type: text-generation
76
+ name: Text Generation
77
+ dataset:
78
+ name: TruthfulQA (0-shot)
79
+ type: truthful_qa
80
+ config: multiple_choice
81
+ split: validation
82
+ args:
83
+ num_few_shot: 0
84
+ metrics:
85
+ - type: mc2
86
+ value: 60.76
87
+ source:
88
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=macadeliccc/laser-dolphin-mixtral-2x7b-dpo
89
+ name: Open LLM Leaderboard
90
+ - task:
91
+ type: text-generation
92
+ name: Text Generation
93
+ dataset:
94
+ name: Winogrande (5-shot)
95
+ type: winogrande
96
+ config: winogrande_xl
97
+ split: validation
98
+ args:
99
+ num_few_shot: 5
100
+ metrics:
101
+ - type: acc
102
+ value: 79.01
103
+ name: accuracy
104
+ source:
105
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=macadeliccc/laser-dolphin-mixtral-2x7b-dpo
106
+ name: Open LLM Leaderboard
107
+ - task:
108
+ type: text-generation
109
+ name: Text Generation
110
+ dataset:
111
+ name: GSM8k (5-shot)
112
+ type: gsm8k
113
+ config: main
114
+ split: test
115
+ args:
116
+ num_few_shot: 5
117
+ metrics:
118
+ - type: acc
119
+ value: 48.29
120
+ name: accuracy
121
+ source:
122
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=macadeliccc/laser-dolphin-mixtral-2x7b-dpo
123
+ name: Open LLM Leaderboard
124
+ ---
125
+ # Laser-Dolphin-Mixtral-2x7b-dpo
126
+
127
+ ![laser_dolphin_image](./dolphin_moe.png)
128
+
129
+ **New Version out now!**
130
+
131
+ Credit to Fernando Fernandes and Eric Hartford for their project [laserRMT](https://github.com/cognitivecomputations/laserRMT)
132
+
133
+ ## Overview
134
+
135
+ This model is a medium-sized MoE implementation based on [cognitivecomputations/dolphin-2.6-mistral-7b-dpo-laser](https://huggingface.co/cognitivecomputations/dolphin-2.6-mistral-7b-dpo-laser)
136
+
137
+ + The new version shows ~1 point increase in evaluation performance on average.
138
+
139
+ ## Process
140
+
141
+ + The process is outlined in this [notebook](https://github.com/cognitivecomputations/laserRMT/blob/main/examples/laser-dolphin-mixtral-2x7b.ipynb)
142
+
143
+ + The mergekit_config is in the files.
144
+
145
+ + The models used in the configuration are not lasered, but the final product is. This is an update from the last version.
146
+
147
+ + This process is experimental. Your mileage may vary.
148
+
149
+ ## Future Goals
150
+
151
+ + [ ] Function Calling
152
+ + [ ] v2 with new base model to improve performance
153
+
154
+ ## Quantizations
155
+
156
+ ### ExLlamav2
157
+
158
+ _These are the recommended quantizations for users that are running the model on GPU_
159
+
160
+ Thanks to user [bartowski](https://huggingface.co/bartowski) we now have exllamav2 quantizations in 3.5 through 8 bpw. They are available here:
161
+
162
+ + [bartowski/laser-dolphin-mixtral-2x7b-dpo-exl2](https://huggingface.co/bartowski/laser-dolphin-mixtral-2x7b-dpo-exl2)
163
+
164
+ | Branch | Bits | lm_head bits | VRAM (4k) | VRAM (16k) | VRAM (32k) | Description |
165
+ | ----- | ---- | ------- | ------ | ------ | ------ | ------------ |
166
+ | [8_0](https://huggingface.co/bartowski/laser-dolphin-mixtral-2x7b-dpo-exl2/tree/8_0) | 8.0 | 8.0 | 13.7 GB | 15.1 GB | 17.2 GB | Maximum quality that ExLlamaV2 can produce, near unquantized performance. |
167
+ | [6_5](https://huggingface.co/bartowski/laser-dolphin-mixtral-2x7b-dpo-exl2/tree/6_5) | 6.5 | 8.0 | 11.5 GB | 12.9 GB | 15.0 GB | Near unquantized performance at vastly reduced size, **recommended**. |
168
+ | [5_0](https://huggingface.co/bartowski/laser-dolphin-mixtral-2x7b-dpo-exl2/tree/5_0) | 5.0 | 6.0 | 9.3 GB | 10.7 GB | 12.8 GB | Slightly lower quality vs 6.5, great for 12gb cards with 16k context. |
169
+ | [4_25](https://huggingface.co/bartowski/laser-dolphin-mixtral-2x7b-dpo-exl2/tree/4_25) | 4.25 | 6.0 | 8.2 GB | 9.6 GB | 11.7 GB | GPTQ equivalent bits per weight. |
170
+ | [3_5](https://huggingface.co/bartowski/laser-dolphin-mixtral-2x7b-dpo-exl2/tree/3_5) | 3.5 | 6.0 | 7.0 GB | 8.4 GB | 10.5 GB | Lower quality, not recommended. |
171
+
172
+ His quantizations represent the first ~13B model with GQA support. Check out his repo for more information!
173
+
174
+ ### GGUF
175
+
176
+ *Current GGUF [Quantizations](https://huggingface.co/macadeliccc/laser-dolphin-mixtral-2x7b-dpo-GGUF)*
177
+
178
+ ### AWQ
179
+
180
+ *Current AWQ [Quantizations](https://huggingface.co/macadeliccc/laser-dolphin-mixtral-2x7b-dpo-AWQ)
181
+
182
+ ### TheBloke
183
+
184
+ **These Quants will result in unpredicted behavior. New quants are available as I have updated the model**
185
+
186
+ Quatizations provided by [TheBloke](https://huggingface.co/TheBloke/laser-dolphin-mixtral-2x7b-dpo-GGUF)
187
+
188
+ ## HF Spaces
189
+ + GGUF chat available [here](https://huggingface.co/spaces/macadeliccc/laser-dolphin-mixtral-chat-GGUF)
190
+ + 4-bit bnb chat available [here](https://huggingface.co/spaces/macadeliccc/laser-dolphin-mixtral-chat)
191
+
192
+ # Ollama
193
+
194
+ ```bash
195
+ ollama run macadeliccc/laser-dolphin-mixtral-2x7b-dpo
196
+ ```
197
+
198
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6455cc8d679315e4ef16fbec/oVwa7Dwkt00tk8_MtlJdR.png)
199
+
200
+ ## Code Example
201
+ Switch the commented model definition to use in 4-bit. Should work with 9GB and still exceed the single 7B model by 5-6 points roughly
202
+
203
+ ```python
204
+ from transformers import AutoModelForCausalLM, AutoTokenizer
205
+
206
+ def generate_response(prompt):
207
+ """
208
+ Generate a response from the model based on the input prompt.
209
+
210
+ Args:
211
+ prompt (str): Prompt for the model.
212
+
213
+ Returns:
214
+ str: The generated response from the model.
215
+ """
216
+ # Tokenize the input prompt
217
+ inputs = tokenizer(prompt, return_tensors="pt")
218
+
219
+ # Generate output tokens
220
+ outputs = model.generate(**inputs, max_new_tokens=256, eos_token_id=tokenizer.eos_token_id, pad_token_id=tokenizer.pad_token_id)
221
+
222
+ # Decode the generated tokens to a string
223
+ response = tokenizer.decode(outputs[0], skip_special_tokens=True)
224
+
225
+ return response
226
+
227
+ # Load the model and tokenizer
228
+ model_id = "macadeliccc/laser-dolphin-mixtral-2x7b-dpo"
229
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
230
+ model = AutoModelForCausalLM.from_pretrained(model_id, load_in_4bit=True)
231
+
232
+ prompt = "Write a quicksort algorithm in python"
233
+
234
+ # Generate and print responses for each language
235
+ print("Response:")
236
+ print(generate_response(prompt), "\n")
237
+ ```
238
+
239
+ [colab](https://colab.research.google.com/drive/1cmRhAkDWItV7utHNqNANVZnqDqQNsTUr?usp=sharing) with usage example
240
+
241
+ ## Eval
242
+
243
+ ## EQ Bench
244
+
245
+ <pre>----Benchmark Complete----
246
+ 2024-01-31 16:55:37
247
+ Time taken: 31.1 mins
248
+ Prompt Format: ChatML
249
+ Model: macadeliccc/laser-dolphin-mixtral-2x7b-dpo-GGUF
250
+ Score (v2): 72.76
251
+ Parseable: 171.0
252
+ ---------------
253
+ Batch completed
254
+ Time taken: 31.2 mins
255
+ ---------------
256
+ </pre>
257
+
258
+
259
+
260
+ evaluation [colab](https://colab.research.google.com/drive/1FpwgsGzCR4tORTxAwUxpN3PcP22En2xk?usp=sharing)
261
+ ## Summary of previous evaluation
262
+ | Model |AGIEval|GPT4All|TruthfulQA|Bigbench|Average|
263
+ |---------------------------------------------------------------------------------------------------|------:|------:|---------:|-------:|------:|
264
+ |[laser-dolphin-mixtral-2x7b-dpo](https://huggingface.co/macadeliccc/laser-dolphin-mixtral-2x7b-dpo)| 41.31| 73.67| 61.69| 42.79| 54.87|
265
+
266
+ ## Detailed current evaluation
267
+ | Model |AGIEval|GPT4All|TruthfulQA|Bigbench|Average|
268
+ |---------------------------------------------------------------------------------------------------|------:|------:|---------:|-------:|------:|
269
+ |[laser-dolphin-mixtral-2x7b-dpo](https://huggingface.co/macadeliccc/laser-dolphin-mixtral-2x7b-dpo)| 42.25| 73.45| 63.44| 43.96| 55.77|
270
+
271
+ ### AGIEval
272
+ | Task |Version| Metric |Value| |Stderr|
273
+ |------------------------------|------:|--------|----:|---|-----:|
274
+ |agieval_aqua_rat | 0|acc |21.26|± | 2.57|
275
+ | | |acc_norm|21.65|± | 2.59|
276
+ |agieval_logiqa_en | 0|acc |34.72|± | 1.87|
277
+ | | |acc_norm|35.64|± | 1.88|
278
+ |agieval_lsat_ar | 0|acc |26.96|± | 2.93|
279
+ | | |acc_norm|26.96|± | 2.93|
280
+ |agieval_lsat_lr | 0|acc |45.88|± | 2.21|
281
+ | | |acc_norm|46.08|± | 2.21|
282
+ |agieval_lsat_rc | 0|acc |59.48|± | 3.00|
283
+ | | |acc_norm|59.48|± | 3.00|
284
+ |agieval_sat_en | 0|acc |73.79|± | 3.07|
285
+ | | |acc_norm|73.79|± | 3.07|
286
+ |agieval_sat_en_without_passage| 0|acc |42.23|± | 3.45|
287
+ | | |acc_norm|41.26|± | 3.44|
288
+ |agieval_sat_math | 0|acc |37.27|± | 3.27|
289
+ | | |acc_norm|33.18|± | 3.18|
290
+
291
+ Average: 42.25%
292
+
293
+ ### GPT4All
294
+ | Task |Version| Metric |Value| |Stderr|
295
+ |-------------|------:|--------|----:|---|-----:|
296
+ |arc_challenge| 0|acc |58.36|± | 1.44|
297
+ | | |acc_norm|58.02|± | 1.44|
298
+ |arc_easy | 0|acc |82.20|± | 0.78|
299
+ | | |acc_norm|77.40|± | 0.86|
300
+ |boolq | 1|acc |87.52|± | 0.58|
301
+ |hellaswag | 0|acc |67.50|± | 0.47|
302
+ | | |acc_norm|84.43|± | 0.36|
303
+ |openbookqa | 0|acc |34.40|± | 2.13|
304
+ | | |acc_norm|47.00|± | 2.23|
305
+ |piqa | 0|acc |81.61|± | 0.90|
306
+ | | |acc_norm|82.59|± | 0.88|
307
+ |winogrande | 0|acc |77.19|± | 1.18|
308
+
309
+
310
+ Average: 73.45%
311
+
312
+ ### GSM8K
313
+ |Task |Version| Metric |Value| |Stderr|
314
+ |-----|------:|-----------------------------|-----|---|------|
315
+ |gsm8k| 2|exact_match,get-answer | 0.75| | |
316
+ | | |exact_match_stderr,get-answer| 0.01| | |
317
+ | | |alias |gsm8k| | |
318
+
319
+ ### TruthfulQA
320
+ | Task |Version|Metric|Value| |Stderr|
321
+ |-------------|------:|------|----:|---|-----:|
322
+ |truthfulqa_mc| 1|mc1 |45.90|± | 1.74|
323
+ | | |mc2 |63.44|± | 1.56|
324
+
325
+ Average: 63.44%
326
+
327
+ ### Bigbench
328
+ | Task |Version| Metric |Value| |Stderr|
329
+ |------------------------------------------------|------:|---------------------|----:|---|-----:|
330
+ |bigbench_causal_judgement | 0|multiple_choice_grade|58.42|± | 3.59|
331
+ |bigbench_date_understanding | 0|multiple_choice_grade|60.70|± | 2.55|
332
+ |bigbench_disambiguation_qa | 0|multiple_choice_grade|38.37|± | 3.03|
333
+ |bigbench_geometric_shapes | 0|multiple_choice_grade|21.73|± | 2.18|
334
+ | | |exact_str_match | 0.00|± | 0.00|
335
+ |bigbench_logical_deduction_five_objects | 0|multiple_choice_grade|35.00|± | 2.14|
336
+ |bigbench_logical_deduction_seven_objects | 0|multiple_choice_grade|23.57|± | 1.61|
337
+ |bigbench_logical_deduction_three_objects | 0|multiple_choice_grade|50.33|± | 2.89|
338
+ |bigbench_movie_recommendation | 0|multiple_choice_grade|45.00|± | 2.23|
339
+ |bigbench_navigate | 0|multiple_choice_grade|50.00|± | 1.58|
340
+ |bigbench_reasoning_about_colored_objects | 0|multiple_choice_grade|60.35|± | 1.09|
341
+ |bigbench_ruin_names | 0|multiple_choice_grade|51.12|± | 2.36|
342
+ |bigbench_salient_translation_error_detection | 0|multiple_choice_grade|32.26|± | 1.48|
343
+ |bigbench_snarks | 0|multiple_choice_grade|67.96|± | 3.48|
344
+ |bigbench_sports_understanding | 0|multiple_choice_grade|70.59|± | 1.45|
345
+ |bigbench_temporal_sequences | 0|multiple_choice_grade|35.80|± | 1.52|
346
+ |bigbench_tracking_shuffled_objects_five_objects | 0|multiple_choice_grade|22.56|± | 1.18|
347
+ |bigbench_tracking_shuffled_objects_seven_objects| 0|multiple_choice_grade|17.20|± | 0.90|
348
+ |bigbench_tracking_shuffled_objects_three_objects| 0|multiple_choice_grade|50.33|± | 2.89|
349
+
350
+ Average: 43.96%
351
+
352
+ Average score: 55.77%
353
+
354
+ Elapsed time: 02:43:45
355
+ ## Citations
356
+
357
+ Fernando Fernandes Neto and Eric Hartford. "Optimizing Large Language Models Using Layer-Selective Rank Reduction and Random Matrix Theory." 2024.
358
+
359
+ ```bibtex
360
+ @article{sharma2023truth,
361
+ title={The Truth is in There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction},
362
+ author={Sharma, Pratyusha and Ash, Jordan T and Misra, Dipendra},
363
+ journal={arXiv preprint arXiv:2312.13558},
364
+ year={2023} }
365
+ ```
366
+
367
+ ```bibtex
368
+ @article{gao2021framework,
369
+ title={A framework for few-shot language model evaluation},
370
+ author={Gao, Leo and Tow, Jonathan and Biderman, Stella and Black, Sid and DiPofi, Anthony and Foster, Charles and Golding, Laurence and Hsu, Jeffrey and McDonell, Kyle and Muennighoff, Niklas and others},
371
+ journal={Version v0. 0.1. Sept},
372
+ year={2021}
373
+ }
374
+ ```
375
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
376
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_macadeliccc__laser-dolphin-mixtral-2x7b-dpo)
377
+
378
+ | Metric |Value|
379
+ |---------------------------------|----:|
380
+ |Avg. |67.16|
381
+ |AI2 Reasoning Challenge (25-Shot)|65.96|
382
+ |HellaSwag (10-Shot) |85.80|
383
+ |MMLU (5-Shot) |63.17|
384
+ |TruthfulQA (0-shot) |60.76|
385
+ |Winogrande (5-shot) |79.01|
386
+ |GSM8k (5-shot) |48.29|
387
+
388
+
389
+