TheBloke committed on
Commit
0a8f96b
·
1 Parent(s): 3d425ab

Upload README.md

Files changed (1)
  1. README.md +341 -0
README.md ADDED
@@ -0,0 +1,341 @@
1
+ ---
2
+ base_model: https://huggingface.co/openaccess-ai-collective/hippogriff-30b-chat
3
+ datasets:
4
+ - QingyiSi/Alpaca-CoT
5
+ - teknium/GPT4-LLM-Cleaned
6
+ - teknium/GPTeacher-General-Instruct
7
+ - metaeval/ScienceQA_text_only
8
+ - hellaswag
9
+ - openai/summarize_from_feedback
10
+ - riddle_sense
11
+ - gsm8k
12
+ - OpenAssistant/oasst1
13
+ inference: false
14
+ language:
15
+ - en
16
+ library_name: transformers
17
+ license: other
18
+ model_creator: Open Access AI Collective
19
+ model_name: Hippogriff 30B Chat
20
+ model_type: llama
21
+ pipeline_tag: text-generation
22
+ prompt_template: 'A chat between a curious user and an artificial intelligence assistant.
23
+ The assistant gives helpful, detailed, and polite answers to the user''s questions.
24
+ USER: {prompt} ASSISTANT:
25
+
26
+ '
27
+ quantized_by: TheBloke
28
+ ---
29
+
30
+ <!-- header start -->
31
+ <!-- 200823 -->
32
+ <div style="width: auto; margin-left: auto; margin-right: auto">
33
+ <img src="https://i.imgur.com/EBdldam.jpg" alt="TheBlokeAI" style="width: 100%; min-width: 400px; display: block; margin: auto;">
34
+ </div>
35
+ <div style="display: flex; justify-content: space-between; width: 100%;">
36
+ <div style="display: flex; flex-direction: column; align-items: flex-start;">
37
+ <p style="margin-top: 0.5em; margin-bottom: 0em;"><a href="https://discord.gg/theblokeai">Chat & support: TheBloke's Discord server</a></p>
38
+ </div>
39
+ <div style="display: flex; flex-direction: column; align-items: flex-end;">
40
+ <p style="margin-top: 0.5em; margin-bottom: 0em;"><a href="https://www.patreon.com/TheBlokeAI">Want to contribute? TheBloke's Patreon page</a></p>
41
+ </div>
42
+ </div>
43
+ <div style="text-align:center; margin-top: 0em; margin-bottom: 0em"><p style="margin-top: 0.25em; margin-bottom: 0em;">TheBloke's LLM work is generously supported by a grant from <a href="https://a16z.com">andreessen horowitz (a16z)</a></p></div>
44
+ <hr style="margin-top: 1.0em; margin-bottom: 1.0em;">
45
+ <!-- header end -->
46
+
47
+ # Hippogriff 30B Chat - AWQ
48
+ - Model creator: [Open Access AI Collective](https://huggingface.co/openaccess-ai-collective)
49
+ - Original model: [Hippogriff 30B Chat](https://huggingface.co/openaccess-ai-collective/hippogriff-30b-chat)
50
+
51
+ <!-- description start -->
52
+ ## Description
53
+
54
+ This repo contains AWQ model files for [OpenAccess AI Collective's Hippogriff 30B Chat](https://huggingface.co/openaccess-ai-collective/hippogriff-30b-chat).
55
+
56
+
57
+ ### About AWQ
58
+
59
+ AWQ is an efficient, accurate and blazing-fast low-bit weight quantization method, currently supporting 4-bit quantization. Compared to GPTQ, it offers faster Transformers-based inference.
60
+
61
+ It is also now supported by continuous batching server [vLLM](https://github.com/vllm-project/vllm), allowing use of AWQ models for high-throughput concurrent inference in multi-user server scenarios. Note that, at the time of writing, overall throughput is still lower than running vLLM with unquantised models, however using AWQ enables using much smaller GPUs which can lead to easier deployment and overall cost savings. For example, a 70B model can be run on 1 x 48GB GPU instead of 2 x 80GB.
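The GPU sizing example above can be sanity-checked with rough arithmetic. The sketch below estimates memory for the weights only, assuming 16 bits per parameter for fp16 and 4 bits per parameter for AWQ. It ignores quantisation metadata, activations and the KV cache, and `weight_memory_gb` is a hypothetical helper, not part of any library.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GB (10^9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


params_70b = 70e9  # nominal parameter count of a "70B" model

fp16_gb = weight_memory_gb(params_70b, 16)  # unquantised fp16 weights
awq_gb = weight_memory_gb(params_70b, 4)    # 4-bit weights, metadata ignored

print(f"fp16: {fp16_gb:.0f} GB, 4-bit: {awq_gb:.0f} GB")
```

Under these assumptions the fp16 weights alone exceed a single 80GB GPU, while the 4-bit weights leave headroom on a 48GB GPU for the KV cache and activations.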
62
+ <!-- description end -->
63
+ <!-- repositories-available start -->
64
+ ## Repositories available
65
+
66
+ * [AWQ model(s) for GPU inference.](https://huggingface.co/TheBloke/hippogriff-30b-chat-AWQ)
67
+ * [GPTQ models for GPU inference, with multiple quantisation parameter options.](https://huggingface.co/TheBloke/hippogriff-30b-chat-GPTQ)
68
+ * [2, 3, 4, 5, 6 and 8-bit GGUF models for CPU+GPU inference](https://huggingface.co/TheBloke/hippogriff-30b-chat-GGUF)
69
+ * [Open Access AI Collective's original unquantised fp16 model in pytorch format, for GPU inference and for further conversions](https://huggingface.co/openaccess-ai-collective/hippogriff-30b-chat)
70
+ <!-- repositories-available end -->
71
+
72
+ <!-- prompt-template start -->
73
+ ## Prompt template: Vicuna
74
+
75
+ ```
76
+ A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: {prompt} ASSISTANT:
77
+
78
+ ```
79
+
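The template above can be filled in programmatically. This is a minimal sketch: `build_vicuna_prompt` and `VICUNA_SYSTEM` are hypothetical names chosen for illustration, and the trailing blank line shown in the template is deliberately omitted here, which is an assumption about what the model tolerates.

```python
VICUNA_SYSTEM = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions."
)


def build_vicuna_prompt(user_message: str, system: str = VICUNA_SYSTEM) -> str:
    """Wrap a single user message in the USER:/ASSISTANT: chat format."""
    return f"{system} USER: {user_message.strip()} ASSISTANT:"


print(build_vicuna_prompt("Tell me about AI"))
```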
80
+ <!-- prompt-template end -->
81
+
82
+
83
+ <!-- README_AWQ.md-provided-files start -->
84
+ ## Provided files and AWQ parameters
85
+
86
+ For my first release of AWQ models, I am releasing 128g models only. I will consider adding 32g as well if there is interest, and once I have done perplexity and evaluation comparisons, but at this time 32g models are still not fully tested with AutoAWQ and vLLM.
87
+
88
+ Models are released as sharded safetensors files.
89
+
90
+ | Branch | Bits | GS | AWQ Dataset | Seq Len | Size |
91
+ | ------ | ---- | -- | ----------- | ------- | ---- |
92
+ | [main](https://huggingface.co/TheBloke/hippogriff-30b-chat-AWQ/tree/main) | 4 | 128 | [wikitext](https://huggingface.co/datasets/wikitext/viewer/wikitext-2-v1/test) | 4096 | 17.53 GB |
93
+
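To illustrate what the group size (GS) column trades off, the sketch below estimates the effective storage cost per weight. It assumes each group of weights shares one 16-bit scale and one 4-bit zero point. That layout is an assumption made for illustration, and `effective_bits_per_weight` is a hypothetical helper rather than an AutoAWQ function.

```python
def effective_bits_per_weight(bits: int, group_size: int,
                              scale_bits: int = 16, zero_bits: int = 4) -> float:
    """Bits per weight once the per-group scale and zero point are amortised."""
    return bits + (scale_bits + zero_bits) / group_size


for gs in (128, 32):
    print(f"group size {gs}: {effective_bits_per_weight(4, gs):.3f} bits per weight")
```

Under these assumptions a smaller group size such as 32 spends more bits on metadata per weight than 128, so the files would be somewhat larger in exchange for finer-grained quantisation.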
94
+ <!-- README_AWQ.md-provided-files end -->
95
+
96
+ <!-- README_AWQ.md-use-from-vllm start -->
97
+ ## Serving this model from vLLM
98
+
99
+ Documentation on installing and using vLLM [can be found here](https://vllm.readthedocs.io/en/latest/).
100
+
101
+ When using vLLM as a server, pass the `--quantization awq` parameter, for example:
102
+
103
+ ```shell
104
+ python3 -m vllm.entrypoints.api_server --model TheBloke/hippogriff-30b-chat-AWQ --quantization awq
105
+ ```
106
+
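Once the server is running, it can be queried over HTTP. The sketch below uses only the Python standard library. The URL `http://localhost:8000/generate`, the request fields and the `text` response field are assumptions about the demo API server, so check them against the vLLM documentation for your installed version. `build_generate_request` and `generate` are hypothetical helper names.

```python
import json
import urllib.request


def build_generate_request(prompt: str,
                           url: str = "http://localhost:8000/generate",
                           **sampling) -> urllib.request.Request:
    """Build a JSON POST request; extra keyword arguments are sent as sampling options."""
    payload = {"prompt": prompt, **sampling}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, **sampling):
    """Send the request and return the assumed `text` field of the JSON response."""
    with urllib.request.urlopen(build_generate_request(prompt, **sampling)) as resp:
        return json.loads(resp.read())["text"]


# Building the request needs no running server; calling generate() does.
request = build_generate_request("USER: Tell me about AI ASSISTANT:",
                                 max_tokens=128, temperature=0.7)
print(request.full_url, request.data.decode("utf-8"))
```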
107
+ When using vLLM from Python code, pass the `quantization="awq"` parameter, for example:
108
+
109
+ ```python
110
+ from vllm import LLM, SamplingParams
111
+
112
+ prompts = [
113
+ "Hello, my name is",
114
+ "The president of the United States is",
115
+ "The capital of France is",
116
+ "The future of AI is",
117
+ ]
118
+ sampling_params = SamplingParams(temperature=0.8, top_p=0.95)
119
+
120
+ llm = LLM(model="TheBloke/hippogriff-30b-chat-AWQ", quantization="awq")
121
+
122
+ outputs = llm.generate(prompts, sampling_params)
123
+
124
+ # Print the outputs.
125
+ for output in outputs:
126
+ prompt = output.prompt
127
+ generated_text = output.outputs[0].text
128
+ print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
129
+ ```
130
+ <!-- README_AWQ.md-use-from-vllm end -->
131
+
132
+ <!-- README_AWQ.md-use-from-python start -->
133
+ ## How to use this AWQ model from Python code
134
+
135
+ ### Install the necessary packages
136
+
137
+ Requires: [AutoAWQ](https://github.com/casper-hansen/AutoAWQ) 0.0.2 or later
138
+
139
+ ```shell
140
+ pip3 install autoawq
141
+ ```
142
+
143
+ If you have problems installing [AutoAWQ](https://github.com/casper-hansen/AutoAWQ) using the pre-built wheels, install it from source instead:
144
+
145
+ ```shell
146
+ pip3 uninstall -y autoawq
147
+ git clone https://github.com/casper-hansen/AutoAWQ
148
+ cd AutoAWQ
149
+ pip3 install .
150
+ ```
151
+
152
+ ### You can then try the following example code
153
+
154
+ ```python
155
+ from awq import AutoAWQForCausalLM
156
+ from transformers import AutoTokenizer
157
+
158
+ model_name_or_path = "TheBloke/hippogriff-30b-chat-AWQ"
159
+
160
+ # Load model
161
+ model = AutoAWQForCausalLM.from_quantized(model_name_or_path, fuse_layers=True,
162
+ trust_remote_code=False, safetensors=True)
163
+ tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, trust_remote_code=False)
164
+
165
+ prompt = "Tell me about AI"
166
+ prompt_template=f'''A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: {prompt} ASSISTANT:
167
+
168
+ '''
169
+
170
+ print("\n\n*** Generate:")
171
+
172
+ tokens = tokenizer(
173
+ prompt_template,
174
+ return_tensors='pt'
175
+ ).input_ids.cuda()
176
+
177
+ # Generate output
178
+ generation_output = model.generate(
179
+ tokens,
180
+ do_sample=True,
181
+ temperature=0.7,
182
+ top_p=0.95,
183
+ top_k=40,
184
+ max_new_tokens=512
185
+ )
186
+
187
+ print("Output: ", tokenizer.decode(generation_output[0]))
188
+
189
+ # Inference can also be done using transformers' pipeline
190
+ from transformers import pipeline
191
+
192
+ print("*** Pipeline:")
193
+ pipe = pipeline(
194
+ "text-generation",
195
+ model=model,
196
+ tokenizer=tokenizer,
197
+ max_new_tokens=512,
198
+ do_sample=True,
199
+ temperature=0.7,
200
+ top_p=0.95,
201
+ top_k=40,
202
+ repetition_penalty=1.1
203
+ )
204
+
205
+ print(pipe(prompt_template)[0]['generated_text'])
206
+ ```
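The decoded output in the example above contains the prompt followed by the completion. A minimal sketch for keeping only the model's reply is shown below. `extract_reply` is a hypothetical helper, and treating `</s>` as the end-of-sequence token string is an assumption about the tokenizer.

```python
def extract_reply(decoded: str, marker: str = "ASSISTANT:") -> str:
    """Return the text after the last marker, without the assumed EOS string."""
    _, found, reply = decoded.rpartition(marker)
    if not found:
        # Marker missing: return the whole text rather than guessing.
        return decoded.strip()
    return reply.replace("</s>", "").strip()


sample = "<s> A chat between a curious user... USER: Tell me about AI ASSISTANT: AI is a broad field.</s>"
print(extract_reply(sample))
```

With the code above, this would be applied as `extract_reply(tokenizer.decode(generation_output[0]))`.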
207
+ <!-- README_AWQ.md-use-from-python end -->
208
+
209
+ <!-- README_AWQ.md-compatibility start -->
210
+ ## Compatibility
211
+
212
+ The files provided are tested to work with [AutoAWQ](https://github.com/casper-hansen/AutoAWQ), and [vLLM](https://github.com/vllm-project/vllm).
213
+
214
+ [Huggingface Text Generation Inference (TGI)](https://github.com/huggingface/text-generation-inference) is not yet compatible with AWQ, but a PR is open which should bring support soon: [TGI PR #781](https://github.com/huggingface/text-generation-inference/issues/781).
215
+ <!-- README_AWQ.md-compatibility end -->
216
+
217
+ <!-- footer start -->
218
+ <!-- 200823 -->
219
+ ## Discord
220
+
221
+ For further support, and discussions on these models and AI in general, join us at:
222
+
223
+ [TheBloke AI's Discord server](https://discord.gg/theblokeai)
224
+
225
+ ## Thanks, and how to contribute
226
+
227
+ Thanks to the [chirper.ai](https://chirper.ai) team!
228
+
229
+ Thanks to Clay from [gpus.llm-utils.org](llm-utils)!
230
+
231
+ I've had a lot of people ask if they can contribute. I enjoy providing models and helping people, and would love to be able to spend even more time doing it, as well as expanding into new projects like fine tuning/training.
232
+
233
+ If you're able and willing to contribute it will be most gratefully received and will help me to keep providing more models, and to start work on new AI projects.
234
+
235
+ Donors will get priority support on any and all AI/LLM/model questions and requests, access to a private Discord room, plus other benefits.
236
+
237
+ * Patreon: https://patreon.com/TheBlokeAI
238
+ * Ko-Fi: https://ko-fi.com/TheBlokeAI
239
+
240
+ **Special thanks to**: Aemon Algiz.
241
+
242
+ **Patreon special mentions**: Alicia Loh, Stephen Murray, K, Ajan Kanaga, RoA, Magnesian, Deo Leter, Olakabola, Eugene Pentland, zynix, Deep Realms, Raymond Fosdick, Elijah Stavena, Iucharbius, Erik Bjäreholt, Luis Javier Navarrete Lozano, Nicholas, theTransient, John Detwiler, alfie_i, knownsqashed, Mano Prime, Willem Michiel, Enrico Ros, LangChain4j, OG, Michael Dempsey, Pierre Kircher, Pedro Madruga, James Bentley, Thomas Belote, Luke @flexchar, Leonard Tan, Johann-Peter Hartmann, Illia Dulskyi, Fen Risland, Chadd, S_X, Jeff Scroggin, Ken Nordquist, Sean Connelly, Artur Olbinski, Swaroop Kallakuri, Jack West, Ai Maven, David Ziegler, Russ Johnson, transmissions 11, John Villwock, Alps Aficionado, Clay Pascal, Viktor Bowallius, Subspace Studios, Rainer Wilmers, Trenton Dambrowitz, vamX, Michael Levine, 준교 김, Brandon Frisco, Kalila, Trailburnt, Randy H, Talal Aujan, Nathan Dryer, Vadim, 阿明, ReadyPlayerEmma, Tiffany J. Kim, George Stoitzev, Spencer Kim, Jerry Meng, Gabriel Tamborski, Cory Kujawski, Jeffrey Morgan, Spiking Neurons AB, Edmond Seymore, Alexandros Triantafyllidis, Lone Striker, Cap'n Zoog, Nikolai Manek, danny, ya boyyy, Derek Yates, usrbinkat, Mandus, TL, Nathan LeClaire, subjectnull, Imad Khwaja, webtim, Raven Klaugh, Asp the Wyvern, Gabriel Puliatti, Caitlyn Gatomon, Joseph William Delisle, Jonathan Leane, Luke Pendergrass, SuperWojo, Sebastain Graf, Will Dee, Fred von Graf, Andrey, Dan Guido, Daniel P. Andersen, Nitin Borwankar, Elle, Vitor Caleffi, biorpg, jjj, NimbleBox.ai, Pieter, Matthew Berman, terasurfer, Michael Davis, Alex, Stanislav Ovsiannikov
243
+
244
+
245
+ Thank you to all my generous patrons and donors!
246
+
247
+ And thank you again to a16z for their generous grant.
248
+
249
+ <!-- footer end -->
250
+
251
+ # Original model card: OpenAccess AI Collective's Hippogriff 30B Chat
252
+
253
+
254
+ # Hippogriff 30B Chat
255
+
256
+ [<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
257
+
258
+
259
+ Hippogriff 30B Chat is an experiment that builds on Manticore with new datasets, while removing a few more instruction and chat datasets. It also includes a de-duped subset of the Pygmalion dataset, and it drops all Alpaca-style prompts using `###` in favor of
260
+ chat-only prompts using `USER:` and `ASSISTANT:`, as well as [pygmalion/metharme prompting](https://huggingface.co/PygmalionAI/metharme-7b#prompting) using the `<|system|>`, `<|user|>` and `<|model|>` tokens.
261
+
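As an illustration of the second prompt style, the sketch below assembles a metharme-style prompt from the three tokens named above. Concatenating the parts with no separators is an assumption, so refer to the linked Metharme model card for the authoritative format. `build_metharme_prompt` is a hypothetical helper.

```python
def build_metharme_prompt(system: str, user: str) -> str:
    """Join a system instruction and a user turn, ending where the model should reply."""
    return f"<|system|>{system}<|user|>{user}<|model|>"


print(build_metharme_prompt("You are a helpful assistant.", "Tell me a riddle."))
```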
262
+ Questions, comments, feedback, looking to donate, or want to help? Reach out on our [Discord](https://discord.gg/PugNNHAF5r) or email [wing@openaccessaicollective.org](mailto:wing@openaccessaicollective.org)
263
+
264
+ # Training Datasets
265
+
266
+ Hippogriff 30B Chat is a Llama 30B model fine-tuned on the following datasets:
267
+
268
+ - OpenAssistant/oasst1 - cleaned dataset, similar to Guanaco
269
+ - synthetic jokes generation and explanation derived from reddit jokes dataset
270
+ - synthetic prose generation and rewriting self-chat
271
+ - Q&A based on provided context
272
+ - self instruct augmented logic_inference_oa
273
+ - de-duped pygmalion dataset, filtered down to RP data, cleaned, english only, 25%
274
+ - [riddle_sense](https://huggingface.co/datasets/riddle_sense) - instruct augmented
275
+ - hellaswag, updated for detailed explanations w 30K+ rows
276
+ - [gsm8k](https://huggingface.co/datasets/gsm8k) - instruct augmented
277
+ - [ewof/code-alpaca-instruct-unfiltered](https://huggingface.co/datasets/ewof/code-alpaca-instruct-unfiltered) synthetic self chat dataset derived from about 1000 rows
278
+ - [subset of QingyiSi/Alpaca-CoT for roleplay and CoT](https://huggingface.co/QingyiSi/Alpaca-CoT)
279
+ - [GPTeacher-General-Instruct](https://huggingface.co/datasets/teknium/GPTeacher-General-Instruct)
280
+ - ARC-Easy & ARC-Challenge - instruct augmented for detailed responses, derived from the `train` split
281
+ - [hellaswag](https://huggingface.co/datasets/hellaswag) - 5K row subset of instruct augmented for concise responses, derived from the `train` split
282
+ - [metaeval/ScienceQA_text_only](https://huggingface.co/datasets/metaeval/ScienceQA_text_only) - instruct for concise responses
283
+ - [openai/summarize_from_feedback](https://huggingface.co/datasets/openai/summarize_from_feedback) - instruct augmented tl;dr summarization
284
+
285
+ Hippogriff differs from Manticore as it does not use the WizardLM, WizardVicuna, Alpaca, or ShareGPT datasets.
286
+
287
+ # Initial thoughts
288
+
289
+ Hippogriff follows instructions pretty well. It still struggles with anything that has to do with math. Its prose is much better than Manticore's, and it handles in-context QA much better.
290
+
291
+
292
+ # Shoutouts
293
+
294
+ Special thanks to Nanobit for helping with Axolotl, TheBloke for quantizing these models so they are more accessible to all, 0x000011b for the RP dataset, and the support from everyone in our AI Discord community.
295
+
296
+ # Demo
297
+
298
+ A Spaces demo is not provided for this release due to 30B models currently not fitting in VRAM.
299
+
300
+ ## Build
301
+
302
+ Hippogriff was built with [Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl) on 8xA100 80GB
303
+ - 1.5 epochs taking approximately 12 hours. No further epochs will be released.
304
+
305
+ ## Bias, Risks, and Limitations
306
+
307
+ Hippogriff has not been aligned to human preferences with techniques like RLHF or deployed with in-the-loop filtering of responses like ChatGPT, so the model can produce problematic outputs (especially when prompted to do so).
308
+ Hippogriff was fine-tuned from the base model LLaMA 30B; please refer to its model card's Limitations section for relevant information.
309
+
310
+ The core functionality of Llama is taking a string of text and predicting the next token. While language models are widely used for tasks other than this,
311
+ there are a lot of unknowns with this work. When prompting Hippogriff it is important to remember that the statistically most likely next token is often
312
+ not the token that produces the most "accurate" text. Never depend upon Hippogriff to produce factually accurate output.
313
+
314
+ Llama was trained on the Pile, a dataset known to contain profane, lewd, and otherwise abrasive language. Depending on the use case, Hippogriff may produce
315
+ socially unacceptable text. See Sections 5 and 6 of the Pile paper for a more detailed analysis of the biases in the Pile.
316
+
317
+ As with all language models, it is hard to predict in advance how Hippogriff will respond to particular prompts and offensive content may occur without warning.
318
+ We recommend having a human curate or filter the outputs before releasing them, both to censor undesirable content and to improve the quality of the results.
319
+
320
+ ## Examples
321
+
322
+ TBD
323
+
324
+ ```
325
+ USER: write an original short story of at least 300 words in the style of Zora Neale Hurston. Incorporate the following literary devices: Euphemism and Foreshadowing.
326
+ ASSISTANT: Once upon a time, there was a black girl named Rose, whose skin shined like a pearl beneath the hot sun. She lived with her mama and daddy on their farm, where they grew crops to feed the family and neighbors. On this particular day, Rose felt a tinge of excitement as she watched her parents work diligently in their fields. 'It's almost harvest season!' she thought, imagining the delicious meals that awaited them. But, soon after, it became apparent that something sinister lurked amidst the rows of fresh produce.
327
+
328
+ As her father hoed around a vine-covered cucumber plant, he unearthed a peculiar rock. It was the size of an acorn but had a strange shape, unlike anything he or his wife had ever seen. "What have we here?" He said, turning to his wife with curiosity and excitement. She examined the rock for herself and declared, "Why, it looks like a piece of coal!" But as both mother and daughter looked closer, it seemed to be far more than a mere piece of coal. A sense of unease crept up Rose's spine when she noticed that the rock resembled the dark face of a baby in the womb. As Rose tried to suppress a scream, her mother gasped, "Sweet Jesus! Let's bury this thing before anyone sees!" And so, they quickly hid the stone back underground, hoping that nobody would ever discover their secret.
329
+ ```
330
+
331
+ ```
332
+ USER: tell me 5 words that start with "ex" and then the sum of 33 and 77
333
+ ASSISTANT: Excite, exotic, exhibit, excitable, excessive
334
+ ```
335
+
336
+
337
+ ```
338
+ USER: sally (a girl) has 3 brothers. Each brother has 2 sisters. how many sisters does sally have?
339
+ ASSISTANT: Sally has 3 brothers, and each brother has 2 sisters. To find out how many sisters Sally has in total, we need to multiply the number of brothers by the number of sisters per brother, then add 1 to account for Sally herself. So, 3 * 2 + 1 = 6 sisters in total
341
+ ```