TheBloke committed on
Commit ff977cb
1 Parent(s): f636f13

Update for Transformers GPTQ support

README.md CHANGED
@@ -13,17 +13,20 @@ tags:
  ---

  <!-- header start -->
- <div style="width: 100%;">
- <img src="https://i.imgur.com/EBdldam.jpg" alt="TheBlokeAI" style="width: 100%; min-width: 400px; display: block; margin: auto;">
  </div>
  <div style="display: flex; justify-content: space-between; width: 100%;">
  <div style="display: flex; flex-direction: column; align-items: flex-start;">
- <p><a href="https://discord.gg/Jq4vkcDakD">Chat & support: my new Discord server</a></p>
  </div>
  <div style="display: flex; flex-direction: column; align-items: flex-end;">
- <p><a href="https://www.patreon.com/TheBlokeAI">Want to contribute? TheBloke's Patreon page</a></p>
  </div>
  </div>
  <!-- header end -->

  # Georgia Tech Research Institute's Galactica 30B Evol Instruct 70K GPTQ
@@ -75,7 +78,7 @@ from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
  import argparse

  model_name_or_path = "TheBloke/galactica-30B-evol-instruct-70K-GPTQ"
- model_basename = "gptq_model-4bit--1g"

  use_triton = False

@@ -134,11 +137,12 @@ It was created without group_size to lower VRAM requirements, and with --act-ord
  * Parameters: Groupsize = -1. Act Order / desc_act = True.

  <!-- footer start -->
  ## Discord

  For further support, and discussions on these models and AI in general, join us at:

- [TheBloke AI's Discord server](https://discord.gg/Jq4vkcDakD)

  ## Thanks, and how to contribute.

@@ -153,12 +157,15 @@ Donaters will get priority support on any and all AI/LLM/model questions and req
  * Patreon: https://patreon.com/TheBlokeAI
  * Ko-Fi: https://ko-fi.com/TheBlokeAI

- **Special thanks to**: Luke from CarbonQuill, Aemon Algiz, Dmitriy Samsonov.

- **Patreon special mentions**: vamX, K, Jonathan Leane, Lone Striker, Sean Connelly, Chris McCloskey, WelcomeToTheClub, Nikolai Manek, John Detwiler, Kalila, David Flickinger, Fen Risland, subjectnull, Johann-Peter Hartmann, Talal Aujan, John Villwock, senxiiz, Khalefa Al-Ahmad, Kevin Schuppel, Alps Aficionado, Derek Yates, Mano Prime, Nathan LeClaire, biorpg, trip7s trip, Asp the Wyvern, chris gileta, Iucharbius , Artur Olbinski, Ai Maven, Joseph William Delisle, Luke Pendergrass, Illia Dulskyi, Eugene Pentland, Ajan Kanaga, Willem Michiel, Space Cruiser, Pyrater, Preetika Verma, Junyu Yang, Oscar Rangel, Spiking Neurons AB, Pierre Kircher, webtim, Cory Kujawski, terasurfer , Trenton Dambrowitz, Gabriel Puliatti, Imad Khwaja, Luke.

  Thank you to all my generous patrons and donaters!

  <!-- footer end -->

  # Original model card: Georgia Tech Research Institute's Galactica 30B Evol Instruct 70K
@@ -174,7 +181,7 @@ The HF dataset for Evol-Instruct-70k can be found [here](https://huggingface.co/

  ## Model Details

- The GALACTICA models are trained on a large-scale scientific corpus and are designed to perform scientific tasks.
  Evol-instruct-70k is a dataset of evolved instruction-response pairs generated via the Evol-Instruct framework using gpt-3.5-turbo.

  ## Model Use
@@ -184,11 +191,11 @@ The original GALACTICA models are available under a non-commercial CC BY-NC 4.0

  <details>
  <summary> Example Outputs </summary>
-
  *__Full Prompt:__*
  ```
  ### Instruction:
- Write out and explain Maxwell's equations using markdown and latex syntax. Highlight the key implication of each equation.

  ### Response:
  ```
@@ -208,11 +215,11 @@ This equation states that the divergence of the magnetic field B is always zero.
  - **Maxwell-Faraday Equation:**
  $$\nabla \times \mathbf{E} = -\frac{\partial \mathbf{B}}{\partial t}$$
  This equation states that the curl of the electric field is equal to the negative of the time derivative of the magnetic field. This means that changing magnetic fields can create electric fields, and vice versa.
-
  - **Ampere's Law with Maxwell's Addition:**
  $$\nabla \times \mathbf{B} = \mu_0 \left( \mathbf{J} + \epsilon_0 \frac{\partial \mathbf{E}}{\partial t} \right)$$
  This equation states that the curl of the magnetic field is equal to the product of the vacuum permeability μ0 and the sum of the electric current density J and the time derivative of the electric field. This means that changing electric fields can create magnetic fields, and that electric currents can also create magnetic fields.
-
  **Implications**
  - Gauss's Laws imply that electric charges must be conserved, since the net charge in a closed system must always be zero.
  - The Maxwell-Faraday Equation implies that electromagnetic waves can travel through space, since changing magnetic fields can create electric fields, which can create changing magnetic fields, and so on. This is the basis for radio waves, microwaves, and other forms of electromagnetic radiation.
@@ -273,7 +280,7 @@ Qualitative evaluation suggests that the evol-instruct-70k fine-tuned Galactica

  ```bibtex
  @misc{xu2023wizardlm,
- title={WizardLM: Empowering Large Language Models to Follow Complex Instructions},
  author={Can Xu and Qingfeng Sun and Kai Zheng and Xiubo Geng and Pu Zhao and Jiazhan Feng and Chongyang Tao and Daxin Jiang},
  year={2023},
  eprint={2304.12244},
 
  ---

  <!-- header start -->
+ <!-- 200823 -->
+ <div style="width: auto; margin-left: auto; margin-right: auto">
+ <img src="https://i.imgur.com/EBdldam.jpg" alt="TheBlokeAI" style="width: 100%; min-width: 400px; display: block; margin: auto;">
  </div>
  <div style="display: flex; justify-content: space-between; width: 100%;">
  <div style="display: flex; flex-direction: column; align-items: flex-start;">
+ <p style="margin-top: 0.5em; margin-bottom: 0em;"><a href="https://discord.gg/theblokeai">Chat & support: TheBloke's Discord server</a></p>
  </div>
  <div style="display: flex; flex-direction: column; align-items: flex-end;">
+ <p style="margin-top: 0.5em; margin-bottom: 0em;"><a href="https://www.patreon.com/TheBlokeAI">Want to contribute? TheBloke's Patreon page</a></p>
  </div>
  </div>
+ <div style="text-align:center; margin-top: 0em; margin-bottom: 0em"><p style="margin-top: 0.25em; margin-bottom: 0em;">TheBloke's LLM work is generously supported by a grant from <a href="https://a16z.com">andreessen horowitz (a16z)</a></p></div>
+ <hr style="margin-top: 1.0em; margin-bottom: 1.0em;">
  <!-- header end -->

  # Georgia Tech Research Institute's Galactica 30B Evol Instruct 70K GPTQ
 
  import argparse

  model_name_or_path = "TheBloke/galactica-30B-evol-instruct-70K-GPTQ"
+ model_basename = "model"

  use_triton = False

 
  * Parameters: Groupsize = -1. Act Order / desc_act = True.

  <!-- footer start -->
+ <!-- 200823 -->
  ## Discord

  For further support, and discussions on these models and AI in general, join us at:

+ [TheBloke AI's Discord server](https://discord.gg/theblokeai)

  ## Thanks, and how to contribute.

 
  * Patreon: https://patreon.com/TheBlokeAI
  * Ko-Fi: https://ko-fi.com/TheBlokeAI

+ **Special thanks to**: Aemon Algiz.
+
+ **Patreon special mentions**: Sam, theTransient, Jonathan Leane, Steven Wood, webtim, Johann-Peter Hartmann, Geoffrey Montalvo, Gabriel Tamborski, Willem Michiel, John Villwock, Derek Yates, Mesiah Bishop, Eugene Pentland, Pieter, Chadd, Stephen Murray, Daniel P. Andersen, terasurfer, Brandon Frisco, Thomas Belote, Sid, Nathan LeClaire, Magnesian, Alps Aficionado, Stanislav Ovsiannikov, Alex, Joseph William Delisle, Nikolai Manek, Michael Davis, Junyu Yang, K, J, Spencer Kim, Stefan Sabev, Olusegun Samson, transmissions 11, Michael Levine, Cory Kujawski, Rainer Wilmers, zynix, Kalila, Luke @flexchar, Ajan Kanaga, Mandus, vamX, Ai Maven, Mano Prime, Matthew Berman, subjectnull, Vitor Caleffi, Clay Pascal, biorpg, alfie_i, 阿明, Jeffrey Morgan, ya boyyy, Raymond Fosdick, knownsqashed, Olakabola, Leonard Tan, ReadyPlayerEmma, Enrico Ros, Dave, Talal Aujan, Illia Dulskyi, Sean Connelly, senxiiz, Artur Olbinski, Elle, Raven Klaugh, Fen Risland, Deep Realms, Imad Khwaja, Fred von Graf, Will Dee, usrbinkat, SuperWojo, Alexandros Triantafyllidis, Swaroop Kallakuri, Dan Guido, John Detwiler, Pedro Madruga, Iucharbius, Viktor Bowallius, Asp the Wyvern, Edmond Seymore, Trenton Dambrowitz, Space Cruiser, Spiking Neurons AB, Pyrater, LangChain4j, Tony Hughes, Kacper Wikieł, Rishabh Srivastava, David Ziegler, Luke Pendergrass, Andrey, Gabriel Puliatti, Lone Striker, Sebastain Graf, Pierre Kircher, Randy H, NimbleBox.ai, Vadim, danny, Deo Leter


  Thank you to all my generous patrons and donaters!

+ And thank you again to a16z for their generous grant.
+
  <!-- footer end -->

  # Original model card: Georgia Tech Research Institute's Galactica 30B Evol Instruct 70K
 

  ## Model Details

+ The GALACTICA models are trained on a large-scale scientific corpus and are designed to perform scientific tasks.
  Evol-instruct-70k is a dataset of evolved instruction-response pairs generated via the Evol-Instruct framework using gpt-3.5-turbo.

  ## Model Use
 

  <details>
  <summary> Example Outputs </summary>
+
  *__Full Prompt:__*
  ```
  ### Instruction:
+ Write out and explain Maxwell's equations using markdown and latex syntax. Highlight the key implication of each equation.

  ### Response:
  ```
 
  - **Maxwell-Faraday Equation:**
  $$\nabla \times \mathbf{E} = -\frac{\partial \mathbf{B}}{\partial t}$$
  This equation states that the curl of the electric field is equal to the negative of the time derivative of the magnetic field. This means that changing magnetic fields can create electric fields, and vice versa.
+
  - **Ampere's Law with Maxwell's Addition:**
  $$\nabla \times \mathbf{B} = \mu_0 \left( \mathbf{J} + \epsilon_0 \frac{\partial \mathbf{E}}{\partial t} \right)$$
  This equation states that the curl of the magnetic field is equal to the product of the vacuum permeability μ0 and the sum of the electric current density J and the time derivative of the electric field. This means that changing electric fields can create magnetic fields, and that electric currents can also create magnetic fields.
+
  **Implications**
  - Gauss's Laws imply that electric charges must be conserved, since the net charge in a closed system must always be zero.
  - The Maxwell-Faraday Equation implies that electromagnetic waves can travel through space, since changing magnetic fields can create electric fields, which can create changing magnetic fields, and so on. This is the basis for radio waves, microwaves, and other forms of electromagnetic radiation.
 

  ```bibtex
  @misc{xu2023wizardlm,
+ title={WizardLM: Empowering Large Language Models to Follow Complex Instructions},
  author={Can Xu and Qingfeng Sun and Kai Zheng and Xiubo Geng and Pu Zhao and Jiazhan Feng and Chongyang Tao and Daxin Jiang},
  year={2023},
  eprint={2304.12244},
config.json CHANGED
@@ -1,32 +1,43 @@
  {
- "_name_or_path": "./galactica-30b-evol-instruct-70k/",
- "_remove_final_layer_norm": false,
- "activation_dropout": 0.0,
- "activation_function": "gelu",
- "architectures": [
- "OPTForCausalLM"
- ],
- "attention_dropout": 0.1,
- "bos_token_id": 0,
- "do_layer_norm_before": true,
- "dropout": 0.1,
- "enable_bias": true,
- "eos_token_id": 2,
- "ffn_dim": 28672,
- "hidden_size": 7168,
- "init_std": 0.02,
- "layer_norm_elementwise_affine": true,
- "layerdrop": 0.0,
- "learned_embeddings": true,
- "max_position_embeddings": 2048,
- "model_type": "opt",
- "num_attention_heads": 56,
- "num_hidden_layers": 48,
- "pad_token_id": 1,
- "scale_embeddings": false,
- "torch_dtype": "bfloat16",
- "transformers_version": "4.29.2",
- "use_cache": true,
- "vocab_size": 50000,
- "word_embed_proj_dim": 7168
  }
 
  {
+ "_name_or_path": "./galactica-30b-evol-instruct-70k/",
+ "_remove_final_layer_norm": false,
+ "activation_dropout": 0.0,
+ "activation_function": "gelu",
+ "architectures": [
+ "OPTForCausalLM"
+ ],
+ "attention_dropout": 0.1,
+ "bos_token_id": 0,
+ "do_layer_norm_before": true,
+ "dropout": 0.1,
+ "enable_bias": true,
+ "eos_token_id": 2,
+ "ffn_dim": 28672,
+ "hidden_size": 7168,
+ "init_std": 0.02,
+ "layer_norm_elementwise_affine": true,
+ "layerdrop": 0.0,
+ "learned_embeddings": true,
+ "max_position_embeddings": 2048,
+ "model_type": "opt",
+ "num_attention_heads": 56,
+ "num_hidden_layers": 48,
+ "pad_token_id": 1,
+ "scale_embeddings": false,
+ "torch_dtype": "bfloat16",
+ "transformers_version": "4.29.2",
+ "use_cache": true,
+ "vocab_size": 50000,
+ "word_embed_proj_dim": 7168,
+ "quantization_config": {
+ "bits": 4,
+ "group_size": -1,
+ "damp_percent": 0.01,
+ "desc_act": true,
+ "sym": true,
+ "true_sequential": true,
+ "model_name_or_path": null,
+ "model_file_base_name": "model",
+ "quant_method": "gptq"
+ }
  }
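The `quantization_config` block added to config.json records the same GPTQ parameters the README lists (4 bits, Groupsize = -1, desc_act = True). As far as I understand the Transformers GPTQ integration, the `"quant_method": "gptq"` entry is what marks the checkpoint as GPTQ-quantized when it is loaded with `from_pretrained`. The sketch below only parses that block and summarizes it; `describe_gptq_config` is a hypothetical helper written for this example, not part of Transformers or AutoGPTQ.

```python
import json

# Abridged config.json: only "model_type" plus the quantization_config block added in this commit.
CONFIG_JSON = """
{
  "model_type": "opt",
  "quantization_config": {
    "bits": 4,
    "group_size": -1,
    "damp_percent": 0.01,
    "desc_act": true,
    "sym": true,
    "true_sequential": true,
    "model_name_or_path": null,
    "model_file_base_name": "model",
    "quant_method": "gptq"
  }
}
"""


def describe_gptq_config(config: dict) -> str:
    """Summarise the GPTQ settings stored under config["quantization_config"].

    Hypothetical helper for illustration only.
    """
    qc = config.get("quantization_config")
    if qc is None or qc.get("quant_method") != "gptq":
        raise ValueError("config.json does not declare a GPTQ quantization_config")
    # group_size == -1 means no grouping, matching the README's "Groupsize = -1".
    group = "none" if qc["group_size"] == -1 else str(qc["group_size"])
    return (
        f"{qc['bits']}-bit GPTQ, group size: {group}, "
        f"act-order (desc_act): {qc['desc_act']}, "
        f"weights file: {qc['model_file_base_name']}.safetensors"
    )


config = json.loads(CONFIG_JSON)  # JSON true/null become Python True/None
print(describe_gptq_config(config))
```

The old config.json had no `quantization_config` key at all, so the same helper would raise `ValueError` on it.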
gptq_model-4bit--1g.safetensors → model.safetensors RENAMED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:92168124525e9c8d908944fb6492437d10d22cd786fa5982ad8fce94d00c5852
- size 16289789784

  version https://git-lfs.github.com/spec/v1
+ oid sha256:66612ab8598796dc477b4ee8126f79e382261a310997a0b0ee312dbd61f40c5c
+ size 16289789840
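The weights file is stored with Git LFS, so this diff shows pointer files rather than the weights themselves: a spec `version` line, an `oid` (the SHA-256 of the file content) and a `size` in bytes. The oid changed along with the name, so the uploaded bytes differ and this is not only a rename; the diff does not say what changed inside the file. A minimal sketch that parses both pointers and compares them (`parse_lfs_pointer` is a hypothetical helper written for this example):

```python
# Git LFS pointer contents before and after the rename, copied from this diff.
OLD_POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:92168124525e9c8d908944fb6492437d10d22cd786fa5982ad8fce94d00c5852
size 16289789784
"""

NEW_POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:66612ab8598796dc477b4ee8126f79e382261a310997a0b0ee312dbd61f40c5c
size 16289789840
"""


def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer into {"version", "oid", "size"}.

    Each line is "<key> <value>"; oid is "sha256:<hex digest>", size is in bytes.
    """
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unexpected oid algorithm: {algo}")
    return {"version": fields["version"], "oid": digest, "size": int(fields["size"])}


old = parse_lfs_pointer(OLD_POINTER)
new = parse_lfs_pointer(NEW_POINTER)

print(f"size: {new['size'] / 2**30:.2f} GiB")  # size: 15.17 GiB
print(f"size change: {new['size'] - old['size']} bytes")  # size change: 56 bytes
print(f"content changed: {old['oid'] != new['oid']}")  # content changed: True
```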
quantize_config.json CHANGED
@@ -6,5 +6,5 @@
  "sym": true,
  "true_sequential": true,
  "model_name_or_path": null,
- "model_file_base_name": null
  }

  "sym": true,
  "true_sequential": true,
  "model_name_or_path": null,
+ "model_file_base_name": "model"
  }
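After this commit the weights' base name is "model" in four places: the renamed model.safetensors, `model_file_base_name` in quantize_config.json, the same key inside config.json's `quantization_config`, and `model_basename` in the README example. A quick consistency check along those lines, as a sketch; `check_basename_consistency` is hypothetical and is not something AutoGPTQ or Transformers provides.

```python
import json

# quantize_config.json after this commit, limited to the keys visible in the diff
# (the earlier keys are outside the hunk and are left out here).
QUANTIZE_CONFIG = json.loads("""
{
  "sym": true,
  "true_sequential": true,
  "model_name_or_path": null,
  "model_file_base_name": "model"
}
""")

# Values taken from the other files changed in this commit.
CONFIG_QUANTIZATION = {"model_file_base_name": "model", "quant_method": "gptq"}
README_MODEL_BASENAME = "model"
WEIGHTS_FILE = "model.safetensors"


def check_basename_consistency(quantize_config, config_qc, readme_basename, weights_file):
    """Return a list of mismatches between the places that name the weights file.

    Hypothetical check for illustration only.
    """
    problems = []
    expected = weights_file.removesuffix(".safetensors")  # Python 3.9+
    for source, value in [
        ("quantize_config.json", quantize_config.get("model_file_base_name")),
        ("config.json quantization_config", config_qc.get("model_file_base_name")),
        ("README model_basename", readme_basename),
    ]:
        if value != expected:
            problems.append(f"{source}: {value!r} != {expected!r}")
    return problems


# State after this commit: every source agrees.
print(check_basename_consistency(QUANTIZE_CONFIG, CONFIG_QUANTIZATION, README_MODEL_BASENAME, WEIGHTS_FILE))

# State before this commit: null base name and no quantization_config in config.json.
print(check_basename_consistency({"model_file_base_name": None}, {}, "gptq_model-4bit--1g", "gptq_model-4bit--1g.safetensors"))
```

Against the pre-commit values the check reports two mismatches (the null `model_file_base_name` and the missing `quantization_config`), while the README basename already matched the old file name.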