RichardErkhov committed
Commit f8e69fa
1 Parent(s): a1f84ef

uploaded readme

Files changed (1): README.md (+381 lines)

Quantization made by Richard Erkhov.

[Github](https://github.com/RichardErkhov)

[Discord](https://discord.gg/pvy7H8DZMG)

[Request more models](https://github.com/RichardErkhov/quant_request)


recurrentgemma-2b-it - bnb 4bits
- Model creator: https://huggingface.co/google/
- Original model: https://huggingface.co/google/recurrentgemma-2b-it/
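
This repository hosts the model as a bitsandbytes 4-bit quantization. Below is a minimal loading sketch; the repo id is a placeholder for this repository's actual id, and the NF4/bfloat16 settings are common bitsandbytes choices, not confirmed settings for this upload:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

# Placeholder repo id: substitute this repository's actual Hugging Face id.
repo_id = "RichardErkhov/recurrentgemma-2b-it-4bits"

# Assumed quantization settings: NF4 weights with bfloat16 compute.
# If the quantization config is already embedded in the repo,
# plain from_pretrained(repo_id, device_map="auto") also works.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    device_map="auto",
    quantization_config=bnb_config,
)
```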

Original model description:
---
license: gemma
library_name: transformers
extra_gated_heading: Access RecurrentGemma on Hugging Face
extra_gated_prompt: To access RecurrentGemma on Hugging Face, you’re required to review
  and agree to Google’s usage license. To do this, please ensure you’re logged-in
  to Hugging Face and click below. Requests are processed immediately.
extra_gated_button_content: Acknowledge license
---

# RecurrentGemma Model Card

**Model Page**: [RecurrentGemma](https://ai.google.dev/gemma/docs/recurrentgemma/model_card)

This model card corresponds to the 2B instruction-tuned version of the RecurrentGemma model. You can also visit the model card of the [2B base model](https://huggingface.co/google/recurrentgemma-2b).

**Resources and technical documentation:**

* [Responsible Generative AI Toolkit](https://ai.google.dev/responsible)
* [RecurrentGemma on Kaggle](https://www.kaggle.com/models/google/recurrentgemma)

**Terms of Use:** [Terms](https://www.kaggle.com/models/google/gemma/license/consent)

**Authors:** Google

## Model information

## Usage

Below we share some code snippets on how to quickly get started with running the model. First make sure to `pip install --upgrade git+https://github.com/huggingface/transformers.git`, then copy the snippet from the section that is relevant for your use case.

### Running the model on a single / multi GPU

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("google/recurrentgemma-2b-it")
# device_map="auto" places the weights across the available GPU(s).
model = AutoModelForCausalLM.from_pretrained("google/recurrentgemma-2b-it", device_map="auto")

input_text = "Write me a poem about Machine Learning."
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")

outputs = model.generate(**input_ids)
print(tokenizer.decode(outputs[0]))
```

### Chat Template

The instruction-tuned models use a chat template that must be adhered to for conversational use.
The easiest way to apply it is using the tokenizer's built-in chat template, as shown in the following snippet.

Let's load the model and apply the chat template to a conversation. In this example, we'll start with a single user interaction:

```py
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "google/recurrentgemma-2b-it"
dtype = torch.bfloat16

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="cuda",
    torch_dtype=dtype,
)

chat = [
    {"role": "user", "content": "Write a hello world program"},
]
prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
```

At this point, the prompt contains the following text:

```
<bos><start_of_turn>user
Write a hello world program<end_of_turn>
<start_of_turn>model
```

As you can see, each turn is preceded by a `<start_of_turn>` delimiter and then the role of the entity (either `user`, for content supplied by the user, or `model` for LLM responses). Turns finish with the `<end_of_turn>` token.

You can follow this format to build the prompt manually, if you need to do it without the tokenizer's chat template, as in the sketch below.
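
For illustration, a minimal sketch of that manual construction; `build_prompt` is a hypothetical helper, not a transformers API, and simply mirrors the template output shown above:

```py
def build_prompt(user_message: str) -> str:
    # Hypothetical helper mirroring the Gemma-style template rendered above.
    return (
        "<bos><start_of_turn>user\n"
        f"{user_message}<end_of_turn>\n"
        "<start_of_turn>model\n"
    )

prompt = build_prompt("Write a hello world program")
```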

After the prompt is ready, generation can be performed like this:

```py
# add_special_tokens=False because the rendered template already contains <bos>.
inputs = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")
outputs = model.generate(input_ids=inputs.to(model.device), max_new_tokens=150)
print(tokenizer.decode(outputs[0]))
```
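
To continue the conversation, you can append the model's reply and a new user message to `chat`, then re-apply the template. A minimal sketch, assuming the Gemma-style template accepts the standard `assistant` role for model turns:

```py
# Slice off the prompt tokens to keep only the newly generated reply.
reply = tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)

# Assumption: the chat template maps the standard "assistant" role to a "model" turn.
chat += [
    {"role": "assistant", "content": reply},
    {"role": "user", "content": "Now explain how it works."},
]
prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
```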

### Model summary

#### Description

RecurrentGemma is a family of open language models built on a [novel recurrent architecture](https://arxiv.org/abs/2402.19427) developed at Google. Both pre-trained and instruction-tuned versions are available in English.

Like Gemma, RecurrentGemma models are well-suited for a variety of text generation tasks, including question answering, summarization, and reasoning. Because of its novel architecture, RecurrentGemma requires less memory than Gemma and achieves faster inference when generating long sequences.

#### Inputs and outputs

* **Input:** Text string (e.g., a question, a prompt, or a document to be summarized).
* **Output:** Generated English-language text in response to the input (e.g., an answer to the question, a summary of the document).

#### Citation

```none
@article{recurrentgemma_2024,
    title={RecurrentGemma},
    url={},
    DOI={},
    publisher={Kaggle},
    author={Griffin Team, Soham De, Samuel L Smith, Anushan Fernando, Alex Botev, George-Christian Muraru, Ruba Haroun, Leonard Berrada et al.},
    year={2024}
}
```

### Model data

#### Training dataset and data processing

RecurrentGemma uses the same training data and data processing as used by the Gemma model family. A full description can be found on the [Gemma model card](https://ai.google.dev/gemma/docs/model_card#model_data).

## Implementation information

### Hardware and frameworks used during training

Like [Gemma](https://ai.google.dev/gemma/docs/model_card#implementation_information), RecurrentGemma was trained on [TPUv5e](https://cloud.google.com/tpu/docs/intro-to-tpu), using [JAX](https://github.com/google/jax) and [ML Pathways](https://blog.google/technology/ai/introducing-pathways-next-generation-ai-architecture/).

## Evaluation information

### Benchmark results

#### Evaluation approach

These models were evaluated against a large collection of different datasets and metrics to cover different aspects of text generation.

#### Evaluation results

Benchmark           | Metric        | RecurrentGemma 2B
------------------- | ------------- | -----------------
[MMLU]              | 5-shot, top-1 | 38.4
[HellaSwag]         | 0-shot        | 71.0
[PIQA]              | 0-shot        | 78.5
[SocialIQA]         | 0-shot        | 51.8
[BoolQ]             | 0-shot        | 71.3
[WinoGrande]        | partial score | 67.8
[CommonsenseQA]     | 7-shot        | 63.7
[OpenBookQA]        |               | 47.2
[ARC-e][ARC-c]      |               | 72.9
[ARC-c]             |               | 42.3
[TriviaQA]          | 5-shot        | 52.5
[Natural Questions] | 5-shot        | 11.5
[HumanEval]         | pass@1        | 21.3
[MBPP]              | 3-shot        | 28.8
[GSM8K]             | maj@1         | 13.4
[MATH]              | 4-shot        | 11.0
[AGIEval]           |               | 23.8
[BIG-Bench]         |               | 35.3
**Average**         |               | 44.6

## Ethics and safety

### Ethics and safety evaluations

#### Evaluation approach

Our evaluation methods include structured evaluations and internal red-teaming testing of relevant content policies. Red-teaming was conducted by a number of different teams, each with different goals and human evaluation metrics. These models were evaluated against a number of different categories relevant to ethics and safety, including:

* **Text-to-text content safety:** Human evaluation on prompts covering safety policies including child sexual abuse and exploitation, harassment, violence and gore, and hate speech.
* **Text-to-text representational harms:** Benchmark against relevant academic datasets such as WinoBias and the BBQ dataset.
* **Memorization:** Automated evaluation of memorization of training data, including the risk of personally identifiable information exposure.
* **Large-scale harm:** Tests for “dangerous capabilities,” such as chemical, biological, radiological, and nuclear (CBRN) risks, as well as tests for persuasion and deception, cybersecurity, and autonomous replication.

#### Evaluation results

The results of ethics and safety evaluations are within acceptable thresholds for meeting [internal policies](https://storage.googleapis.com/gweb-uniblog-publish-prod/documents/2023_Google_AI_Principles_Progress_Update.pdf#page=11) for categories such as child safety, content safety, representational harms, memorization, and large-scale harms. On top of robust internal evaluations, the results of well-known safety benchmarks like BBQ, Winogender, Winobias, RealToxicity, and TruthfulQA are shown here.

Benchmark                | Metric | RecurrentGemma 2B | RecurrentGemma 2B IT
------------------------ | ------ | ----------------- | --------------------
[RealToxicity]           | avg    | 9.8               | 7.6
[BOLD]                   |        | 39.3              | 52.4
[CrowS-Pairs]            | top-1  | 41.1              | 43.4
[BBQ Ambig][BBQ]         | top-1  | 62.6              | 71.1
[BBQ Disambig][BBQ]      | top-1  | 58.4              | 50.8
[Winogender]             | top-1  | 55.1              | 54.7
[TruthfulQA]             |        | 35.1              | 42.7
[Winobias 1_2][Winobias] |        | 58.4              | 56.4
[Winobias 2_2][Winobias] |        | 90.0              | 75.4
[Toxigen]                |        | 56.7              | 50.0

## Model usage and limitations

### Known limitations

These models have certain limitations that users should be aware of:

* **Training data**
    * The quality and diversity of the training data significantly influence the model's capabilities. Biases or gaps in the training data can lead to limitations in the model's responses.
    * The scope of the training dataset determines the subject areas the model can handle effectively.
* **Context and task complexity**
    * LLMs are better at tasks that can be framed with clear prompts and instructions. Open-ended or highly complex tasks might be challenging.
    * A model's performance can be influenced by the amount of context provided (longer context generally leads to better outputs, up to a certain point).
* **Language ambiguity and nuance**
    * Natural language is inherently complex. LLMs might struggle to grasp subtle nuances, sarcasm, or figurative language.
* **Factual accuracy**
    * LLMs generate responses based on information they learned from their training datasets, but they are not knowledge bases. They may generate incorrect or outdated factual statements.
* **Common sense**
    * LLMs rely on statistical patterns in language. They might lack the ability to apply common sense reasoning in certain situations.

### Ethical considerations and risks

The development of large language models (LLMs) raises several ethical concerns. In creating an open model, we have carefully considered the following:

* **Bias and fairness**
    * LLMs trained on large-scale, real-world text data can reflect socio-cultural biases embedded in the training material. These models underwent careful scrutiny; input data pre-processing is described and posterior evaluations are reported in this card.
* **Misinformation and misuse**
    * LLMs can be misused to generate text that is false, misleading, or harmful.
    * Guidelines are provided for responsible use with the model; see the [Responsible Generative AI Toolkit](https://ai.google.dev/gemma/responsible).
* **Transparency and accountability**
    * This model card summarizes details on the models' architecture, capabilities, limitations, and evaluation processes.
    * A responsibly developed open model offers the opportunity to share innovation by making LLM technology accessible to developers and researchers across the AI ecosystem.

Risks identified and mitigations:

* **Perpetuation of biases:** Continuous monitoring (using evaluation metrics and human review) and the exploration of de-biasing techniques during model training, fine-tuning, and other use cases are encouraged.
* **Generation of harmful content:** Mechanisms and guidelines for content safety are essential. Developers are encouraged to exercise caution and implement appropriate content safety safeguards based on their specific product policies and application use cases.
* **Misuse for malicious purposes:** Technical limitations and developer and end-user education can help mitigate malicious applications of LLMs. Educational resources and reporting mechanisms for users to flag misuse are provided. Prohibited uses of Gemma models are outlined in our [terms of use](https://www.kaggle.com/models/google/gemma/license/consent).
* **Privacy violations:** Models were trained on data filtered for removal of PII (personally identifiable information). Developers are encouraged to adhere to privacy regulations with privacy-preserving techniques.

## Intended usage

### Application

Open large language models (LLMs) have a wide range of applications across various industries and domains. The following list of potential uses is not comprehensive. The purpose of this list is to provide contextual information about the possible use cases that the model creators considered as part of model training and development.

* **Content creation and communication**
    * **Text generation:** These models can be used to generate creative text formats like poems, scripts, code, marketing copy, email drafts, etc.
    * **Chatbots and conversational AI:** Power conversational interfaces for customer service, virtual assistants, or interactive applications.
    * **Text summarization:** Generate concise summaries of a text corpus, research papers, or reports.
* **Research and education**
    * **Natural Language Processing (NLP) research:** These models can serve as a foundation for researchers to experiment with NLP techniques, develop algorithms, and contribute to the advancement of the field.
    * **Language learning tools:** Support interactive language learning experiences, aiding in grammar correction or providing writing practice.
    * **Knowledge exploration:** Assist researchers in exploring large bodies of text by generating summaries or answering questions about specific topics.

### Benefits

At the time of release, this family of models provides high-performance open large language model implementations designed from the ground up for responsible AI development, compared to similarly sized models.

Using the benchmark evaluation metrics described in this document, these models have been shown to provide superior performance to other, comparably sized open model alternatives.

In particular, RecurrentGemma models achieve comparable performance to Gemma models but are faster during inference and require less memory, especially on long sequences.

[MMLU]: https://arxiv.org/abs/2009.03300
[HellaSwag]: https://arxiv.org/abs/1905.07830
[PIQA]: https://arxiv.org/abs/1911.11641
[SocialIQA]: https://arxiv.org/abs/1904.09728
[BoolQ]: https://arxiv.org/abs/1905.10044
[WinoGrande]: https://arxiv.org/abs/1907.10641
[CommonsenseQA]: https://arxiv.org/abs/1811.00937
[OpenBookQA]: https://arxiv.org/abs/1809.02789
[ARC-c]: https://arxiv.org/abs/1911.01547
[TriviaQA]: https://arxiv.org/abs/1705.03551
[Natural Questions]: https://github.com/google-research-datasets/natural-questions
[HumanEval]: https://arxiv.org/abs/2107.03374
[MBPP]: https://arxiv.org/abs/2108.07732
[GSM8K]: https://arxiv.org/abs/2110.14168
[MATH]: https://arxiv.org/abs/2103.03874
[AGIEval]: https://arxiv.org/abs/2304.06364
[BIG-Bench]: https://arxiv.org/abs/2206.04615
[RealToxicity]: https://arxiv.org/abs/2009.11462
[BOLD]: https://arxiv.org/abs/2101.11718
[CrowS-Pairs]: https://aclanthology.org/2020.emnlp-main.154/
[BBQ]: https://arxiv.org/abs/2110.08193v2
[Winogender]: https://arxiv.org/abs/1804.09301
[TruthfulQA]: https://arxiv.org/abs/2109.07958
[Winobias]: https://arxiv.org/abs/1804.06876
[Toxigen]: https://arxiv.org/abs/2203.09509