yatharth97 committed on
Commit dfd6a2f
1 Parent(s): e1da866

Update README.md

Files changed (1)
  1. README.md +194 -109
README.md CHANGED
@@ -1,199 +1,284 @@
1
  ---
2
  library_name: transformers
3
- tags: []
 
 
 
 
 
 
 
 
 
 
 
 
 
 
4
  ---
5
 
6
- # Model Card for Model ID
7
 
8
- <!-- Provide a quick summary of what the model is/does. -->
9
 
 
10
 
11
 
12
- ## Model Details
13
 
14
- ### Model Description
15
 
16
- <!-- Provide a longer summary of what this model is. -->
17
 
18
- This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
19
 
20
- - **Developed by:** [More Information Needed]
21
- - **Funded by [optional]:** [More Information Needed]
22
- - **Shared by [optional]:** [More Information Needed]
23
- - **Model type:** [More Information Needed]
24
- - **Language(s) (NLP):** [More Information Needed]
25
- - **License:** [More Information Needed]
26
- - **Finetuned from model [optional]:** [More Information Needed]
27
 
28
- ### Model Sources [optional]
29
 
30
- <!-- Provide the basic links for the model. -->
31
 
32
- - **Repository:** [More Information Needed]
33
- - **Paper [optional]:** [More Information Needed]
34
- - **Demo [optional]:** [More Information Needed]
35
 
36
- ## Uses
37
 
38
- <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
39
 
40
- ### Direct Use
41
 
42
- <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
43
 
44
- [More Information Needed]
 
45
 
46
- ### Downstream Use [optional]
 
 
47
 
48
- <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
49
 
50
- [More Information Needed]
51
 
52
- ### Out-of-Scope Use
53
 
54
- <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
 
 
55
 
56
- [More Information Needed]
 
 
 
 
57
 
58
- ## Bias, Risks, and Limitations
 
59
 
60
- <!-- This section is meant to convey both technical and sociotechnical limitations. -->
 
 
61
 
62
- [More Information Needed]
63
 
64
- ### Recommendations
65
 
66
- <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
67
 
68
- Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
 
 
 
69
 
70
- ## How to Get Started with the Model
 
 
 
 
 
71
 
72
- Use the code below to get started with the model.
 
73
 
74
- [More Information Needed]
 
 
75
 
76
- ## Training Details
 
77
 
78
- ### Training Data
79
 
80
- <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
81
 
82
- [More Information Needed]
83
 
84
- ### Training Procedure
 
 
 
85
 
86
- <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
 
 
 
 
 
 
87
 
88
- #### Preprocessing [optional]
 
89
 
90
- [More Information Needed]
 
 
91
 
 
92
 
93
- #### Training Hyperparameters
 
 
94
 
95
- - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
 
96
 
97
- #### Speeds, Sizes, Times [optional]
 
98
 
99
- <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
 
 
100
 
101
- [More Information Needed]
102
 
103
- ## Evaluation
 
 
104
 
105
- <!-- This section describes the evaluation protocols and provides the results. -->
 
 
 
 
106
 
107
- ### Testing Data, Factors & Metrics
 
108
 
109
- #### Testing Data
 
 
110
 
111
- <!-- This should link to a Dataset Card if possible. -->
112
 
113
- [More Information Needed]
114
 
115
- #### Factors
 
 
116
 
117
- <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
118
 
119
- [More Information Needed]
 
120
 
121
- #### Metrics
 
122
 
123
- <!-- These are the evaluation metrics being used, ideally with a description of why. -->
 
 
124
 
125
- [More Information Needed]
126
 
127
- ### Results
 
 
128
 
129
- [More Information Needed]
130
 
131
- #### Summary
 
132
 
 
 
133
 
 
 
 
134
 
135
- ## Model Examination [optional]
136
 
137
- <!-- Relevant interpretability work for the model goes here -->
138
 
139
- [More Information Needed]
140
 
141
- ## Environmental Impact
142
 
143
- <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
 
 
 
 
 
 
144
 
145
- Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
146
 
147
- - **Hardware Type:** [More Information Needed]
148
- - **Hours used:** [More Information Needed]
149
- - **Cloud Provider:** [More Information Needed]
150
- - **Compute Region:** [More Information Needed]
151
- - **Carbon Emitted:** [More Information Needed]
152
 
153
- ## Technical Specifications [optional]
154
 
155
- ### Model Architecture and Objective
 
 
 
156
 
157
- [More Information Needed]
 
158
 
159
- ### Compute Infrastructure
 
 
 
 
 
160
 
161
- [More Information Needed]
 
 
 
 
162
 
163
- #### Hardware
164
 
165
- [More Information Needed]
 
 
 
 
166
 
167
- #### Software
 
 
168
 
169
- [More Information Needed]
 
170
 
171
- ## Citation [optional]
172
 
173
- <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
 
 
 
 
174
 
175
- **BibTeX:**
176
 
177
- [More Information Needed]
 
 
 
178
 
179
- **APA:**
180
 
181
- [More Information Needed]
182
 
183
- ## Glossary [optional]
184
 
185
- <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
186
-
187
- [More Information Needed]
188
-
189
- ## More Information [optional]
190
-
191
- [More Information Needed]
192
-
193
- ## Model Card Authors [optional]
194
-
195
- [More Information Needed]
196
-
197
- ## Model Card Contact
198
-
199
- [More Information Needed]
 
1
  ---
2
  library_name: transformers
3
+ widget:
4
+ - messages:
5
+ - role: user
6
+ content: How does the brain work?
7
+ inference:
8
+ parameters:
9
+ max_new_tokens: 200
10
+ extra_gated_heading: Access Gemma on Hugging Face
11
+ extra_gated_prompt: >-
12
+ To access Gemma on Hugging Face, you’re required to review and agree to
13
+ Google’s usage license. To do this, please ensure you’re logged-in to Hugging
14
+ Face and click below. Requests are processed immediately.
15
+ extra_gated_button_content: Acknowledge license
16
+ datasets:
17
+ - yatharth97/10k_reports_gemma
18
  ---
19
 
20
+ # yatharth-gemma-7b-it-10k Model Card
21
 
22
+ **Reference Model Page**: [Gemma](https://ai.google.dev/gemma/docs)
23
 
24
+ This model card describes a version of the Gemma model fine-tuned on a dataset of 10-K reports, specifically to enhance performance on tasks related to answering questions about these reports.
25
 
26
 
27
+ **Authors**: Yatharth Mahesh Sant
28
 
29
+ ## Model Information
30
 
31
+ Summary description and brief definition of inputs and outputs.
32
 
33
+ ### Description
34
 
35
+ The model presented here is an adaptation of Gemma 7B-IT, a member of the Gemma family of lightweight, state-of-the-art open models developed by Google. Building on the research and technology behind the Gemini models, this fine-tuned version specializes in parsing and understanding financial texts, particularly those found in 10-K reports.
 
 
 
 
 
 
36
 
37
+ Dubbed the "yatharth-gemma-7B-it-10k" this model retains the text-to-text, decoder-only architecture of its progenitors, functioning optimally in English. What sets it apart is its refined focus on question-answering tasks specific to the intricate domain of 10-K reports — an invaluable resource for financial analysts, investors, and regulatory professionals seeking AI-driven insights.
38
 
39
+ Preserving the open-weights philosophy of the original Gemma models, this variant has been instruction-tuned with a curated dataset of 10-K reports. It not only demonstrates an enhanced proficiency in generating accurate, context-aware responses to user queries but also maintains the flexibility and efficiency that allow deployment in various settings, from personal computers to cloud-based environments.
40
 
41
+ The "yatharth-gemma-7B-it-10k" upholds the Gemma tradition of facilitating text generation tasks such as summarization and complex reasoning. Its unique optimization for financial reports exemplifies our commitment to pushing the boundaries of specialized AI, providing an unparalleled tool for dissecting and interpreting one of the business world's most information-dense documents.
 
 
42
 
43
+ By marrying the accessibility of the Gemma models with the niche expertise required to navigate 10-K reports, we extend the frontiers of what's possible with AI, democratizing cutting-edge technology to empower financial analysis and decision-making.
44
 
45
+ ### Usage
46
 
47
+ Below we share some code snippets to help you quickly get started running the model. First make sure to `pip install -U transformers`, then copy the snippet from the section relevant to your use case.
48
 
49
+ #### Fine-tuning the model
50
 
51
+ You can find fine-tuning scripts and a notebook under the [`examples/` directory](https://huggingface.co/google/gemma-7b/tree/main/examples) of the [`google/gemma-7b`](https://huggingface.co/google/gemma-7b) repository. To adapt them to this model, simply change the model ID to `yatharth97/yatharth-gemma-7b-it-10k`. A minimal adapter-preparation sketch is also shown after the list below.
52
+ In that repository, we provide:
53
 
54
+ * A script to perform Supervised Fine-Tuning (SFT) on the UltraChat dataset using QLoRA
55
+ * A script to perform SFT using FSDP on TPU devices
56
+ * A notebook that you can run on a free-tier Google Colab instance to perform SFT on an English quotes dataset
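+ If you only need to attach trainable low-rank adapters to the quantized model before handing it to one of those scripts, the snippet below is a minimal QLoRA-style preparation sketch (illustrative only, not taken from that repository; the adapter hyperparameters and target module names are assumptions to tune for your setup):

+ ```python
+ # pip install -U transformers peft bitsandbytes accelerate
+ import torch
+ from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
+ from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

+ model_id = "yatharth97/yatharth-gemma-7b-it-10k"

+ # Load the weights in 4-bit so adapter training fits on a single GPU.
+ bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = AutoModelForCausalLM.from_pretrained(
+     model_id,
+     quantization_config=bnb_config,
+     device_map="auto",
+ )

+ # Prepare the quantized model for training and attach LoRA adapters.
+ model = prepare_model_for_kbit_training(model)
+ lora_config = LoraConfig(
+     r=8,  # assumed rank; tune for your compute budget
+     lora_alpha=16,
+     lora_dropout=0.05,
+     target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed attention projections
+     task_type="CAUSAL_LM",
+ )
+ model = get_peft_model(model, lora_config)
+ model.print_trainable_parameters()

+ # Train `model` with your preferred trainer, e.g. the SFT scripts linked above.
+ ```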
57
 
 
58
 
59
+ #### Running the model on a CPU
60
 
61
+ As explained below, we recommend `torch.bfloat16` as the default dtype. You can use [a different precision](#precisions) if necessary.
62
 
63
+ ```python
64
+ from transformers import AutoTokenizer, AutoModelForCausalLM
65
+ import torch
66
 
67
+ tokenizer = AutoTokenizer.from_pretrained("yatharth97/yatharth-gemma-7b-it-10k")
68
+ model = AutoModelForCausalLM.from_pretrained(
69
+ "yatharth97/yatharth-gemma-7b-it-10k",
70
+ torch_dtype=torch.bfloat16
71
+ )
72
 
73
+ input_text = 'Can you tell me what the Total Debt was in 2023?'
74
+ input_ids = tokenizer(input_text, return_tensors="pt")
75
 
76
+ outputs = model.generate(**input_ids)
77
+ print(tokenizer.decode(outputs[0]))
78
+ ```
79
 
 
80
 
81
+ #### Running the model on a single / multi GPU
82
 
 
83
 
84
+ ```python
85
+ # pip install accelerate
86
+ from transformers import AutoTokenizer, AutoModelForCausalLM
87
+ import torch
88
 
89
+ tokenizer = AutoTokenizer.from_pretrained("yatharth97/yatharth-gemma-7b-it-10k")
90
+ model = AutoModelForCausalLM.from_pretrained(
91
+ "yatharth97/yatharth-gemma-7b-it-10k",
92
+ device_map="auto",
93
+ torch_dtype=torch.bfloat16
94
+ )
95
 
96
+ input_text = 'Can you tell me what the Total Debt was in 2023?'
97
+ input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
98
 
99
+ outputs = model.generate(**input_ids)
100
+ print(tokenizer.decode(outputs[0]))
101
+ ```
102
 
103
+ <a name="precisions"></a>
104
+ #### Running the model on a GPU using different precisions
105
 
106
+ The native weights of this model were exported in `bfloat16` precision. You can use `float16`, which may be faster on certain hardware, by indicating the `torch_dtype` when loading the model. For convenience, the `float16` revision of the repo contains a copy of the weights already converted to that precision.
107
 
108
+ You can also use `float32` if you skip the dtype, but no precision increase will occur (the model weights will simply be upcast to `float32`). See the examples below.
109
 
110
+ * _Using `torch.float16`_
111
 
112
+ ```python
113
+ # pip install accelerate
114
+ from transformers import AutoTokenizer, AutoModelForCausalLM
115
+ import torch
116
 
117
+ tokenizer = AutoTokenizer.from_pretrained("yatharth97/yatharth-gemma-7b-it-10k")
118
+ model = AutoModelForCausalLM.from_pretrained(
119
+ "yatharth97/yatharth-gemma-7b-it-10k",
120
+ device_map="auto",
121
+ torch_dtype=torch.float16,
122
+ revision="float16",
123
+ )
124
 
125
+ input_text = 'Can you tell me what the Total Debt was in 2023?'
126
+ input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
127
 
128
+ outputs = model.generate(**input_ids)
129
+ print(tokenizer.decode(outputs[0]))
130
+ ```
131
 
132
+ * _Using `torch.bfloat16`_
133
 
134
+ ```python
135
+ # pip install accelerate
136
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+ import torch
137
 
138
+ tokenizer = AutoTokenizer.from_pretrained("yatharth97/yatharth-gemma-7b-it-10k")
139
+ model = AutoModelForCausalLM.from_pretrained("yatharth97/yatharth-gemma-7b-it-10k", device_map="auto", torch_dtype=torch.bfloat16)
140
 
141
+ input_text = 'Can you tell me what the Total Debt was in 2023?'
142
+ input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
143
 
144
+ outputs = model.generate(**input_ids)
145
+ print(tokenizer.decode(outputs[0]))
146
+ ```
147
 
148
+ * _Upcasting to `torch.float32`_
149
 
150
+ ```python
151
+ # pip install accelerate
152
+ from transformers import AutoTokenizer, AutoModelForCausalLM
153
 
154
+ tokenizer = AutoTokenizer.from_pretrained("yatharth97/yatharth-gemma-7b-it-10k")
155
+ model = AutoModelForCausalLM.from_pretrained(
156
+ "yatharth97/yatharth-gemma-7b-it-10k",
157
+ device_map="auto"
158
+ )
159
 
160
+ input_text = 'Can you tell me what the Total Debt was in 2023?'
161
+ input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
162
 
163
+ outputs = model.generate(**input_ids)
164
+ print(tokenizer.decode(outputs[0]))
165
+ ```
166
 
167
+ #### Quantized Versions through `bitsandbytes`
168
 
169
+ * _Using 8-bit precision (int8)_
170
 
171
+ ```python
172
+ # pip install bitsandbytes accelerate
173
+ from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
174
 
175
+ quantization_config = BitsAndBytesConfig(load_in_8bit=True)
176
 
177
+ tokenizer = AutoTokenizer.from_pretrained("yatharth97/yatharth-gemma-7b-it-10k")
178
+ model = AutoModelForCausalLM.from_pretrained("yatharth97/yatharth-gemma-7b-it-10k", quantization_config=quantization_config)
179
 
180
+ input_text = 'Can you tell me what the Total Debt was in 2023?'
181
+ input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
182
 
183
+ outputs = model.generate(**input_ids)
184
+ print(tokenizer.decode(outputs[0]))
185
+ ```
186
 
187
+ * _Using 4-bit precision_
188
 
189
+ ```python
190
+ # pip install bitsandbytes accelerate
191
+ from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
192
 
193
+ quantization_config = BitsAndBytesConfig(load_in_4bit=True)
194
 
195
+ tokenizer = AutoTokenizer.from_pretrained("yatharth97/yatharth-gemma-7b-it-10k")
196
+ model = AutoModelForCausalLM.from_pretrained("yatharth97/yatharth-gemma-7b-it-10k", quantization_config=quantization_config)
197
 
198
+ input_text = 'Can you tell me what the Total Debt was in 2023?'
199
+ input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
200
 
201
+ outputs = model.generate(**input_ids)
202
+ print(tokenizer.decode(outputs[0]))
203
+ ```
204
 
 
205
 
206
+ #### Other optimizations
207
 
208
+ * _Flash Attention 2_
209
 
210
+ First, make sure to install `flash-attn` in your environment: `pip install flash-attn`.
211
 
212
+ ```diff
213
+ model = AutoModelForCausalLM.from_pretrained(
214
+ model_id,
215
+ torch_dtype=torch.float16,
216
+ + attn_implementation="flash_attention_2"
217
+ ).to(0)
218
+ ```
219
 
220
+ ### Chat Template
221
 
222
+ The instruction-tuned models use a chat template that must be adhered to for conversational use.
223
+ The easiest way to apply it is using the tokenizer's built-in chat template, as shown in the following snippet.
 
 
 
224
 
225
+ Let's load the model and apply the chat template to a conversation. In this example, we'll start with a single user interaction:
226
 
227
+ ```py
228
+ from transformers import AutoTokenizer, AutoModelForCausalLM
229
230
+ import torch
231
 
232
+ model_id = "google/gemma-7b-it"
233
+ dtype = torch.bfloat16
234
 
235
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
236
+ model = AutoModelForCausalLM.from_pretrained(
237
+ model_id,
238
+ device_map="cuda",
239
+ torch_dtype=dtype,
240
+ )
241
 
242
+ chat = [
243
+ { "role": "user", "content": "Can you tell me what the Total Debt was in 2023?" },
244
+ ]
245
+ prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
246
+ ```
247
 
248
+ At this point, the prompt contains the following text:
249
 
250
+ ```
251
+ <bos><start_of_turn>user
252
+ Can you tell me what the Total Debt was in 2023?<end_of_turn>
253
+ <start_of_turn>model
254
+ ```
255
 
256
+ As you can see, each turn is preceded by a `<start_of_turn>` delimiter and then the role of the entity
257
+ (either `user`, for content supplied by the user, or `model` for LLM responses). Turns finish with
258
+ the `<end_of_turn>` token.
259
 
260
+ You can follow this format to build the prompt manually, if you need to do it without the tokenizer's
261
+ chat template.
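+ For reference, here is a minimal sketch of assembling the same prompt by hand (illustrative; the variable names are arbitrary):

+ ```py
+ # Build the prompt string manually, mirroring the template output shown above.
+ messages = [
+     {"role": "user", "content": "Can you tell me what the Total Debt was in 2023?"},
+ ]

+ manual_prompt = ""
+ for turn in messages:
+     manual_prompt += f"<start_of_turn>{turn['role']}\n{turn['content']}<end_of_turn>\n"
+ manual_prompt += "<start_of_turn>model\n"  # cue the model to produce its answer

+ # apply_chat_template also prepends <bos>; when encoding this manual string, either
+ # add "<bos>" yourself or encode it with add_special_tokens=True.
+ ```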
262
 
263
+ After the prompt is ready, generation can be performed like this:
264
 
265
+ ```py
266
+ inputs = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")
267
+ outputs = model.generate(input_ids=inputs.to(model.device), max_new_tokens=150)
268
+ print(tokenizer.decode(outputs[0]))
269
+ ```
270
 
271
+ ### Inputs and outputs
272
 
273
+ * **Input:** Text string, such as a question, a prompt, or a 10-K document to be
274
+ summarized.
275
+ * **Output:** Generated English-language text in response to the input, such
276
+ as an answer to a question, or a summary of an uploaded 10-K document. Summarization is currently handled by a separate model.
277
 
278
+ ## Model Data
279
 
280
+ Data used for model training and how the data was processed.
281
 
282
+ ### Training Dataset
283
 
284
+ This model is fine-tuned on the [`yatharth97/10k_reports_gemma`](https://huggingface.co/datasets/yatharth97/10k_reports_gemma) dataset, which has a conversational format that allows the user to ask questions about an uploaded 10-K report.
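+ To inspect this dataset yourself, you can load it with the 🤗 `datasets` library (a minimal sketch; the split and record layout below are assumptions, so check the dataset card):

+ ```python
+ # pip install -U datasets
+ from datasets import load_dataset

+ # Load the conversational 10-K question-answering dataset used for fine-tuning.
+ dataset = load_dataset("yatharth97/10k_reports_gemma")

+ print(dataset)              # lists the available splits and their columns
+ print(dataset["train"][0])  # assumes a "train" split; adjust if it is named differently
+ ```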