---
library_name: transformers
widget:
- messages:
  - role: user
    content: How does the brain work?
inference:
  parameters:
    max_new_tokens: 200
extra_gated_heading: Access Gemma on Hugging Face
extra_gated_prompt: >-
  To access Gemma on Hugging Face, you’re required to review and agree to
  Google’s usage license. To do this, please ensure you’re logged-in to Hugging
  Face and click below. Requests are processed immediately.
extra_gated_button_content: Acknowledge license
datasets:
- yatharth97/10k_reports_gemma
---
# yatharth-gemma-7b-it-10k Model Card
**Reference Model Page**: [Gemma](https://ai.google.dev/gemma/docs)
This model card pertains to the version of the Gemma model that has been fine-tuned on a dataset of 10-K reports, specifically to improve performance on question answering about these reports.
**Authors**: Yatharth Mahesh Sant
## Model Information
Summary description and brief definition of inputs and outputs.
### Description
The model presented here is an advanced adaptation of the Gemma 7B-IT, a member of the Gemma family of lightweight yet state-of-the-art models developed by Google. Leveraging the breakthrough research and technology that brought forth the Gemini models, our fine-tuned iteration specializes in parsing and understanding financial texts, particularly those found in 10-K reports.
Dubbed "yatharth-gemma-7B-it-10k", this model retains the text-to-text, decoder-only architecture of its progenitors and works best in English. What sets it apart is its focus on question-answering tasks specific to 10-K reports, a key resource for financial analysts, investors, and regulatory professionals seeking AI-driven insights.
Preserving the open-weights philosophy of the original Gemma models, this variant has been instruction-tuned with a curated dataset of 10-K reports. It not only demonstrates an enhanced proficiency in generating accurate, context-aware responses to user queries but also maintains the flexibility and efficiency that allow deployment in various settings, from personal computers to cloud-based environments.
The "yatharth-gemma-7B-it-10k" upholds the Gemma tradition of facilitating text generation tasks such as summarization and complex reasoning. Its unique optimization for financial reports exemplifies our commitment to pushing the boundaries of specialized AI, providing an unparalleled tool for dissecting and interpreting one of the business world's most information-dense documents.
By marrying the accessibility of the Gemma models with the niche expertise required to navigate 10-K reports, we extend the frontiers of what's possible with AI, democratizing cutting-edge technology to empower financial analysis and decision-making.
### Usage
Below are some code snippets to help you get started quickly with running the model. First run `pip install -U transformers`, then copy the snippet from the section that is relevant to your use case.
#### Fine-tuning the model
You can find fine-tuning scripts and a notebook under the [`examples/` directory](https://huggingface.co/google/gemma-7b/tree/main/examples) of the [`google/gemma-7b`](https://huggingface.co/google/gemma-7b) repository. To adapt them to this model, change the model ID to `yatharth97/yatharth-gemma-7b-it-10k`.
That repository provides:
* A script to perform Supervised Fine-Tuning (SFT) on the UltraChat dataset using QLoRA
* A script to perform SFT using FSDP on TPU devices
* A notebook that you can run on a free-tier Google Colab instance to perform SFT on the English quotes dataset
#### Running the model on a CPU
As explained below, we recommend `torch.bfloat16` as the default dtype. You can use [a different precision](#precisions) if necessary.
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
tokenizer = AutoTokenizer.from_pretrained("yatharth97/yatharth-gemma-7b-it-10k")
model = AutoModelForCausalLM.from_pretrained(
    "yatharth97/yatharth-gemma-7b-it-10k",
    torch_dtype=torch.bfloat16
)
input_text = 'Can you tell me what the Total Debt was in 2023?'
input_ids = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**input_ids)
print(tokenizer.decode(outputs[0]))
```
#### Running the model on a single / multi GPU
```python
# pip install accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
tokenizer = AutoTokenizer.from_pretrained("yatharth97/yatharth-gemma-7b-it-10k")
model = AutoModelForCausalLM.from_pretrained(
    "yatharth97/yatharth-gemma-7b-it-10k",
    device_map="auto",
    torch_dtype=torch.bfloat16
)
input_text = 'Can you tell me what the Total Debt was in 2023?'
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
outputs = model.generate(**input_ids)
print(tokenizer.decode(outputs[0]))
```
<a name="precisions"></a>
#### Running the model on a GPU using different precisions
The native weights of this model were exported in `bfloat16` precision. You can use `float16`, which may be faster on certain hardware, by passing `torch_dtype` when loading the model. For convenience, the `float16` revision of the repo contains a copy of the weights already converted to that precision.
You can also use `float32` by omitting the dtype, but this does not increase precision: the model weights are simply upcast to `float32`. See the examples below.
* _Using `torch.float16`_
```python
# pip install accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
tokenizer = AutoTokenizer.from_pretrained("yatharth97/yatharth-gemma-7b-it-10k")
model = AutoModelForCausalLM.from_pretrained(
    "yatharth97/yatharth-gemma-7b-it-10k",
    device_map="auto",
    torch_dtype=torch.float16,
    revision="float16",
)
input_text = 'Can you tell me what the Total Debt was in 2023?'
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
outputs = model.generate(**input_ids)
print(tokenizer.decode(outputs[0]))
```
* _Using `torch.bfloat16`_
```python
# pip install accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch  # needed for torch.bfloat16 below
tokenizer = AutoTokenizer.from_pretrained("yatharth97/yatharth-gemma-7b-it-10k")
model = AutoModelForCausalLM.from_pretrained("yatharth97/yatharth-gemma-7b-it-10k", device_map="auto", torch_dtype=torch.bfloat16)
input_text = 'Can you tell me what the Total Debt was in 2023?'
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
outputs = model.generate(**input_ids)
print(tokenizer.decode(outputs[0]))
```
* _Upcasting to `torch.float32`_
```python
# pip install accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("yatharth97/yatharth-gemma-7b-it-10k")
model = AutoModelForCausalLM.from_pretrained(
    "yatharth97/yatharth-gemma-7b-it-10k",
    device_map="auto"
)
input_text = 'Can you tell me what the Total Debt was in 2023?'
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
outputs = model.generate(**input_ids)
print(tokenizer.decode(outputs[0]))
```
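To put the precision options above in perspective, the memory needed for the weights alone can be roughly estimated as parameter count times bytes per parameter. The sketch below is only a back-of-the-envelope calculation: `weight_memory_gib` is a hypothetical helper, and the parameter count of about 8.5 billion is an assumed approximate figure for Gemma 7B, so substitute the real count for your checkpoint. Activations, the KV cache, and framework overhead are not included.

```python
# Rough weights-only memory estimate for different precisions.
# ASSUMPTION: ~8.5e9 parameters (approximate figure for Gemma 7B); replace with the real count.
BYTES_PER_PARAM = {
    "float32": 4.0,
    "bfloat16": 2.0,
    "float16": 2.0,
    "int8": 1.0,
    "4bit": 0.5,
}


def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Return the approximate size of the weights in GiB for the given precision."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


ASSUMED_PARAMS = 8.5e9  # assumption, see the note above

for name in BYTES_PER_PARAM:
    print(f"{name:>8}: {weight_memory_gib(ASSUMED_PARAMS, name):.1f} GiB")
```

The estimate illustrates why upcasting to `float32` doubles the weight footprint relative to `bfloat16` without adding precision.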
#### Quantized Versions through `bitsandbytes`
* _Using 8-bit precision (int8)_
```python
# pip install bitsandbytes accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
quantization_config = BitsAndBytesConfig(load_in_8bit=True)
tokenizer = AutoTokenizer.from_pretrained("yatharth97/yatharth-gemma-7b-it-10k")
model = AutoModelForCausalLM.from_pretrained("yatharth97/yatharth-gemma-7b-it-10k", quantization_config=quantization_config)
input_text = 'Can you tell me what the Total Debt was in 2023?'
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
outputs = model.generate(**input_ids)
print(tokenizer.decode(outputs[0]))
```
* _Using 4-bit precision_
```python
# pip install bitsandbytes accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
quantization_config = BitsAndBytesConfig(load_in_4bit=True)
tokenizer = AutoTokenizer.from_pretrained("yatharth97/yatharth-gemma-7b-it-10k")
model = AutoModelForCausalLM.from_pretrained("yatharth97/yatharth-gemma-7b-it-10k", quantization_config=quantization_config)
input_text = 'Can you tell me what the Total Debt was in 2023?'
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
outputs = model.generate(**input_ids)
print(tokenizer.decode(outputs[0]))
```
#### Other optimizations
* _Flash Attention 2_
First make sure to install `flash-attn` in your environment: `pip install flash-attn`
```diff
 model = AutoModelForCausalLM.from_pretrained(
     model_id,
     torch_dtype=torch.float16,
+    attn_implementation="flash_attention_2"
 ).to(0)
```
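If the same script has to run on machines where `flash-attn` may not be installed, one option is to choose the attention implementation at runtime. The sketch below is a minimal illustration, not part of the original instructions: `pick_attn_implementation` is a hypothetical helper, and falling back to `"sdpa"` is an assumption about what you want when the package is missing. Note that an importable package does not guarantee that the GPU supports Flash Attention 2.

```python
import importlib.util


def pick_attn_implementation() -> str:
    """Return "flash_attention_2" if the flash_attn package is importable, else "sdpa"."""
    if importlib.util.find_spec("flash_attn") is not None:
        return "flash_attention_2"
    return "sdpa"  # assumed fallback: PyTorch scaled-dot-product attention


attn_implementation = pick_attn_implementation()
print(attn_implementation)

# The value can then be passed when loading the model, for example:
# model = AutoModelForCausalLM.from_pretrained(
#     model_id,
#     torch_dtype=torch.float16,
#     attn_implementation=attn_implementation,
# ).to(0)
```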
### Chat Template
The instruction-tuned models use a chat template that must be adhered to for conversational use.
The easiest way to apply it is using the tokenizer's built-in chat template, as shown in the following snippet.
Let's load the model and apply the chat template to a conversation. In this example, we'll start with a single user interaction:
```py
from transformers import AutoTokenizer, AutoModelForCausalLM
import transformers
import torch
model_id = "yatharth97/yatharth-gemma-7b-it-10k"
dtype = torch.bfloat16
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="cuda",
    torch_dtype=dtype,
)
chat = [
    { "role": "user", "content": "Can you tell me what the Total Debt was in 2023?" },
]
prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
```
At this point, the prompt contains the following text:
```
<bos><start_of_turn>user
Can you tell me what the Total Debt was in 2023?<end_of_turn>
<start_of_turn>model
```
As you can see, each turn is preceded by a `<start_of_turn>` delimiter and then the role of the entity
(either `user`, for content supplied by the user, or `model` for LLM responses). Turns finish with
the `<end_of_turn>` token.
You can follow this format to build the prompt manually, if you need to do it without the tokenizer's
chat template.
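As a minimal sketch of building the prompt by hand, the function below reproduces the turn format shown above. `build_gemma_prompt` is a hypothetical helper written for this illustration, and mapping the `assistant` role to `model` is an assumption about how earlier model replies would be labeled in a multi-turn chat. Where the tokenizer is available, compare the result against `tokenizer.apply_chat_template` before relying on it.

```python
def build_gemma_prompt(chat, add_generation_prompt=True):
    """Build a Gemma-style prompt string from a list of {"role", "content"} dicts."""
    parts = ["<bos>"]
    for turn in chat:
        # Assumption: replies stored under "assistant" are emitted as "model" turns.
        role = "model" if turn["role"] == "assistant" else turn["role"]
        parts.append(f"<start_of_turn>{role}\n{turn['content'].strip()}<end_of_turn>\n")
    if add_generation_prompt:
        # Open a model turn so that generation continues from here.
        parts.append("<start_of_turn>model\n")
    return "".join(parts)


chat = [
    {"role": "user", "content": "Can you tell me what the Total Debt was in 2023?"},
]
prompt = build_gemma_prompt(chat)
print(prompt)
```

Because the string already starts with `<bos>`, it should be tokenized with `add_special_tokens=False` so the token is not added twice.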
After the prompt is ready, generation can be performed like this:
```py
inputs = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")
outputs = model.generate(input_ids=inputs.to(model.device), max_new_tokens=150)
print(tokenizer.decode(outputs[0]))
```
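The decoded output contains the prompt as well as the generated reply, including the turn delimiters. The sketch below shows one way to pull out just the reply using plain string handling. `extract_reply` is a hypothetical helper, the sample string uses placeholder text rather than real model output, and treating `<eos>` as a possible trailing token is an assumption.

```python
def extract_reply(decoded: str) -> str:
    """Return the text of the last model turn from a decoded Gemma-style output."""
    marker = "<start_of_turn>model\n"
    # Keep everything after the last model-turn marker (the whole string if it is absent).
    _, _, tail = decoded.rpartition(marker)
    # Cut at the first end-of-turn or end-of-sequence token, if present.
    for stop in ("<end_of_turn>", "<eos>"):
        tail = tail.split(stop, 1)[0]
    return tail.strip()


# Placeholder text only; this is not real model output.
sample = (
    "<bos><start_of_turn>user\n"
    "Can you tell me what the Total Debt was in 2023?<end_of_turn>\n"
    "<start_of_turn>model\n"
    "PLACEHOLDER REPLY<end_of_turn><eos>"
)
print(extract_reply(sample))
```

An alternative is to decode only the newly generated tokens, for example `tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)`, which avoids string parsing altogether.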
### Inputs and outputs
* **Input:** Text string, such as a question, a prompt, or a 10-K document to be
summarized.
* **Output:** Generated English-language text in response to the input, such
as an answer to a question or a summary of an uploaded 10-K document. Summarization is currently handled by a separate model.
## Model Data
Data used for model training and how the data was processed.
### Training Dataset
This model is fine-tuned on the `yatharth97/10k_reports_gemma` dataset, which uses a conversational format that lets the user ask questions about an uploaded 10-K report.