jasonmcaffee committed
Commit 03563b3 · Parent: e6cff95
Update README.md

README.md CHANGED

@@ -2,20 +2,35 @@
license: mit
---
# Overview
This model has been fine-tuned for text summarization. It was created by using LoRA to fine-tune the flan-t5-large model on the [SAMsum training dataset](https://huggingface.co/datasets/samsum).

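As a quick reference (an illustrative sketch, not part of the original notebook), the resulting adapter could be loaded for inference with the `peft` library roughly as follows; the repo id `jasonmcaffee/flan-t5-large-samsum` and the `summarize:` prompt prefix are assumptions.

```
# Hypothetical inference sketch: load the base model, then apply this LoRA adapter.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
from peft import PeftModel

base_model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-large")
tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-large")
model = PeftModel.from_pretrained(base_model, "jasonmcaffee/flan-t5-large-samsum")  # assumed repo id

dialogue = "Amanda: I baked cookies. Do you want some? Jerry: Sure! Amanda: I'll bring you tomorrow :-)"
inputs = tokenizer("summarize: " + dialogue, return_tensors="pt")
summary_ids = model.generate(**inputs, max_new_tokens=60)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```
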
## SAMsum
SAMsum is a corpus of 16k dialogues with corresponding summaries.

Example entry:
- Dialogue - "Amanda: I baked cookies. Do you want some? Jerry: Sure! Amanda: I'll bring you tomorrow :-)"
- Summary - "Amanda baked cookies and will bring Jerry some tomorrow."

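To browse the data yourself (illustrative, not from the original notebook), the corpus can be loaded with the `datasets` library, assuming the `samsum` dataset is available on the Hub:

```
# Inspect a SAMsum entry with the Hugging Face datasets library.
from datasets import load_dataset

samsum = load_dataset("samsum")
print(samsum)                          # train / validation / test splits
print(samsum["train"][0]["dialogue"])  # raw chat transcript
print(samsum["train"][0]["summary"])   # human-written summary
```
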
## LoRA
[LoRA](https://github.com/microsoft/LoRA) is a performant mechanism for fine-tuning models so that they become better at specific tasks.
> An important paradigm of natural language processing consists of large-scale pre-training on general domain data and adaptation to particular tasks or domains. As we pre-train larger models, full fine-tuning, which retrains all model parameters, becomes less feasible. Using GPT-3 175B as an example -- deploying independent instances of fine-tuned models, each with 175B parameters, is prohibitively expensive. We propose Low-Rank Adaptation, or LoRA, which freezes the pre-trained model weights and injects trainable rank decomposition matrices into each layer of the Transformer architecture, greatly reducing the number of trainable parameters for downstream tasks. Compared to GPT-3 175B fine-tuned with Adam, LoRA can reduce the number of trainable parameters by 10,000 times and the GPU memory requirement by 3 times. LoRA performs on-par or better than fine-tuning in model quality on RoBERTa, DeBERTa, GPT-2, and GPT-3, despite having fewer trainable parameters, a higher training throughput, and, unlike adapters, no additional inference latency.

In this case we fine-tune flan-t5 on the SAMsum dataset in order to create a model that is better at dialogue summarization.

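For context, here is a minimal sketch of how a LoRA adapter for flan-t5-large can be configured with the `peft` library; the rank, alpha, dropout, and target modules shown are illustrative assumptions rather than the notebook's exact settings.

```
# Attach a LoRA adapter to flan-t5-large; only the adapter weights are trainable.
from transformers import AutoModelForSeq2SeqLM
from peft import LoraConfig, TaskType, get_peft_model

model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-large")

lora_config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,  # encoder-decoder (T5-style) task
    r=16,                             # rank of the low-rank update matrices (assumed)
    lora_alpha=32,                    # scaling factor for the update (assumed)
    lora_dropout=0.05,                # assumed
    target_modules=["q", "v"],        # T5 attention projections to adapt
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()    # prints the small fraction of trainable weights
```
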
## Flan T5
Finetuned LAnguage Net Text-to-Text Transfer Transformer (Flan T5) is an LLM published by Google in 2022. It has improved zero-shot learning abilities compared to T5.
The [flan-t5 model](https://huggingface.co/google/flan-t5-large) is open and free for commercial use.

Flan T5 capabilities include:
- Translate between more than 60 languages.
- Provide summaries of text.
- Answer general questions: "How many minutes should I cook my egg?"
- Answer historical questions, and questions related to the future.
- Solve math problems when given the reasoning.

> T5 is an encoder-decoder model and converts all NLP problems into a text-to-text format. It is trained using teacher forcing. This means that for training, we always need an input sequence and a corresponding target sequence. The input sequence is fed to the model using input_ids. The target sequence is shifted to the right, i.e., prepended by a start-sequence token and fed to the decoder using the decoder_input_ids. In teacher-forcing style, the target sequence is then appended by the EOS token and corresponds to the labels. The PAD token is hereby used as the start-sequence token. T5 can be trained / fine-tuned both in a supervised and unsupervised fashion.

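For context, a small zero-shot example (not from the original notebook) showing the base model's text-to-text interface:

```
# Zero-shot use of the base flan-t5-large model, no fine-tuning involved.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-large")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-large")

prompt = "Translate English to German: How old are you?"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids
output_ids = model.generate(input_ids, max_new_tokens=30)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```
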
# Code to Create The SAMsum LoRA Adapter

## Notebook Source
[Notebook used to create LoRA adapter](https://colab.research.google.com/drive/1z_mZL6CIRRA4AeF6GXe-zpfEGqqdMk-f?usp=sharing)

@@ -135,4 +150,17 @@ trainer.train()

TrainOutput(global_step=9210, training_loss=1.1780610539108094, metrics={'train_runtime': 19217.7668, 'train_samples_per_second': 3.833, 'train_steps_per_second': 0.479, 'total_flos': 8.541847343333376e+16, 'train_loss': 1.1780610539108094, 'epoch': 5.0})

## Save the model to disk, zip, and download
```
# Save the LoRA adapter weights and tokenizer to a local directory
peft_model_id = "flan-t5-large-samsum"
trainer.model.save_pretrained(peft_model_id)
tokenizer.save_pretrained(peft_model_id)
trainer.model.base_model.save_pretrained(peft_model_id)

# Zip the saved directory so it can be downloaded from the Colab runtime
!zip -r /content/flan-t5-large-samsum.zip /content/flan-t5-large-samsum

# Download the zip through the browser
from google.colab import files
files.download("/content/flan-t5-large-samsum.zip")
```

Upload the contents of that zip file to Hugging Face.
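
Alternatively (illustrative, not from the original notebook), the saved folder can be pushed programmatically with the `huggingface_hub` client; the repo id is an assumption, and you must be authenticated first (e.g. via `huggingface_hub.login()`).

```
# Push the saved adapter folder to the Hub instead of uploading it manually.
from huggingface_hub import HfApi

api = HfApi()
api.create_repo("jasonmcaffee/flan-t5-large-samsum", exist_ok=True)  # assumed repo id
api.upload_folder(
    folder_path="/content/flan-t5-large-samsum",
    repo_id="jasonmcaffee/flan-t5-large-samsum",
)
```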