jasonmcaffee committed on
Commit 03563b3
1 Parent(s): e6cff95

Update README.md

Files changed (1)
  1. README.md +30 -2
README.md CHANGED
@@ -2,20 +2,35 @@
 license: mit
 ---
 # Overview
- A LoRA adapter created by fine tuning the flan-t5-large model using the [SAMsum training dataset](https://huggingface.co/datasets/samsum).

 SAMsum is a corpus comprised of 16k dialogues and corresponding summaries.

 Example entry:
 - Dialogue - "Amanda: I baked cookies. Do you want some? Jerry: Sure! Amanda: I'll bring you tomorrow :-)"
 - Summary - "Amanda baked cookies and will bring Jerry some tomorrow."

 [LoRA](https://github.com/microsoft/LoRA) is a parameter-efficient method for fine tuning models so they perform better at specific tasks.
 > An important paradigm of natural language processing consists of large-scale pre-training on general domain data and adaptation to particular tasks or domains. As we pre-train larger models, full fine-tuning, which retrains all model parameters, becomes less feasible. Using GPT-3 175B as an example -- deploying independent instances of fine-tuned models, each with 175B parameters, is prohibitively expensive. We propose Low-Rank Adaptation, or LoRA, which freezes the pre-trained model weights and injects trainable rank decomposition matrices into each layer of the Transformer architecture, greatly reducing the number of trainable parameters for downstream tasks. Compared to GPT-3 175B fine-tuned with Adam, LoRA can reduce the number of trainable parameters by 10,000 times and the GPU memory requirement by 3 times. LoRA performs on-par or better than fine-tuning in model quality on RoBERTa, DeBERTa, GPT-2, and GPT-3, despite having fewer trainable parameters, a higher training throughput, and, unlike adapters, no additional inference latency.

 In this case we train flan-t5 on the SAMsum dataset to create a model that is better at dialogue summarization.

- # Code

 ## Notebook Source
 [Notebook used to create LoRA adapter](https://colab.research.google.com/drive/1z_mZL6CIRRA4AeF6GXe-zpfEGqqdMk-f?usp=sharing)
@@ -135,4 +150,17 @@ trainer.train()

 TrainOutput(global_step=9210, training_loss=1.1780610539108094, metrics={'train_runtime': 19217.7668, 'train_samples_per_second': 3.833, 'train_steps_per_second': 0.479, 'total_flos': 8.541847343333376e+16, 'train_loss': 1.1780610539108094, 'epoch': 5.0})

 
 license: mit
 ---
 # Overview
+ This model has been fine tuned for text summarization. It was created by using LoRA to fine tune the flan-t5-large model on the [SAMsum training dataset](https://huggingface.co/datasets/samsum).
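As a quick orientation, here is a minimal inference sketch showing how a LoRA adapter like this one is typically used with `transformers` and `peft`. The adapter repo id and the `summarize:` prompt prefix are illustrative assumptions, not something stated in this README; the notebook linked below defines the actual preprocessing.

```python
# Minimal sketch (assumptions noted): load the base flan-t5-large model,
# then apply the LoRA adapter on top of it for dialogue summarization.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
from peft import PeftModel

base_model_id = "google/flan-t5-large"
adapter_id = "jasonmcaffee/flan-t5-large-samsum"  # hypothetical repo id, adjust to the published adapter

tokenizer = AutoTokenizer.from_pretrained(base_model_id)
base_model = AutoModelForSeq2SeqLM.from_pretrained(base_model_id)
model = PeftModel.from_pretrained(base_model, adapter_id)

dialogue = "Amanda: I baked cookies. Do you want some? Jerry: Sure! Amanda: I'll bring you tomorrow :-)"
inputs = tokenizer("summarize: " + dialogue, return_tensors="pt")  # prompt prefix is an assumption
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
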
+ ## SAMsum
 SAMsum is a corpus comprised of 16k dialogues and corresponding summaries.

 Example entry:
 - Dialogue - "Amanda: I baked cookies. Do you want some? Jerry: Sure! Amanda: I'll bring you tomorrow :-)"
 - Summary - "Amanda baked cookies and will bring Jerry some tomorrow."

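For reference, the dataset can be loaded directly from the Hugging Face Hub with the `datasets` library. A small sketch follows; note that newer `datasets` releases may require `trust_remote_code=True` for this script-based dataset.

```python
# Sketch: load the SAMsum dataset and inspect one dialogue/summary pair.
from datasets import load_dataset

samsum = load_dataset("samsum")
print(samsum)                      # train/validation/test splits, ~16k dialogues total
example = samsum["train"][0]
print(example["dialogue"])
print(example["summary"])
```
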
+ ## LoRA
 [LoRA](https://github.com/microsoft/LoRA) is a parameter-efficient method for fine tuning models so they perform better at specific tasks.
 > An important paradigm of natural language processing consists of large-scale pre-training on general domain data and adaptation to particular tasks or domains. As we pre-train larger models, full fine-tuning, which retrains all model parameters, becomes less feasible. Using GPT-3 175B as an example -- deploying independent instances of fine-tuned models, each with 175B parameters, is prohibitively expensive. We propose Low-Rank Adaptation, or LoRA, which freezes the pre-trained model weights and injects trainable rank decomposition matrices into each layer of the Transformer architecture, greatly reducing the number of trainable parameters for downstream tasks. Compared to GPT-3 175B fine-tuned with Adam, LoRA can reduce the number of trainable parameters by 10,000 times and the GPU memory requirement by 3 times. LoRA performs on-par or better than fine-tuning in model quality on RoBERTa, DeBERTa, GPT-2, and GPT-3, despite having fewer trainable parameters, a higher training throughput, and, unlike adapters, no additional inference latency.

 In this case we train flan-t5 on the SAMsum dataset to create a model that is better at dialogue summarization.

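To make that concrete, this is roughly how a LoRA adapter is attached to flan-t5 with the `peft` library. The rank, alpha, dropout, and target modules below are illustrative assumptions; the actual values used for this adapter are defined in the linked notebook.

```python
# Illustrative sketch: wrap flan-t5-large with LoRA adapters via peft.
# Hyperparameter values are assumptions, not the ones used for this adapter.
from transformers import AutoModelForSeq2SeqLM
from peft import LoraConfig, get_peft_model, TaskType

base_model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-large")

lora_config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,  # sequence-to-sequence task for T5
    r=16,                             # rank of the low-rank update matrices
    lora_alpha=32,                    # scaling factor
    lora_dropout=0.05,
    target_modules=["q", "v"],        # attention projections in the T5 blocks
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()    # only the injected low-rank matrices are trainable
```

Only the injected low-rank matrices are updated during training; the frozen base weights stay identical to the original flan-t5-large checkpoint.
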
+ ## Flan T5
+ Finetuned LAnguage Net Text-to-Text Transfer Transformer (Flan T5) is an LLM published by Google in 2022. It has improved zero-shot abilities compared to the original T5.
+ The [flan-t5 model](https://huggingface.co/google/flan-t5-large) is open and free for commercial use.
+
+ Flan T5 capabilities include:
+ - Translate between more than 60 languages.
+ - Provide summaries of text.
+ - Answer general questions: “how many minutes should I cook my egg?”
+ - Answer historical questions and questions related to the future.
+ - Solve math problems when given the reasoning.
+
+ > T5 is an encoder-decoder model and converts all NLP problems into a text-to-text format. It is trained using teacher forcing. This means that for training, we always need an input sequence and a corresponding target sequence. The input sequence is fed to the model using input_ids. The target sequence is shifted to the right, i.e., prepended by a start-sequence token and fed to the decoder using the decoder_input_ids. In teacher-forcing style, the target sequence is then appended by the EOS token and corresponds to the labels. The PAD token is hereby used as the start-sequence token. T5 can be trained / fine-tuned both in a supervised and unsupervised fashion.
+
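Because every task is posed as text-to-text, using the base flan-t5 model is just a matter of phrasing the task in the prompt. A small sketch with the `transformers` pipeline (the prompts are illustrative):

```python
# Sketch: zero-shot use of the base flan-t5-large model through its text-to-text interface.
from transformers import pipeline

text2text = pipeline("text2text-generation", model="google/flan-t5-large")

# Translation and question answering are both expressed as plain text prompts.
print(text2text("Translate English to German: How old are you?")[0]["generated_text"])
print(text2text("Answer the following question: How many minutes should I boil an egg?")[0]["generated_text"])
```
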
+ # Code to Create the SAMsum LoRA Adapter

  ## Notebook Source
  [Notebook used to create LoRA adapter](https://colab.research.google.com/drive/1z_mZL6CIRRA4AeF6GXe-zpfEGqqdMk-f?usp=sharing)
 
@@ -135,4 +150,17 @@ trainer.train()

 TrainOutput(global_step=9210, training_loss=1.1780610539108094, metrics={'train_runtime': 19217.7668, 'train_samples_per_second': 3.833, 'train_steps_per_second': 0.479, 'total_flos': 8.541847343333376e+16, 'train_loss': 1.1780610539108094, 'epoch': 5.0})

+ ## Save the model to disk, zip, and download
+ ```
+ # Save the LoRA adapter weights, the tokenizer, and the underlying base model to a local folder.
+ peft_model_id="flan-t5-large-samsum"
+ trainer.model.save_pretrained(peft_model_id)
+ tokenizer.save_pretrained(peft_model_id)
+ trainer.model.base_model.save_pretrained(peft_model_id)
+
+ # Zip the folder and download it from the Colab runtime.
+ !zip -r /content/flan-t5-large-samsum.zip /content/flan-t5-large-samsum
+
+ from google.colab import files
+ files.download("/content/flan-t5-large-samsum.zip")
+ ```
+ Finally, upload the contents of that zip file to a Hugging Face model repository.
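
One way to do that upload programmatically instead of through the web UI is the `huggingface_hub` client; the repo id below is a placeholder, not the actual repository name.

```python
# Sketch: push the unzipped adapter folder to a Hugging Face model repo.
# The repo id is a placeholder; substitute your own username/repo name.
from huggingface_hub import HfApi

api = HfApi()
api.create_repo("your-username/flan-t5-large-samsum", repo_type="model", exist_ok=True)
api.upload_folder(
    folder_path="flan-t5-large-samsum",
    repo_id="your-username/flan-t5-large-samsum",
    repo_type="model",
)
```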