jasonmcaffee committed
Commit 03563b3 · Parent: e6cff95
Update README.md

README.md CHANGED

@@ -2,20 +2,35 @@
license: mit
---
# Overview
This model has been fine-tuned for text summarization. It was created by using LoRA to fine-tune the flan-t5-large model on the [SAMsum training dataset](https://huggingface.co/datasets/samsum).

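As a quick reference (an illustrative sketch, not part of the original notebook), the resulting adapter could be loaded for inference with the `peft` library roughly as follows; the repo id `jasonmcaffee/flan-t5-large-samsum` and the `summarize:` prompt prefix are assumptions.

```
# Hypothetical inference sketch: load the base model, then apply this LoRA adapter.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
from peft import PeftModel

base_model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-large")
tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-large")
model = PeftModel.from_pretrained(base_model, "jasonmcaffee/flan-t5-large-samsum")  # assumed repo id

dialogue = "Amanda: I baked cookies. Do you want some? Jerry: Sure! Amanda: I'll bring you tomorrow :-)"
inputs = tokenizer("summarize: " + dialogue, return_tensors="pt")
summary_ids = model.generate(**inputs, max_new_tokens=60)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```
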
## SAMsum
SAMsum is a corpus of 16k dialogues with corresponding summaries.

Example entry:
- Dialogue - "Amanda: I baked cookies. Do you want some? Jerry: Sure! Amanda: I'll bring you tomorrow :-)"
- Summary - "Amanda baked cookies and will bring Jerry some tomorrow."

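To browse the data yourself (illustrative, not from the original notebook), the corpus can be loaded with the `datasets` library, assuming the `samsum` dataset is available on the Hub:

```
# Inspect a SAMsum entry with the Hugging Face datasets library.
from datasets import load_dataset

samsum = load_dataset("samsum")
print(samsum)                          # train / validation / test splits
print(samsum["train"][0]["dialogue"])  # raw chat transcript
print(samsum["train"][0]["summary"])   # human-written summary
```
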
## LoRA
[LoRA](https://github.com/microsoft/LoRA) is a performant mechanism for fine-tuning models so that they become better at specific tasks.
> An important paradigm of natural language processing consists of large-scale pre-training on general domain data and adaptation to particular tasks or domains. As we pre-train larger models, full fine-tuning, which retrains all model parameters, becomes less feasible. Using GPT-3 175B as an example -- deploying independent instances of fine-tuned models, each with 175B parameters, is prohibitively expensive. We propose Low-Rank Adaptation, or LoRA, which freezes the pre-trained model weights and injects trainable rank decomposition matrices into each layer of the Transformer architecture, greatly reducing the number of trainable parameters for downstream tasks. Compared to GPT-3 175B fine-tuned with Adam, LoRA can reduce the number of trainable parameters by 10,000 times and the GPU memory requirement by 3 times. LoRA performs on-par or better than fine-tuning in model quality on RoBERTa, DeBERTa, GPT-2, and GPT-3, despite having fewer trainable parameters, a higher training throughput, and, unlike adapters, no additional inference latency.

In this case we fine-tune flan-t5 on the SAMsum dataset in order to create a model that is better at dialogue summarization.

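For context, here is a minimal sketch of how a LoRA adapter for flan-t5-large can be configured with the `peft` library; the rank, alpha, dropout, and target modules shown are illustrative assumptions rather than the notebook's exact settings.

```
# Attach a LoRA adapter to flan-t5-large; only the adapter weights are trainable.
from transformers import AutoModelForSeq2SeqLM
from peft import LoraConfig, TaskType, get_peft_model

model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-large")

lora_config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,  # encoder-decoder (T5-style) task
    r=16,                             # rank of the low-rank update matrices (assumed)
    lora_alpha=32,                    # scaling factor for the update (assumed)
    lora_dropout=0.05,                # assumed
    target_modules=["q", "v"],        # T5 attention projections to adapt
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()    # prints the small fraction of trainable weights
```
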
## Flan T5
Finetuned LAnguage Net Text-to-Text Transfer Transformer (Flan T5) is an LLM published by Google in 2022. It has improved zero-shot learning abilities compared to T5.
The [flan-t5 model](https://huggingface.co/google/flan-t5-large) is open and free for commercial use.

Flan T5 capabilities include:
- Translate between more than 60 languages.
- Provide summaries of text.
- Answer general questions: "How many minutes should I cook my egg?"
- Answer historical questions, and questions related to the future.
- Solve math problems when given the reasoning.

> T5 is an encoder-decoder model and converts all NLP problems into a text-to-text format. It is trained using teacher forcing. This means that for training, we always need an input sequence and a corresponding target sequence. The input sequence is fed to the model using input_ids. The target sequence is shifted to the right, i.e., prepended by a start-sequence token and fed to the decoder using the decoder_input_ids. In teacher-forcing style, the target sequence is then appended by the EOS token and corresponds to the labels. The PAD token is hereby used as the start-sequence token. T5 can be trained / fine-tuned both in a supervised and unsupervised fashion.

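For context, a small zero-shot example (not from the original notebook) showing the base model's text-to-text interface:

```
# Zero-shot use of the base flan-t5-large model, no fine-tuning involved.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-large")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-large")

prompt = "Translate English to German: How old are you?"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids
output_ids = model.generate(input_ids, max_new_tokens=30)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```
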
# Code to Create The SAMsum LoRA Adapter

## Notebook Source
[Notebook used to create LoRA adapter](https://colab.research.google.com/drive/1z_mZL6CIRRA4AeF6GXe-zpfEGqqdMk-f?usp=sharing)

@@ -135,4 +150,17 @@ trainer.train()

TrainOutput(global_step=9210, training_loss=1.1780610539108094, metrics={'train_runtime': 19217.7668, 'train_samples_per_second': 3.833, 'train_steps_per_second': 0.479, 'total_flos': 8.541847343333376e+16, 'train_loss': 1.1780610539108094, 'epoch': 5.0})

## Save the model to disk, zip, and download
```
# Save the LoRA adapter weights and tokenizer to a local directory
peft_model_id = "flan-t5-large-samsum"
trainer.model.save_pretrained(peft_model_id)
tokenizer.save_pretrained(peft_model_id)
trainer.model.base_model.save_pretrained(peft_model_id)

# Zip the saved directory so it can be downloaded from the Colab runtime
!zip -r /content/flan-t5-large-samsum.zip /content/flan-t5-large-samsum

# Download the zip through the browser
from google.colab import files
files.download("/content/flan-t5-large-samsum.zip")
```

Upload the contents of that zip file to Hugging Face.
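
Alternatively (illustrative, not from the original notebook), the saved folder can be pushed programmatically with the `huggingface_hub` client; the repo id is an assumption, and you must be authenticated first (e.g. via `huggingface_hub.login()`).

```
# Push the saved adapter folder to the Hub instead of uploading it manually.
from huggingface_hub import HfApi

api = HfApi()
api.create_repo("jasonmcaffee/flan-t5-large-samsum", exist_ok=True)  # assumed repo id
api.upload_folder(
    folder_path="/content/flan-t5-large-samsum",
    repo_id="jasonmcaffee/flan-t5-large-samsum",
)
```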