jasonmcaffee committed
Commit 77aa390 • Parent(s): 03563b3
Update README.md

README.md CHANGED
@@ -4,6 +4,8 @@ license: mit
# Overview
This model has been fine-tuned for text summarization. It was created by using LoRA to fine-tune the flan-t5-large model on the [SAMsum training dataset](https://huggingface.co/datasets/samsum).

This document explains how the model was fine-tuned, saved to disk, and uploaded to Hugging Face, and then demonstrates how to use it.

## SAMsum
SAMsum is a corpus of 16k dialogues and their corresponding summaries.
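
For reference, the dataset can be loaded directly with the Hugging Face `datasets` library. This is a minimal sketch and not part of the fine-tuning notebook itself:

```
from datasets import load_dataset

# Depending on your datasets version, this may require `pip install py7zr` to unpack the archive
dataset = load_dataset("samsum")

# Each record has an id, a dialogue, and a human-written summary
print(dataset["train"][0]["dialogue"])
print(dataset["train"][0]["summary"])
```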

@@ -163,4 +165,62 @@ from google.colab import files

```
files.download("/content/flan-t5-large-samsum.zip")
```

Upload the contents of that zip file to Hugging Face.
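
As an alternative to uploading through the web UI, the unzipped folder can be pushed with the `huggingface_hub` client. This is a hedged sketch: it assumes you are logged in via `huggingface-cli login`, and the local path and repo id below are placeholders for your own values:

```
from huggingface_hub import HfApi

api = HfApi()
# Placeholder paths: point folder_path at the unzipped adapter files and repo_id at your model repo
api.upload_folder(
    folder_path="/content/flan-t5-large-samsum",
    repo_id="your-username/flan-t5-large-samsum",
    repo_type="model",
)
```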

# Code to utilize the fine-tuned model

## Notebook Source
[Notebook using the Hugging Face hosted model](https://colab.research.google.com/drive/1kqADOA9vaTsdecx4u-7XWJJia62WV0cY?pli=1#scrollTo=KMs70mdIxaam)

## Load the model, tokenizer, and LoRA adapter (PEFT)

```
# Load the jasonmcaffee/flan-t5-large-samsum model and tokenizer
import torch
from peft import PeftModel, PeftConfig
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

peft_model_id = "jasonmcaffee/flan-t5-large-samsum"
config = PeftConfig.from_pretrained(peft_model_id)

# Load the base flan-t5-large model in 8-bit, along with its tokenizer
model = AutoModelForSeq2SeqLM.from_pretrained(config.base_model_name_or_path, load_in_8bit=True, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)

# Load the LoRA adapter
model = PeftModel.from_pretrained(model, peft_model_id, device_map="auto")
model.eval()
```

## Have the model summarize text!
Finally, we now have a model that is capable of summarizing text for us.

Summarization takes ~30 seconds.
```
dialogue = """The tower is 324 metres (1,063 ft) tall, about the same height as an 81-storey building, and the tallest structure in Paris. Its base is square, measuring 125 metres (410 ft) on each side. During its construction, the Eiffel Tower surpassed the Washington Monument to become the tallest man-made structure in the world, a title it held for 41 years until the Chrysler Building in New York City was finished in 1930. It was the first structure to reach a height of 300 metres. Due to the addition of a broadcasting aerial at the top of the tower in 1957, it is now taller than the Chrysler Building by 5.2 metres (17 ft). Excluding transmitters, the Eiffel Tower is the second tallest free-standing structure in France after the Millau Viaduct.
"""

input_ids = tokenizer(dialogue, return_tensors="pt", truncation=True).input_ids.cuda()

# with torch.inference_mode():
outputs = model.generate(
    input_ids=input_ids,
    min_length=20,
    max_new_tokens=100,
    # Exponential penalty to the length that is used with beam-based generation. It is applied as an
    # exponent to the sequence length, which in turn is used to divide the score of the sequence.
    # Since the score is the log likelihood of the sequence (i.e. negative), length_penalty > 0.0
    # promotes longer sequences, while length_penalty < 0.0 encourages shorter sequences.
    length_penalty=1.9,
    num_beams=4,
    temperature=0.9,
    top_k=150,  # default is 50
    repetition_penalty=2.1,
    # do_sample=True,
    top_p=0.9,
)
print(f"input sentence: {dialogue}\n{'---' * 20}")

summarization = tokenizer.batch_decode(outputs.detach().cpu().numpy(), skip_special_tokens=True)[0]

print(f"summary:\n{summarization}")
```
Prints:
> The Eiffel Tower is the second tallest free-standing structure in France after the Millau Viaduct.

The notebook also loads the base flan-t5-large model with no SAMsum fine-tuning, which produces this summary:
> The Eiffel Tower is the tallest man-made structure in the world, a title it held for 41 years until the Chrysler Building in New York City was finished in 1930.
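
For comparison, the un-tuned base model can be run against the same input along these lines. This is a minimal sketch: it reuses the `dialogue` string from the block above, and the notebook's exact generation settings may differ:

```
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Load the base model without the LoRA adapter
base_tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-large")
base_model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-large", device_map="auto")

# Summarize the same dialogue with the un-tuned model
base_input_ids = base_tokenizer(dialogue, return_tensors="pt", truncation=True).input_ids.to(base_model.device)
base_outputs = base_model.generate(input_ids=base_input_ids, min_length=20, max_new_tokens=100, num_beams=4)
print(base_tokenizer.batch_decode(base_outputs, skip_special_tokens=True)[0])
```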