embellish the README
README.md CHANGED
@@ -64,33 +64,35 @@ inference:

---

# long-t5-tglobal-base-16384-booksum

- summarize long text and get a SparkNotes-esque summary of arbitrary topics!
- generalizes fairly well to academic & narrative text.

## Cheeky Proof-of-Concept

A summary of the [infamous navy seals copypasta](https://knowyourmeme.com/memes/navy-seal-copypasta):

> The narrator tells the audience that he can kill anyone anywhere in the world with his bare hands, and he has access to all of the United States military's weapons.

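A summary like the one above can be generated with the standard `transformers` summarization pipeline. The snippet below is a minimal sketch, not the exact setup used for the example: the repo id is assumed to match this model card (substitute the actual checkpoint name), and the generation settings are illustrative defaults.

```python
from transformers import pipeline

# assumed repo id for this checkpoint -- substitute the actual model id
MODEL_ID = "pszemraj/long-t5-tglobal-base-16384-booksum"

summarizer = pipeline("summarization", model=MODEL_ID)

long_text = "..."  # e.g. the copypasta above, a book chapter, or a paper

result = summarizer(
    long_text,
    max_length=1024,          # matches the 1024-token output cap used in training
    no_repeat_ngram_size=3,   # illustrative; reduces verbatim repetition
    truncation=True,          # clip inputs beyond the model's 16384-token window
)
print(result[0]["summary_text"])
```
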
## Model description

A fine-tuned version of [google/long-t5-tglobal-base](https://huggingface.co/google/long-t5-tglobal-base) on the `kmfoda/booksum` dataset:

- between different checkpoints, about 20 epochs in total
- all training was done at 16384 token input / 1024 max output (see the inference sketch below)
- early checkpoints of this model were trained on a "smaller" subset of the dataset, as it was filtered for summaries of 1024 **characters**. This was subsequently caught and adjusted to 1024 **tokens**, and then trained further for at least five epochs.

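For reference, those 16384 / 1024 limits map directly onto tokenizer truncation and generation length at inference time. The sketch below shows one way to set them explicitly; the repo id and beam-search values are assumptions for illustration, not settings taken from the training run.

```python
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

MODEL_ID = "pszemraj/long-t5-tglobal-base-16384-booksum"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID)

with open("chapter.txt") as f:  # any long document
    long_text = f.read()

# truncate the input at the 16384-token window used during fine-tuning
inputs = tokenizer(long_text, max_length=16384, truncation=True, return_tensors="pt")

# cap the generated summary at the 1024-token output length used in training
with torch.no_grad():
    summary_ids = model.generate(
        **inputs,
        max_length=1024,
        num_beams=4,             # illustrative; tune for quality vs. speed
        no_repeat_ngram_size=3,
    )

print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```

Note that with `truncation=True` anything past 16384 tokens is simply dropped, so inputs much longer than the window still need chunking upstream.
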
## Intended uses & limitations

- At time of writing, the model is not _fully converged_ despite training for 20+ epochs. This checkpoint is serviceable enough (see examples).
- I plan to update this page with newer checkpoints and post some metrics over time.
- Compare performance to [LED-base](https://huggingface.co/pszemraj/led-base-book-summary) trained on the same dataset.

## Training and evaluation data

The `kmfoda/booksum` dataset. Summaries longer than 1024 LongT5 tokens were filtered out to prevent the model from learning to generate "partial" summaries.

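That token-length filter can be approximated with `datasets` and the base model's tokenizer. The sketch below is a reconstruction of the described preprocessing step, not the actual training script; in particular, the reference-summary column is assumed to be `summary_text` (check the dataset card for the exact schema).

```python
from datasets import load_dataset
from transformers import AutoTokenizer

MAX_SUMMARY_TOKENS = 1024
SUMMARY_COL = "summary_text"  # assumed column name -- verify against the dataset card

# tokenizer of the base model; the fine-tuned checkpoint shares its vocabulary
tokenizer = AutoTokenizer.from_pretrained("google/long-t5-tglobal-base")

dataset = load_dataset("kmfoda/booksum", split="train")

def short_enough(example):
    """Keep only examples whose reference summary fits the 1024-token output window."""
    n_tokens = len(tokenizer(example[SUMMARY_COL]).input_ids)
    return n_tokens <= MAX_SUMMARY_TOKENS

filtered = dataset.filter(short_enough)
print(f"kept {len(filtered)} of {len(dataset)} training examples")
```
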
## Training procedure