maorivgi committed
Commit 3d0a008
Parent(s): 1463294
initial commit

Files changed:
- README.md +89 -0
- config.json +9 -0
- tokenizer_config.json +5 -0
README.md
CHANGED
@@ -1,3 +1,92 @@
---
license: mit
language: en
---

# T5(v1.1)-SLED (SLiding-Encoder and Decoder, large-sized model)

SLED models use pretrained, short-range encoder-decoder models and apply them to long-text inputs by splitting the input into multiple overlapping chunks, encoding each chunk independently, and fusing the encoded chunks in the decoder (fusion-in-decoder).
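
To make the chunking idea concrete, below is a minimal, illustrative sketch of splitting a long token sequence into overlapping chunks. It is not the `py-sled` implementation; the `chunk_size` and `window_fraction` names only loosely mirror the fields in this repository's `config.json`.

```python
# Illustrative sketch only -- py-sled performs the actual chunking internally.
def split_into_overlapping_chunks(input_ids, chunk_size=256, window_fraction=0.5):
    """Split a list of token ids into chunks of `chunk_size` tokens, where
    consecutive chunks overlap by `chunk_size * window_fraction` tokens.
    Each chunk can then be encoded independently before fusion-in-decoder."""
    stride = max(1, int(chunk_size * (1 - window_fraction)))
    chunks = []
    for start in range(0, len(input_ids), stride):
        chunks.append(input_ids[start:start + chunk_size])
        if start + chunk_size >= len(input_ids):
            break
    return chunks

# For example, a 1000-token input yields 7 overlapping chunks.
print(len(split_into_overlapping_chunks(list(range(1000)))))
```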

## Model description

This SLED model is based on the T5 (v1.1) model, which is described in its [model card](https://huggingface.co/google/t5-v1_1-large).

The developers write in a [blog post](https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html) that the T5 model:

> Our text-to-text framework allows us to use the same model, loss function, and hyperparameters on any NLP task, including machine translation, document summarization, question answering, and classification tasks (e.g., sentiment analysis). We can even apply T5 to regression tasks by training it to predict the string representation of a number instead of the number itself.

T5 v1.1 includes several improvements on top of the original checkpoint. See its model card for details.

## Intended uses & limitations

You can use the raw model for text infilling. However, the model is mostly meant to be fine-tuned on a supervised dataset.

### How to use

To use the model, you first need to install `py-sled` in your environment (or clone the code from the [official repository](https://github.com/Mivg/SLED/blob/main/README.md)):
```bash
pip install py-sled
```

For more installation instructions, see [here](https://github.com/Mivg/SLED#Installation).

Once installed, SLED is fully compatible with HuggingFace's AutoClasses (AutoTokenizer, AutoConfig, AutoModel and AutoModelForCausalLM) and can be loaded using the `from_pretrained` methods:

```python
import sled  # *** required so that SledModels will be registered for the AutoClasses ***
from transformers import AutoModel

model = AutoModel.from_pretrained('tau/t5-v1_1-large-sled')
```
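
As a quick check (a sketch, assuming `py-sled` is installed as above), the other Auto classes mentioned in this section load the corresponding SLED classes from the same checkpoint:

```python
import sled  # *** required so that SledModels will be registered for the AutoClasses ***
from transformers import AutoConfig, AutoTokenizer, AutoModel

config = AutoConfig.from_pretrained('tau/t5-v1_1-large-sled')
tokenizer = AutoTokenizer.from_pretrained('tau/t5-v1_1-large-sled')
model = AutoModel.from_pretrained('tau/t5-v1_1-large-sled')

# Each object should be an instance of the SLED classes registered by py-sled.
print(type(config).__name__, type(tokenizer).__name__, type(model).__name__)
```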

Here is how to use this model in PyTorch:

```python
from sled import SledTokenizer, SledModel

tokenizer = SledTokenizer.from_pretrained('tau/t5-v1_1-large-sled')
model = SledModel.from_pretrained('tau/t5-v1_1-large-sled')

inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
outputs = model(**inputs)
last_hidden_states = outputs.last_hidden_state
```

You can also replace `SledModel` with `SledModelForConditionalGeneration` for Seq2Seq generation:

```python
from sled import SledModelForConditionalGeneration

model = SledModelForConditionalGeneration.from_pretrained('tau/t5-v1_1-large-sled')
```
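
Because the raw checkpoint is not fine-tuned (see the intended-uses note above), the following is only a hedged sketch of the generation API, using a T5-style infilling prompt with a sentinel token; the generated text itself is not expected to be meaningful:

```python
import sled  # *** required so that SledModels will be registered for the AutoClasses ***
from transformers import AutoTokenizer
from sled import SledModelForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained('tau/t5-v1_1-large-sled')
model = SledModelForConditionalGeneration.from_pretrained('tau/t5-v1_1-large-sled')

# T5-style span-infilling prompt; <extra_id_0> is a sentinel token of the
# underlying T5 tokenizer.
inputs = tokenizer("The dog <extra_id_0> in the park.", return_tensors="pt")
generated = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(generated[0], skip_special_tokens=False))
```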

In case you wish to apply SLED to a task containing a prefix (e.g. a question) which should be given as context to every chunk, you can also pass the `prefix_length` tensor input (a LongTensor in the length of the batch size):
```python
import torch
import sled  # *** required so that SledModels will be registered for the AutoClasses ***
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained('tau/t5-v1_1-large-sled')
model = AutoModel.from_pretrained('tau/t5-v1_1-large-sled')

document_input_ids = tokenizer("Dogs are great for you.", return_tensors="pt").input_ids
prefix_input_ids = tokenizer("Are dogs good for you?", return_tensors="pt").input_ids
input_ids = torch.cat((prefix_input_ids, document_input_ids), dim=-1)
attention_mask = torch.ones_like(input_ids)
prefix_length = torch.LongTensor([[prefix_input_ids.size(1)]])

outputs = model(input_ids=input_ids, attention_mask=attention_mask, prefix_length=prefix_length)
last_hidden_states = outputs.last_hidden_state
```

### BibTeX entry and citation info

Please cite both the SLED [paper](https://arxiv.org/abs/2208.00748.pdf) and the T5 [paper](https://arxiv.org/pdf/1910.10683.pdf) by Raffel et al.

```bibtex
@inproceedings{Ivgi2022EfficientLU,
  title={Efficient Long-Text Understanding with Short-Text Models},
  author={Maor Ivgi and Uri Shaham and Jonathan Berant},
  year={2022}
}
```

```bibtex
@article{2020t5,
  author  = {Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu},
  title   = {Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer},
  journal = {Journal of Machine Learning Research},
  year    = {2020},
  volume  = {21},
  number  = {140},
  pages   = {1-67},
  url     = {http://jmlr.org/papers/v21/20-074.html}
}
```
config.json
ADDED
@@ -0,0 +1,9 @@
{
  "model_type": "tau/sled",
  "underlying_config": "google/t5-v1_1-large",
  "context_size": 256,
  "window_fraction": 0.5,
  "prepend_prefix": true,
  "encode_prefix": true,
  "sliding_method": "dynamic"
}
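
For reference, these SLED-specific fields (for example the 256-token chunk size and the 0.5 window fraction) can be inspected from Python. A small sketch, assuming `py-sled` is installed and that `import sled` registers the config with `AutoConfig` as described in the README above:

```python
import sled  # registers the SLED config/model/tokenizer classes
from transformers import AutoConfig

config = AutoConfig.from_pretrained('tau/t5-v1_1-large-sled')
print(config.context_size, config.window_fraction, config.sliding_method)
# -> 256 0.5 dynamic
```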
tokenizer_config.json
ADDED
@@ -0,0 +1,5 @@
{
  "tokenizer_class": "SledTokenizer",
  "base_tokenizer": "google/t5-v1_1-large",
  "model_max_length": 16384
}
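
The `model_max_length` of 16384 above means that, with `truncation=True`, the tokenizer keeps at most 16384 tokens. A small sketch (assuming `py-sled` is installed) of tokenizing a long document:

```python
import sled  # registers the SLED tokenizer with AutoTokenizer
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('tau/t5-v1_1-large-sled')

long_text = "Dogs are great for you. " * 4000  # an illustrative long document
inputs = tokenizer(long_text, return_tensors="pt", truncation=True)
print(inputs.input_ids.shape)  # sequence length is capped at 16384 tokens
```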