Update README.md
README.md
CHANGED
@@ -6,10 +6,11 @@ language:
 pipeline_tag: text2text-generation
 tags:
 - t5x
--
+- encoder-decoder
 ---
 
 Pile-T5 XL is an Encoder-Decoder model trained on [the Pile](https://pile.eleuther.ai/) using the [T5x](https://github.com/google-research/t5x) library. The model was trained for 2 million steps, or roughly 2 trillion tokens, using an MLM objective similar to the original T5 model.
+The HF version of Pile-T5 XL borrows UMT5's model implementation, as it uses the scalable model implementation from T5x, and uses `LlamaTokenizer`.
 
 ### Model Details
 
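The added note about UMT5's implementation and `LlamaTokenizer` describes how the checkpoint is consumed from `transformers`. A minimal sketch, assuming the `EleutherAI/pile-t5-xl` repo id and T5-style `<extra_id_0>` sentinel tokens in the vocabulary (neither is confirmed by this diff):

```python
# Sketch: load the HF checkpoint via the UMT5 classes mentioned above.
# Assumption: the hub repo id is "EleutherAI/pile-t5-xl".
from transformers import AutoTokenizer, UMT5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pile-t5-xl")
model = UMT5ForConditionalGeneration.from_pretrained("EleutherAI/pile-t5-xl")

# MLM-style usage: ask the model to fill in a masked span.
# Assumption: the tokenizer keeps T5-style sentinels such as <extra_id_0>.
inputs = tokenizer("The capital of France is <extra_id_0>.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```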
@@ -30,7 +31,7 @@ ai](mailto:contact@eleuther.ai).
 
 | Hyperparameter             | Value      |
 | -------------------------- | ---------- |
-| n<sub>parameters</sub>     |
+| n<sub>parameters</sub>     | 2849804288 |
 | n<sub>encoder layers</sub> | 24         |
 | n<sub>decoder layers</sub> | 24         |
 | d<sub>model</sub>          | 5120       |
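The filled-in n<sub>parameters</sub> value can be sanity-checked against the loaded weights. A sketch under the same repo-id assumption; the exact total may differ slightly from the table depending on how tied embeddings are counted:

```python
# Sketch: count parameters and compare with the table's 2849804288.
from transformers import UMT5ForConditionalGeneration

model = UMT5ForConditionalGeneration.from_pretrained("EleutherAI/pile-t5-xl")
print(f"{sum(p.numel() for p in model.parameters()):,}")
```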
@@ -133,16 +134,18 @@ checkpoints that can be used for finetuning with the T5x library, refer to [here
 
 ### Evaluations
 
-
+Pile-T5 XL was evaluated on SuperGLUE and CodeXGLUE. A Flan-finetuned version was evaluated on Flan Held-In tasks, MMLU, and BBH.
+Results can be seen in the [blogpost](https://blog.eleuther.ai/pile-t5/).
 
 ### BibTeX
 
 ```
-@
+@misc{2024PileT5,
   author = {Lintang Sutawika and Aran Komatsuzaki and Colin Raffel},
-  title = {Pile
+  title = {Pile-T5},
   year = {2024},
-  url = {}
+  url = {https://blog.eleuther.ai/pile-t5/},
+  note = {Blog post},
 }
 ```
 