hamishivi committed on
Commit
25372d4
1 Parent(s): 4c7e213

Add model description.

Files changed (1)
  1. README.md +18 -0
README.md ADDED
@@ -0,0 +1,18 @@
+ ---
+ datasets:
+ - bigscience/P3
+ language:
+ - en
+ ---
+
+ An 11B T5 model trained on the [P3](https://huggingface.co/datasets/bigscience/P3) (T0 split) dataset for 20,000 steps with a batch size of 2048, a maximum input sequence length of 1024, a maximum output sequence length of 256, and the Adafactor optimizer with a constant learning rate of 0.001.
+ The model is initialized from the [T5 v1.1 lm-adapt checkpoint](https://huggingface.co/google/t5-xl-lm-adapt) and fully finetuned (all parameters are updated).
+
+ For more details, see [HINT: Hypernetwork Instruction Tuning for Efficient Zero- & Few-Shot Generalisation](https://arxiv.org/abs/2212.10315).
+
+ Performance on T0 held-out tasks (average accuracy across prompts, using rank classification):
+
+ | Model | ANLI (avg) | HellaSwag | StoryCloze | CB | COPA | RTE | WiC | WSC | WinoGrande | Average |
+ |--|--|--|--|--|--|--|--|--|--|--|
+ | [T0-11B](https://huggingface.co/bigscience/T0) | 41.0 | 33.6 | 92.4 | 70.1 | 91.5 | 81.0 | 56.1 | 61.1 | 59.9 | 65.2 |
+ | hypertask_T0_11B (this model) | 46.8 | 34.1 | 98.2 | 81.2 | 96.6 | 84.0 | 52.1 | 62.6 | 64.8 | 68.9 |