afaji committed on
Commit 6cce9c0 (parent: 43933b3)

Update README.md

Files changed (1): README.md (+60, -116)
README.md CHANGED
@@ -18,10 +18,68 @@ widget:
  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
  should probably proofread and complete it, then remove this comment. -->

- # LaMini-FLAN-T5-Small

  This model is one of our LaMini model series in the paper "[LaMini: Distilling Knowledge from Large Language Models]()". This model is a fine-tuned version of [google/flan-t5-small](https://huggingface.co/google/flan-t5-small) on the [LaMini dataset](), which contains 2.58M samples for instruction fine-tuning. For more information about our dataset, please refer to our [project repository]().

  ## Training Procedure
  We initialize with [google/flan-t5-small](https://huggingface.co/google/flan-t5-small) and fine-tune it on our [LaMini dataset](). Its total number of parameters is 77M.
 
@@ -41,124 +99,12 @@ The following hyperparameters were used during training:
  ## Evaluation
  We conducted two sets of evaluations: automatic evaluation on downstream NLP tasks and human evaluation on user-oriented instructions. For more detail, please refer to our [paper]().

- ## More Models
- You can download LaMini model series as follow. Note that not all models are performing as well. More details can be seen in our [paper]().
- <details>
- <summary> Click to expand </summary>
- <table>
- <caption>
- LaMini Language Models collection.
- </caption>
- <thead>
- <tr>
- <th>Name</th>
- <th>Architecture</th>
- <th>Initialization</th>
- </tr>
- </thead>
- <tbody>
- <tr>
- <td>LaMini-T5-61M</td>
- <td>encoder-decoder</td>
- <td>T5-small</td>
- </tr>
- <tr>
- <td>LaMini-T5-223M</td>
- <td>encoder-decoder</td>
- <td>T5-base</td>
- </tr>
- <tr>
- <td>LaMini-T5-738M</td>
- <td>encoder-decoder</td>
- <td>T5-large</td>
- </tr>
- <tr>
- <td>LaMini-Flan-T5-77M</td>
- <td>encoder-decoder</td>
- <td>Flan-T5-small</td>
- </tr>
- <tr>
- <td>LaMini-Flan-T5-248M</td>
- <td>encoder-decoder</td>
- <td>Flan-T5-base</td>
- </tr>
- <tr>
- <td>LaMini-Flan-T5-783M</td>
- <td>encoder-decoder</td>
- <td>Flan-T5-large</td>
- </tr>
- <tr>
- <td>LaMini-Cb-111M</td>
- <td>decoder-only</td>
- <td>Cerebras-GPT-111M</td>
- </tr>
- <tr>
- <td>LaMini-Cb-256M</td>
- <td>decoder-only</td>
- <td>Cerebras-GPT-256M</td>
- </tr>
- <tr>
- <td>LaMini-Cb-590M</td>
- <td>decoder-only</td>
- <td>Cerebras-GPT-590M</td>
- </tr>
- <tr>
- <td>LaMini-Cb-1.3B</td>
- <td>decoder-only</td>
- <td>Cerebras-GPT-1.3B</td>
- </tr>
- <tr>
- <td>LaMini-GPT-124M</td>
- <td>decoder-only</td>
- <td>GPT-2</td>
- </tr>
- <tr>
- <td>LaMini-GPT-774M</td>
- <td>decoder-only</td>
- <td>GPT-2 large</td>
- </tr>
- <tr>
- <td>LaMini-GPT-1.5B</td>
- <td>decoder-only</td>
- <td>GPT-2 xl</td>
- </tr>
- </tbody>
- </table>
-
- </details>
-
-
  ## Use

  ### Intended use
  We recommend using the model to respond to human instructions written in natural language.

  We now show you how to load and use our model using HuggingFace `pipeline()`.
- ### CPU
-
- <details>
- <summary> Click to expand </summary>
-
- ```python
- # pip install -q transformers
- from transformers import pipeline
-
- checkpoint = "{model_name}"
-
- model = pipeline('text2text-generation', model=checkpoint, use_auth_token=True)
-
- input_prompt = 'Please let me know your thoughts on the given place and why you think it deserves to be visited: \n"Barcelona, Spain"'
- generated_text = generator(input_prompt, max_length=512, do_sample=True, repetition_penalty=1.5)[0]['generated_text']
-
- print("Response": generated_text)
- ```
-
- </details>
-
- ### GPU
-
- <details>
- <summary> Click to expand </summary>
 
  ```python
  # pip install -q transformers
@@ -169,13 +115,11 @@ checkpoint = "{model_name}"
  model = pipeline('text2text-generation', model=checkpoint, use_auth_token=True, device=0)

  input_prompt = 'Please let me know your thoughts on the given place and why you think it deserves to be visited: \n"Barcelona, Spain"'
- generated_text = generator(input_prompt, max_length=512, do_sample=True, repetition_penalty=1.5)[0]['generated_text']

  print("Response", generated_text)
  ```

- </details>
-
  ## Limitations

  More information needed
 
  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
  should probably proofread and complete it, then remove this comment. -->

+ # LaMini-FLAN-T5-77M

  This model is one of our LaMini model series in the paper "[LaMini: Distilling Knowledge from Large Language Models]()". This model is a fine-tuned version of [google/flan-t5-small](https://huggingface.co/google/flan-t5-small) on the [LaMini dataset](), which contains 2.58M samples for instruction fine-tuning. For more information about our dataset, please refer to our [project repository]().

+ You can view the other models of the LaMini series below. Note that not all models perform equally well. Models marked with ✩ have the best overall performance given their size/architecture. More details can be found in our paper.
+
+ <table>
+ <thead>
+ <tr>
+ <th>Base model</th>
+ <th colspan="4">LaMini series (#parameters)</th>
+ </tr>
+ </thead>
+ <tbody>
+ <tr>
+ <td>T5</td>
+ <td>LaMini-T5-61M</td>
+ <td>LaMini-T5-223M</td>
+ <td>LaMini-T5-738M</td>
+ <td></td>
+ </tr>
+ <tr>
+ <td>Flan-T5</td>
+ <td>LaMini-Flan-T5-77M</td>
+ <td>LaMini-Flan-T5-248M</td>
+ <td>LaMini-Flan-T5-783M</td>
+ <td></td>
+ </tr>
+ <tr>
+ <td>Cerebras-GPT</td>
+ <td>LaMini-Cerebras-111M</td>
+ <td>LaMini-Cerebras-256M</td>
+ <td>LaMini-Cerebras-590M</td>
+ <td>LaMini-Cerebras-1.3B</td>
+ </tr>
+ <tr>
+ <td>GPT-2</td>
+ <td><a href="https://huggingface.co/MBZUAI/lamini-gpt-124m" target="_blank" rel="noopener noreferrer">LaMini-GPT-124M</a></td>
+ <td><a href="https://huggingface.co/MBZUAI/lamini-gpt-774m" target="_blank" rel="noopener noreferrer">LaMini-GPT-774M</a></td>
+ <td><a href="https://huggingface.co/MBZUAI/lamini-gpt-1.5b" target="_blank" rel="noopener noreferrer">LaMini-GPT-1.5B</a></td>
+ <td></td>
+ </tr>
+ <tr>
+ <td>GPT-Neo</td>
+ <td>LaMini-Neo-125M</td>
+ <td>LaMini-Neo-1.3B</td>
+ <td></td>
+ <td></td>
+ </tr>
+ <tr>
+ <td>GPT-J</td>
+ <td colspan="4">coming soon</td>
+ </tr>
+ <tr>
+ <td>LLaMA</td>
+ <td colspan="4">coming soon</td>
+ </tr>
+ </tbody>
+ </table>
+
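The encoder-decoder checkpoints in this table (the T5- and Flan-T5-based models) are loaded with the `text2text-generation` pipeline shown later in this card, while the decoder-only ones (Cerebras-GPT, GPT-2, GPT-Neo based) are causal language models and use the `text-generation` task instead. A minimal sketch, assuming the `MBZUAI/lamini-gpt-124m` checkpoint linked in the table; the exact prompt format each checkpoint expects may differ, so treat this as illustrative rather than the card's official example:

```python
# pip install -q transformers
from transformers import pipeline

# Decoder-only member of the series (linked in the table above).
checkpoint = "MBZUAI/lamini-gpt-124m"

# Causal LMs use the 'text-generation' task rather than 'text2text-generation'.
generator = pipeline('text-generation', model=checkpoint)

instruction = 'Please let me know your thoughts on the given place and why you think it deserves to be visited: \n"Barcelona, Spain"'
generated_text = generator(instruction, max_length=512, do_sample=True)[0]['generated_text']

print("Response", generated_text)
```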
  ## Training Procedure
  We initialize with [google/flan-t5-small](https://huggingface.co/google/flan-t5-small) and fine-tune it on our [LaMini dataset](). Its total number of parameters is 77M.

  ## Evaluation
  We conducted two sets of evaluations: automatic evaluation on downstream NLP tasks and human evaluation on user-oriented instructions. For more detail, please refer to our [paper]().
  ## Use

  ### Intended use
  We recommend using the model to respond to human instructions written in natural language.

  We now show you how to load and use our model using HuggingFace `pipeline()`.

  ```python
  # pip install -q transformers
  from transformers import pipeline

  checkpoint = "{model_name}"

  model = pipeline('text2text-generation', model=checkpoint, use_auth_token=True, device=0)

  input_prompt = 'Please let me know your thoughts on the given place and why you think it deserves to be visited: \n"Barcelona, Spain"'
+ generated_text = model(input_prompt, max_length=512, do_sample=True)[0]['generated_text']

  print("Response", generated_text)
  ```
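The example above assumes a GPU (`device=0`). A minimal sketch of the same call for CPU-only inference, assuming a published checkpoint name in place of the `{model_name}` placeholder (the repository id used here is an assumption based on this card's title):

```python
# pip install -q transformers
from transformers import pipeline

# Assumed repository id (matches this card's title); replace with the actual checkpoint name.
checkpoint = "MBZUAI/LaMini-Flan-T5-77M"

# Omitting the `device` argument keeps the pipeline on CPU.
model = pipeline('text2text-generation', model=checkpoint)

input_prompt = 'Please let me know your thoughts on the given place and why you think it deserves to be visited: \n"Barcelona, Spain"'
generated_text = model(input_prompt, max_length=512, do_sample=True)[0]['generated_text']

print("Response", generated_text)
```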

  ## Limitations

  More information needed