andrewzamai committed on
Commit
2d7e780
1 Parent(s): 76efe2f

Update README.md

  1. README.md +39 -28
README.md CHANGED
@@ -18,10 +18,8 @@ Instructed on a reduced number of samples, it is designed to tackle never-seen-b
18
  Currently existing approaches fine-tune on an extensive number of entity classes (around 13K) and assess zero-shot NER capabilities on Out-Of-Distribution input domains.
19
  SLIMER performs comparably to these state-of-the-art models on OOD input domains, while being trained on only a reduced number of samples and on a set of NE tags that overlaps to a lesser degree with the test set.
20
 
21
- <img src="https://huggingface.co/expertai/SLIMER/resolve/main/OOD_evals.png">
22
-
23
  We extend the standard zero-shot evaluations on BUSTER, which is characterized by financial entities that are rather far from the more traditional tags observed by all models during training.
24
- An inverse trend can be observed, with SLIMER instead emerging as the most effective in dealing with these unseen labels, thanks to its lighter instruction tuning methodology and the use of definition and guidelines.
25
 
26
  <table>
27
  <thead>
@@ -31,6 +29,7 @@ An inverse trend can be observed, with SLIMER instead emerging as the most effec
31
  <th>#Params</th>
32
  <th colspan="2">MIT</th>
33
  <th colspan="5">CrossNER</th>
 
34
  <th>AVG</th>
35
  </tr>
36
  <tr>
@@ -45,6 +44,7 @@ An inverse trend can be observed, with SLIMER instead emerging as the most effec
45
  <th>Politics</th>
46
  <th>Science</th>
47
  <th></th>
 
48
  </tr>
49
  </thead>
50
  <tbody>
@@ -59,7 +59,8 @@ An inverse trend can be observed, with SLIMER instead emerging as the most effec
59
  <td>66.6</td>
60
  <td>68.5</td>
61
  <td>67.0</td>
62
- <td>47.5</td>
 
63
  </tr>
64
  <tr>
65
  <td>InstructUIE</td>
@@ -72,7 +73,8 @@ An inverse trend can be observed, with SLIMER instead emerging as the most effec
72
  <td>53.2</td>
73
  <td>48.2</td>
74
  <td>49.3</td>
75
- <td>47.3</td>
 
76
  </tr>
77
  <tr>
78
  <td>UniNER-type</td>
@@ -85,7 +87,8 @@ An inverse trend can be observed, with SLIMER instead emerging as the most effec
85
  <td>65.0</td>
86
  <td>60.8</td>
87
  <td>61.1</td>
88
- <td>53.4</td>
 
89
  </tr>
90
  <tr>
91
  <td>UniNER-def</td>
@@ -98,7 +101,8 @@ An inverse trend can be observed, with SLIMER instead emerging as the most effec
98
  <td>55.8</td>
99
  <td>57.5</td>
100
  <td>52.9</td>
101
- <td>45.0</td>
 
102
  </tr>
103
  <tr>
104
  <td>UniNER-type+sup.</td>
@@ -111,7 +115,8 @@ An inverse trend can be observed, with SLIMER instead emerging as the most effec
111
  <td>70.6</td>
112
  <td>66.9</td>
113
  <td>70.8</td>
114
- <td>61.8</td>
 
115
  </tr>
116
  <tr>
117
  <td>GoLLIE</td>
@@ -124,7 +129,8 @@ An inverse trend can be observed, with SLIMER instead emerging as the most effec
124
  <td>67.8</td>
125
  <td>57.2</td>
126
  <td>55.5</td>
127
- <td>58.4</td>
 
128
  </tr>
129
  <tr>
130
  <td>GLiNER-L</td>
@@ -137,7 +143,8 @@ An inverse trend can be observed, with SLIMER instead emerging as the most effec
137
  <td>69.6</td>
138
  <td>72.6</td>
139
  <td>62.6</td>
140
- <td>60.9</td>
 
141
  </tr>
142
  <tr>
143
  <td>GNER-T5</td>
@@ -150,7 +157,8 @@ An inverse trend can be observed, with SLIMER instead emerging as the most effec
150
  <td>81.2</td>
151
  <td>75.1</td>
152
  <td>76.7</td>
153
- <td>69.1</td>
 
154
  </tr>
155
  <tr>
156
  <td>GNER-LLaMA</td>
@@ -163,33 +171,36 @@ An inverse trend can be observed, with SLIMER instead emerging as the most effec
163
  <td>75.7</td>
164
  <td>69.4</td>
165
  <td>69.9</td>
166
- <td>66.1</td>
 
167
  </tr>
168
  <tr>
169
  <td>SLIMER w/o D&amp;G</td>
170
  <td>LLaMA-2-chat</td>
171
  <td>7B</td>
172
- <td>46.4 &plusmn; 1.8</td>
173
- <td>36.3 &plusmn; 2.1</td>
174
- <td>49.6 &plusmn; 3.2</td>
175
- <td>58.4 &plusmn; 1.7</td>
176
- <td>56.8 &plusmn; 2.1</td>
177
- <td>57.9 &plusmn; 2.1</td>
178
- <td>53.8 &plusmn; 1.7</td>
179
- <td>51.3 &plusmn; 2.0</td>
 
180
  </tr>
181
  <tr>
182
  <td><b>SLIMER</b></td>
183
  <td><b>LLaMA-2-chat</b></td>
184
  <td><b>7B</b></td>
185
- <td><b>50.9 &plusmn; 0.9</b></td>
186
- <td><b>38.2 &plusmn; 0.3</b></td>
187
- <td><b>50.1 &plusmn; 2.4</b></td>
188
- <td><b>58.7 &plusmn; 0.2</b></td>
189
- <td><b>60.0 &plusmn; 0.5</b></td>
190
- <td><b>63.9 &plusmn; 1.0</b></td>
191
- <td><b>56.3 &plusmn; 0.6</b></td>
192
- <td><b>54.0 &plusmn; 0.5</b></td>
 
193
  </tr>
194
  </tbody>
195
  </table>
 
18
  Currently existing approaches fine-tune on an extensive number of entity classes (around 13K) and assess zero-shot NER capabilities on Out-Of-Distribution input domains.
19
  SLIMER performs comparably to these state-of-the-art models on OOD input domains, while being trained on only a reduced number of samples and on a set of NE tags that overlaps to a lesser degree with the test set.
20
 
 
 
21
  We extend the standard zero-shot evaluations on BUSTER, which is characterized by financial entities that are rather far from the more traditional tags observed by all models during training.
22
+ An inverse trend can be observed, with SLIMER emerging as the most effective in dealing with these unseen labels, thanks to its lighter instruction tuning methodology and the use of definition and guidelines.
23
 
24
  <table>
25
  <thead>
 
29
  <th>#Params</th>
30
  <th colspan="2">MIT</th>
31
  <th colspan="5">CrossNER</th>
32
+ <th>BUSTER</th>
33
  <th>AVG</th>
34
  </tr>
35
  <tr>
 
44
  <th>Politics</th>
45
  <th>Science</th>
46
  <th></th>
47
+ <th></th>
48
  </tr>
49
  </thead>
50
  <tbody>
 
59
  <td>66.6</td>
60
  <td>68.5</td>
61
  <td>67.0</td>
62
+ <td>-</td>
63
+ <td>-</td>
64
  </tr>
65
  <tr>
66
  <td>InstructUIE</td>
 
73
  <td>53.2</td>
74
  <td>48.2</td>
75
  <td>49.3</td>
76
+ <td>-</td>
77
+ <td>-</td>
78
  </tr>
79
  <tr>
80
  <td>UniNER-type</td>
 
87
  <td>65.0</td>
88
  <td>60.8</td>
89
  <td>61.1</td>
90
+ <td>34.8</td>
91
+ <td>51.1</td>
92
  </tr>
93
  <tr>
94
  <td>UniNER-def</td>
 
101
  <td>55.8</td>
102
  <td>57.5</td>
103
  <td>52.9</td>
104
+ <td>33.6</td>
105
+ <td>43.6</td>
106
  </tr>
107
  <tr>
108
  <td>UniNER-type+sup.</td>
 
115
  <td>70.6</td>
116
  <td>66.9</td>
117
  <td>70.8</td>
118
+ <td>37.8</td>
119
+ <td>58.8</td>
120
  </tr>
121
  <tr>
122
  <td>GoLLIE</td>
 
129
  <td>67.8</td>
130
  <td>57.2</td>
131
  <td>55.5</td>
132
+ <td>27.7</td>
133
+ <td>54.6</td>
134
  </tr>
135
  <tr>
136
  <td>GLiNER-L</td>
 
143
  <td>69.6</td>
144
  <td>72.6</td>
145
  <td>62.6</td>
146
+ <td>26.6</td>
147
+ <td>56.6</td>
148
  </tr>
149
  <tr>
150
  <td>GNER-T5</td>
 
157
  <td>81.2</td>
158
  <td>75.1</td>
159
  <td>76.7</td>
160
+ <td>27.9</td>
161
+ <td>63.9</td>
162
  </tr>
163
  <tr>
164
  <td>GNER-LLaMA</td>
 
171
  <td>75.7</td>
172
  <td>69.4</td>
173
  <td>69.9</td>
174
+ <td>23.6</td>
175
+ <td>60.8</td>
176
  </tr>
177
  <tr>
178
  <td>SLIMER w/o D&amp;G</td>
179
  <td>LLaMA-2-chat</td>
180
  <td>7B</td>
181
+ <td>46.4</td>
182
+ <td>36.3</td>
183
+ <td>49.6</td>
184
+ <td>58.4</td>
185
+ <td>56.8</td>
186
+ <td>57.9</td>
187
+ <td>53.8</td>
188
+ <td>40.4</td>
189
+ <td>49.9</td>
190
  </tr>
191
  <tr>
192
  <td><b>SLIMER</b></td>
193
  <td><b>LLaMA-2-chat</b></td>
194
  <td><b>7B</b></td>
195
+ <td><b>50.9</b></td>
196
+ <td><b>38.2</b></td>
197
+ <td><b>50.1</b></td>
198
+ <td><b>58.7</b></td>
199
+ <td><b>60.0</b></td>
200
+ <td><b>63.9</b></td>
201
+ <td><b>56.3</b></td>
202
+ <td><b>45.3</b></td>
203
+ <td><b>52.9</b></td>
204
  </tr>
205
  </tbody>
206
  </table>
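
The AVG column in the table above appears to be the unweighted mean of the per-dataset scores (the two MIT columns, the five CrossNER columns, and the new BUSTER column). This is an inference from the numbers, not something the README states. A minimal Python sketch that checks it on the two SLIMER rows, with scores copied from the table; `mean_score` is a hypothetical helper name:

```python
from statistics import fmean

# Scores copied from the table rows, in column order:
# 2 MIT columns, 5 CrossNER columns, then BUSTER.
SLIMER = [50.9, 38.2, 50.1, 58.7, 60.0, 63.9, 56.3, 45.3]
SLIMER_WO_DG = [46.4, 36.3, 49.6, 58.4, 56.8, 57.9, 53.8, 40.4]


def mean_score(scores):
    """Unweighted mean over datasets (assumed definition of AVG)."""
    return fmean(scores)


if __name__ == "__main__":
    # With BUSTER included (new AVG column).
    print(f"SLIMER:         {mean_score(SLIMER):.3f}")
    print(f"SLIMER w/o D&G: {mean_score(SLIMER_WO_DG):.3f}")
    # Without BUSTER (AVG column before this commit).
    print(f"SLIMER, no BUSTER:         {mean_score(SLIMER[:-1]):.3f}")
    print(f"SLIMER w/o D&G, no BUSTER: {mean_score(SLIMER_WO_DG[:-1]):.3f}")
```

Under this assumption the computed means agree, to within rounding, with the reported AVG values for both rows: 52.9 and 49.9 with BUSTER, and 54.0 and 51.3 without it. The other rows cannot be checked the same way from this diff, because most of their per-dataset cells are elided.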