---
license: llama2
datasets:
- Universal-NER/Pile-NER-type
language:
- en
pipeline_tag: text-generation
---
# SLIMER: Show Less Instruct More Entity Recognition
SLIMER is an instruction-tuned LLM for zero-shot NER.
Instruction-tuned on a reduced number of samples, it is designed to tackle never-before-seen Named Entity tags by leveraging a prompt enriched with a DEFINITION and GUIDELINES for the Named Entity to be extracted.
<img src="https://huggingface.co/expertai/SLIMER/resolve/main/SLIMER_instruction_prompt.png" width="200">
Existing approaches fine-tune on an extensive number of entity classes (around 13K) and assess zero-shot NER capabilities on Out-Of-Distribution input domains.
SLIMER performs comparably to these state-of-the-art models on OOD input domains, while being trained on only a reduced number of samples and on a set of NE tags that overlaps less with the test sets.
We extend the standard zero-shot evaluation to BUSTER, which is characterized by financial entities rather far from the more traditional tags all models observed during training.
Here an inverse trend emerges, with SLIMER the most effective at dealing with these unseen labels, thanks to its lighter instruction-tuning methodology and its use of definition and guidelines. The table below reports zero-shot F1 scores.
<table>
<thead>
<tr>
<th>Model</th>
<th>Backbone</th>
<th>#Params</th>
<th colspan="2">MIT</th>
<th colspan="5">CrossNER</th>
<th>BUSTER</th>
<th>AVG</th>
</tr>
<tr>
<th></th>
<th></th>
<th></th>
<th>Movie</th>
<th>Restaurant</th>
<th>AI</th>
<th>Literature</th>
<th>Music</th>
<th>Politics</th>
<th>Science</th>
<th></th>
<th></th>
</tr>
</thead>
<tbody>
<tr>
<td>ChatGPT</td>
<td>gpt-3.5-turbo</td>
<td>-</td>
<td>5.3</td>
<td>32.8</td>
<td>52.4</td>
<td>39.8</td>
<td>66.6</td>
<td>68.5</td>
<td>67.0</td>
<td>-</td>
<td>-</td>
</tr>
<tr>
<td>InstructUIE</td>
<td>Flan-T5-xxl</td>
<td>11B</td>
<td>63.0</td>
<td>21.0</td>
<td>49.0</td>
<td>47.2</td>
<td>53.2</td>
<td>48.2</td>
<td>49.3</td>
<td>-</td>
<td>-</td>
</tr>
<tr>
<td>UniNER-type</td>
<td>LLaMA-1</td>
<td>7B</td>
<td>42.4</td>
<td>31.7</td>
<td>53.5</td>
<td>59.4</td>
<td>65.0</td>
<td>60.8</td>
<td>61.1</td>
<td>34.8</td>
<td>51.1</td>
</tr>
<tr>
<td>UniNER-def</td>
<td>LLaMA-1</td>
<td>7B</td>
<td>27.1</td>
<td>27.9</td>
<td>44.5</td>
<td>49.2</td>
<td>55.8</td>
<td>57.5</td>
<td>52.9</td>
<td>33.6</td>
<td>43.6</td>
</tr>
<tr>
<td>UniNER-type+sup.</td>
<td>LLaMA-1</td>
<td>7B</td>
<td>61.2</td>
<td>35.2</td>
<td>62.9</td>
<td>64.9</td>
<td>70.6</td>
<td>66.9</td>
<td>70.8</td>
<td>37.8</td>
<td>58.8</td>
</tr>
<tr>
<td>GoLLIE</td>
<td>Code-LLaMA</td>
<td>7B</td>
<td>63.0</td>
<td>43.4</td>
<td>59.1</td>
<td>62.7</td>
<td>67.8</td>
<td>57.2</td>
<td>55.5</td>
<td>27.7</td>
<td>54.6</td>
</tr>
<tr>
<td>GLiNER-L</td>
<td>DeBERTa-v3</td>
<td>0.3B</td>
<td>57.2</td>
<td>42.9</td>
<td>57.2</td>
<td>64.4</td>
<td>69.6</td>
<td>72.6</td>
<td>62.6</td>
<td>26.6</td>
<td>56.6</td>
</tr>
<tr>
<td>GNER-T5</td>
<td>Flan-T5-xxl</td>
<td>11B</td>
<td>62.5</td>
<td>51.0</td>
<td>68.2</td>
<td>68.7</td>
<td>81.2</td>
<td>75.1</td>
<td>76.7</td>
<td>27.9</td>
<td>63.9</td>
</tr>
<tr>
<td>GNER-LLaMA</td>
<td>LLaMA-1</td>
<td>7B</td>
<td>68.6</td>
<td>47.5</td>
<td>63.1</td>
<td>68.2</td>
<td>75.7</td>
<td>69.4</td>
<td>69.9</td>
<td>23.6</td>
<td>60.8</td>
</tr>
<tr>
<td>SLIMER w/o D&G</td>
<td>LLaMA-2-chat</td>
<td>7B</td>
<td>46.4</td>
<td>36.3</td>
<td>49.6</td>
<td>58.4</td>
<td>56.8</td>
<td>57.9</td>
<td>53.8</td>
<td>40.4</td>
<td>49.9</td>
</tr>
<tr>
<td><b>SLIMER</b></td>
<td><b>LLaMA-2-chat</b></td>
<td><b>7B</b></td>
<td><b>50.9</b></td>
<td><b>38.2</b></td>
<td><b>50.1</b></td>
<td><b>58.7</b></td>
<td><b>60.0</b></td>
<td><b>63.9</b></td>
<td><b>56.3</b></td>
<td><b>45.3</b></td>
<td><b>52.9</b></td>
</tr>
</tbody>
</table>
```python
from vllm import LLM, SamplingParams

# Load SLIMER with vLLM and decode greedily.
vllm_model = LLM(model="expertai/SLIMER")
sampling_params = SamplingParams(temperature=0, max_tokens=128, stop=['</s>'])

# `prompter` wraps each (instruction, input) pair into SLIMER's instruction
# template (see the prompt image above); both `prompter` and
# `instruction_input_pairs` come from the SLIMER repository.
prompts = [prompter.generate_prompt(instruction, input) for instruction, input in instruction_input_pairs]
responses = vllm_model.generate(prompts, sampling_params)
```
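SLIMER is instructed to return its answer in a JSON-parseable format; assuming each response carries a list of the extracted entity strings, the generations can be post-processed along these lines (a sketch, not part of the official repository):

```python
import json

for response in responses:
    text = response.outputs[0].text  # vLLM returns one RequestOutput per prompt
    try:
        entities = json.loads(text)  # expected: a list of extracted surface forms
    except json.JSONDecodeError:
        entities = []  # fall back gracefully on a malformed generation
    print(entities)
```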