disi-unibo-nlp/MedGENIE-fid-flan-t5-base-medqa

@@ -53,50 +53,50 @@ MedGENIE comprises a collection of language models designed to utilize generated
 At the time of release, MedGENIE-fid-flan-t5-base-medqa is a new lightweight SOTA model on the MedQA-USMLE benchmark:
-| Model                            | Ground (Source)    | Learning                  | Params          | Accuracy                      |
 |----------------------------------|--------------------|---------------------------|-----------------|-------------------------------|
 | **MedGENIE-FID-Flan-T5**         | G (PMC-LLaMA)      | Fine-tuned                | 250M            | **53.1**                      |
-| Codex\tnote{1}                   | &empty;            | 0-zhot                    | 175B            | 52.5                          |
-| Codex\tnote{1}                   | R (Wikipedia)      | 0-shot                    | 175B            | 52.5                          |
-| GPT-3.5-Turbo\tnote{6}           | R (Wikipedia)      | k-shot                    | --              | 52.3                          |
-| MEDITRON\tnote{2}                | &empty;            | Fine-tuned                | 7B              | 52.0                          |
-| Zephyr-$\beta$                   | R (MedWiki)        | 2-shot                    | 7B              | 50.4                          |
-| BioMedGPT\tnote{3}               | &empty;            | k-shot                    | 10B             | 50.4                          |
-| BioMedLM\tnote{4}                | &empty;            | Fine-tuned                | 2.7B            | 50.3                          |
-| PMC-LLaMA\tnote{*}               | &empty;            | Fine-tuned                | 13B             | 50.2                          |
-| LLaMA-2\tnote{2}                 | &empty;            | Fine-tuned                | 7B              | 49.6                          |
-| Zephyr-$\beta$                   | &empty;            | 2-shot                    | 7B              | 49.6                          |
-| Zephyr-$\beta$\tnote{2}          | &empty;            | 3-shot                    | 7B              | 49.2                          |
-| PMC-LLaMA\tnote{2}               | &empty;            | Fine-tuned                | 7B              | 49.2                          |
-| DRAGON\tnote{7}                  | R (UMLS)           | Fine-tuned                | 360M            | 47.5                          |
-| InstructGPT\tnote{1}             | R (Wikipedia)      | 0-shot                    | 175B            | 47.3                          |
-| Flan-PaLM\tnote{4}               | &empty;            | 5-shot                    | 62B             | 46.1                          |
-| InstructGPT\tnote{1}             | &empty;            | 0-shot                    | 175B            | 46.0                          |
-| VOD\tnote{8}                     | R (MedWiki)        | Fine-tuned                | 220M            | 45.8                          |
-| Vicuna 1.3\tnote{1}              | &empty;            | 0-shot                    | 33B             | 45.2                          |
-| BioLinkBERT\tnote{4}             | &empty;            | Fine-tuned                | 340M            | 45.1                          |
 | Mistral-Instruct                 | R (MedWiki)        | 2-shot                    | 7B              | 45.1                          |
 | Galactica                        | &empty;            | 0-shot                    | 120B            | 44.4                          |
-| LLaMA-2\tnote{1}                 | &empty;            | 0-shot                    | 70B             | 43.4                          |
-| BioReader\tnote{9}               | R (PubMed-RCT)     | Fine-tuned                | 230M            | 43.0                          |
-| Guanaco\tnote{1}                 | &empty;            | 0-shot                    | 33B             | 42.9                          |
-| LLaMA-2-chat\tnote{1}            | &empty;            | 0-shot                    | 70B             | 42.3                          |
-| Vicuna 1.5\tnote{1}              | &empty;            | 0-shot                    | 65B             | 41.6                          |
-| Mistral-Instruct\tnote{2}        | &empty;            | 3-shot                    | 7B              | 41.1                          |
-| PaLM\tnote{4}                    | &empty;            | 5-shot                    | 62B             | 40.9                          |
-| Guanaco\tnote{1}                 | &empty;            | 0-shot                    | 65B             | 40.8                          |
-| Falcon-Instruct\tnote{1}         | &empty;            | 0-shot                    | 40B             | 39.0                          |
-| Vicuna 1.3\tnote{1}              | &empty;            | 0-shot                    | 13B             | 38.7                          |
-| GreaseLM\tnote{10}               | R (UMLS)           | Fine-tuned                | 359M            | 38.5                          |
-| PubMedBERT\tnote{4}              | &empty;            | Fine-tuned                | 110M            | 38.1                          |
-| QA-GNN\tnote{11}                 | R (UMLS)           | Fine-tuned                | 360M            | 38.0                          |
-| LLaMA-2\tnote{6}                 | R (Wikipedia)      | k-shot                    | 13B             | 37.6                          |
 | LLaMA-2-chat                     | R (MedWiki)        | 2-shot                    | 7B              | 37.2                          |
 | LLaMA-2-chat                     | &empty;            | 2-shot                    | 7B              | 37.2                          |
-| BioBERT\tnote{5}                 | &empty;            | Fine-tuned                | 110M            | 36.7                          |
-| MTP-Instruct\tnote{1}            | &empty;            | 0-shot                    | 30B             | 35.1                          |
-| GPT-Neo\tnote{4}                 | &empty;            | Fine-tuned                | 2.5B            | 33.3                          |
-| LLaMa-2-chat\tnote{1}            | &empty;            | 0-shot                    | 13B             | 32.2                          |
-| LLaMa-2\tnote{1}                 | &empty;            | 0-shot                    | 13B             | 31.1                          |
-| GPT-NeoX\tnote{1}                | &empty;            | 0-shot                    | 20B             | 26.9                          |

 At the time of release, MedGENIE-fid-flan-t5-base-medqa is a new lightweight SOTA model on the MedQA-USMLE benchmark:
+| Model                            | Ground (Source)    | Learning                  | Params          | Accuracy (&darr;)             |
 |----------------------------------|--------------------|---------------------------|-----------------|-------------------------------|
 | **MedGENIE-FID-Flan-T5**         | G (PMC-LLaMA)      | Fine-tuned                | 250M            | **53.1**                      |
+| Codex <small>(Liévin et al. 2022)</small>                   | &empty;            | 0-shot                    | 175B            | 52.5                          |
+| Codex <small>(Liévin et al. 2022)</small>                  | R (Wikipedia)      | 0-shot                    | 175B            | 52.5                          |
+| GPT-3.5-Turbo <small>(Yang et al.)</small>           | R (Wikipedia)      | k-shot                    | --              | 52.3                          |
+| MEDITRON <small>(Chen et al.)</small>                | &empty;            | Fine-tuned                | 7B              | 52.0                          |
+| Zephyr-&beta;                   | R (MedWiki)        | 2-shot                    | 7B              | 50.4                          |
+| BioMedGPT <small>(Luo et al.)</small>              | &empty;            | k-shot                    | 10B             | 50.4                          |
+| BioMedLM <small>(Singhal et al.)</small>               | &empty;            | Fine-tuned                | 2.7B            | 50.3                          |
+| PMC-LLaMA (AWQ)              | &empty;            | Fine-tuned                | 13B             | 50.2                          |
+| LLaMA-2 <small>(Chen et al.)</small>              | &empty;            | Fine-tuned                | 7B              | 49.6                          |
+| Zephyr-&beta;                  | &empty;            | 2-shot                    | 7B              | 49.6                          |
+| Zephyr-&beta; <small>(Chen et al.)</small>          | &empty;            | 3-shot                    | 7B              | 49.2                          |
+| PMC-LLaMA <small>(Chen et al.)</small>              | &empty;            | Fine-tuned                | 7B              | 49.2                          |
+| DRAGON <small>(Yasunaga et al.)</small>                  | R (UMLS)           | Fine-tuned                | 360M            | 47.5                          |
+| InstructGPT <small>(Liévin et al.)</small>             | R (Wikipedia)      | 0-shot                    | 175B            | 47.3                          |
+| Flan-PaLM <small>(Singhal et al.)</small>             | &empty;            | 5-shot                    | 62B             | 46.1                          |
+| InstructGPT <small>(Liévin et al.)</small>             | &empty;            | 0-shot                    | 175B            | 46.0                          |
+| VOD <small>(Liévin et al. 2023)</small>                    | R (MedWiki)        | Fine-tuned                | 220M            | 45.8                          |
+| Vicuna 1.3 <small>(Liévin et al.)</small>              | &empty;            | 0-shot                    | 33B             | 45.2                          |
+| BioLinkBERT <small>(Singhal et al.)</small>             | &empty;            | Fine-tuned                | 340M            | 45.1                          |
 | Mistral-Instruct                 | R (MedWiki)        | 2-shot                    | 7B              | 45.1                          |
 | Galactica                        | &empty;            | 0-shot                    | 120B            | 44.4                          |
+| LLaMA-2 <small>(Liévin et al.)</small>                 | &empty;            | 0-shot                    | 70B             | 43.4                          |
+| BioReader <small>(Frison et al.)</small>               | R (PubMed-RCT)     | Fine-tuned                | 230M            | 43.0                          |
+| Guanaco <small>(Liévin et al.)</small>             | &empty;            | 0-shot                    | 33B             | 42.9                          |
+| LLaMA-2-chat <small>(Liévin et al.)</small>          | &empty;            | 0-shot                    | 70B             | 42.3                          |
+| Vicuna 1.5 <small>(Liévin et al.)</small>              | &empty;            | 0-shot                    | 65B             | 41.6                          |
+| Mistral-Instruct <small>(Chen et al.)</small>        | &empty;            | 3-shot                    | 7B              | 41.1                          |
+| PaLM <small>(Singhal et al.)</small>                 | &empty;            | 5-shot                    | 62B             | 40.9                          |
+| Guanaco <small>(Liévin et al.)</small>             | &empty;            | 0-shot                    | 65B             | 40.8                          |
+| Falcon-Instruct <small>(Liévin et al.)</small>         | &empty;            | 0-shot                    | 40B             | 39.0                          |
+| Vicuna 1.3 <small>(Liévin et al.)</small>              | &empty;            | 0-shot                    | 13B             | 38.7                          |
+| GreaseLM <small>(Zhang et al.)</small>              | R (UMLS)           | Fine-tuned                | 359M            | 38.5                          |
+| PubMedBERT <small>(Singhal et al.)</small>              | &empty;            | Fine-tuned                | 110M            | 38.1                          |
+| QA-GNN <small>(Yasunaga et al.)</small>               | R (UMLS)           | Fine-tuned                | 360M            | 38.0                          |
+| LLaMA-2 <small>(Yang et al.)</small>               | R (Wikipedia)      | k-shot                    | 13B             | 37.6                          |
 | LLaMA-2-chat                     | R (MedWiki)        | 2-shot                    | 7B              | 37.2                          |
 | LLaMA-2-chat                     | &empty;            | 2-shot                    | 7B              | 37.2                          |
+| BioBERT <small>(Lee et al.)</small>                 | &empty;            | Fine-tuned                | 110M            | 36.7                          |
+| MPT-Instruct <small>(Liévin et al.)</small>          | &empty;            | 0-shot                    | 30B             | 35.1                          |
+| GPT-Neo <small>(Singhal et al.)</small>                 | &empty;            | Fine-tuned                | 2.5B            | 33.3                          |
+| LLaMA-2-chat <small>(Liévin et al.)</small>         | &empty;            | 0-shot                    | 13B             | 32.2                          |
+| LLaMA-2 <small>(Liévin et al.)</small>                 | &empty;            | 0-shot                    | 13B             | 31.1                          |
+| GPT-NeoX <small>(Liévin et al.)</small>                | &empty;            | 0-shot                    | 20B             | 26.9                          |