@@ -40,4 +40,63 @@ widget:
 ---
 # Model Card for MedGENIE-fid-flan-t5-base-medqa
-MedGENIE comprises a collection of language models designed to utilize generated contexts, rather than retrieved ones, for addressing multiple-choice open-domain questions in the medical domain. Specifically, MedGENIE-fid-flan-t5-base-medqa is a fusion-in-decoder model based on flan-t5-base architecture, trained on the [MedQA-USMLE](https://huggingface.co/datasets/disi-unibo-nlp/medqa-5-opt-MedGENIE) dataset augmented with artificially generated contexts from PMC-LLaMA-13B. This model achieves a new state-of-the-art performance over the corresponding test set.

 ---
 # Model Card for MedGENIE-fid-flan-t5-base-medqa
+MedGENIE comprises a collection of language models designed to use generated contexts, rather than retrieved ones, to answer multiple-choice open-domain questions in the medical domain. Specifically, MedGENIE-fid-flan-t5-base-medqa is a fusion-in-decoder (FiD) model based on the flan-t5-base architecture, trained on the [MedQA-USMLE](https://huggingface.co/datasets/disi-unibo-nlp/medqa-5-opt-MedGENIE) dataset augmented with artificially generated contexts from PMC-LLaMA-13B. This model achieves new state-of-the-art performance on the corresponding test set.
+## Model description
+- **Language(s) (NLP):** English
+- **License:** MIT
+- **Finetuned from model:** [google/flan-t5-base](https://huggingface.co/google/flan-t5-base)
+- **Repository:** https://github.com/disi-unibo-nlp/medgenie
+## Performance
+At the time of release, MedGENIE-fid-flan-t5-base-medqa is a new lightweight SOTA model on the MedQA-USMLE benchmark (Ground: G = generated context, R = retrieved context, &empty; = no context):
+| Model                            | Ground (Source)    | Learning                  | Params          | Accuracy (%)                  |
+|----------------------------------|--------------------|---------------------------|-----------------|-------------------------------|
+| **MedGENIE-FID-Flan-T5**         | G (PMC-LLaMA)      | Fine-tuned                | 250M            | **53.1**                      |
+| Codex<sup>1</sup>                | &empty;            | 0-shot                    | 175B            | 52.5                          |
+| Codex<sup>1</sup>                | R (Wikipedia)      | 0-shot                    | 175B            | 52.5                          |
+| GPT-3.5-Turbo<sup>6</sup>        | R (Wikipedia)      | k-shot                    | --              | 52.3                          |
+| MEDITRON<sup>2</sup>             | &empty;            | Fine-tuned                | 7B              | 52.0                          |
+| Zephyr-β                         | R (MedWiki)        | 2-shot                    | 7B              | 50.4                          |
+| BioMedGPT<sup>3</sup>            | &empty;            | k-shot                    | 10B             | 50.4                          |
+| BioMedLM<sup>4</sup>             | &empty;            | Fine-tuned                | 2.7B            | 50.3                          |
+| PMC-LLaMA<sup>*</sup>            | &empty;            | Fine-tuned                | 13B             | 50.2                          |
+| LLaMA-2<sup>2</sup>              | &empty;            | Fine-tuned                | 7B              | 49.6                          |
+| Zephyr-β                         | &empty;            | 2-shot                    | 7B              | 49.6                          |
+| Zephyr-β<sup>2</sup>             | &empty;            | 3-shot                    | 7B              | 49.2                          |
+| PMC-LLaMA<sup>2</sup>            | &empty;            | Fine-tuned                | 7B              | 49.2                          |
+| DRAGON<sup>7</sup>               | R (UMLS)           | Fine-tuned                | 360M            | 47.5                          |
+| InstructGPT<sup>1</sup>          | R (Wikipedia)      | 0-shot                    | 175B            | 47.3                          |
+| Flan-PaLM<sup>4</sup>            | &empty;            | 5-shot                    | 62B             | 46.1                          |
+| InstructGPT<sup>1</sup>          | &empty;            | 0-shot                    | 175B            | 46.0                          |
+| VOD<sup>8</sup>                  | R (MedWiki)        | Fine-tuned                | 220M            | 45.8                          |
+| Vicuna 1.3<sup>1</sup>           | &empty;            | 0-shot                    | 33B             | 45.2                          |
+| BioLinkBERT<sup>4</sup>          | &empty;            | Fine-tuned                | 340M            | 45.1                          |
+| Mistral-Instruct                 | R (MedWiki)        | 2-shot                    | 7B              | 45.1                          |
+| Galactica                        | &empty;            | 0-shot                    | 120B            | 44.4                          |
+| LLaMA-2<sup>1</sup>              | &empty;            | 0-shot                    | 70B             | 43.4                          |
+| BioReader<sup>9</sup>            | R (PubMed-RCT)     | Fine-tuned                | 230M            | 43.0                          |
+| Guanaco<sup>1</sup>              | &empty;            | 0-shot                    | 33B             | 42.9                          |
+| LLaMA-2-chat<sup>1</sup>         | &empty;            | 0-shot                    | 70B             | 42.3                          |
+| Vicuna 1.5<sup>1</sup>           | &empty;            | 0-shot                    | 65B             | 41.6                          |
+| Mistral-Instruct<sup>2</sup>     | &empty;            | 3-shot                    | 7B              | 41.1                          |
+| PaLM<sup>4</sup>                 | &empty;            | 5-shot                    | 62B             | 40.9                          |
+| Guanaco<sup>1</sup>              | &empty;            | 0-shot                    | 65B             | 40.8                          |
+| Falcon-Instruct<sup>1</sup>      | &empty;            | 0-shot                    | 40B             | 39.0                          |
+| Vicuna 1.3<sup>1</sup>           | &empty;            | 0-shot                    | 13B             | 38.7                          |
+| GreaseLM<sup>10</sup>            | R (UMLS)           | Fine-tuned                | 359M            | 38.5                          |
+| PubMedBERT<sup>4</sup>           | &empty;            | Fine-tuned                | 110M            | 38.1                          |
+| QA-GNN<sup>11</sup>              | R (UMLS)           | Fine-tuned                | 360M            | 38.0                          |
+| LLaMA-2<sup>6</sup>              | R (Wikipedia)      | k-shot                    | 13B             | 37.6                          |
+| LLaMA-2-chat                     | R (MedWiki)        | 2-shot                    | 7B              | 37.2                          |
+| LLaMA-2-chat                     | &empty;            | 2-shot                    | 7B              | 37.2                          |
+| BioBERT<sup>5</sup>              | &empty;            | Fine-tuned                | 110M            | 36.7                          |
+| MPT-Instruct<sup>1</sup>         | &empty;            | 0-shot                    | 30B             | 35.1                          |
+| GPT-Neo<sup>4</sup>              | &empty;            | Fine-tuned                | 2.5B            | 33.3                          |
+| LLaMA-2-chat<sup>1</sup>         | &empty;            | 0-shot                    | 13B             | 32.2                          |
+| LLaMA-2<sup>1</sup>              | &empty;            | 0-shot                    | 13B             | 31.1                          |
+| GPT-NeoX<sup>1</sup>             | &empty;            | 0-shot                    | 20B             | 26.9                          |