# Model Card for MedGENIE-fid-flan-t5-base-medqa

MedGENIE comprises a collection of language models designed to use generated contexts, rather than retrieved ones, to answer multiple-choice open-domain questions in the medical domain. Specifically, MedGENIE-fid-flan-t5-base-medqa is a fusion-in-decoder model based on the flan-t5-base architecture and trained on the [MedQA-USMLE](https://huggingface.co/datasets/disi-unibo-nlp/medqa-5-opt-MedGENIE) dataset augmented with artificially generated contexts from PMC-LLaMA-13B. It achieves new state-of-the-art performance on the corresponding test set.
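
The fusion-in-decoder (FiD) setup pairs the question with each generated context, encodes every pair independently, concatenates the encoder outputs, and lets a single decoder pass attend over all of them to produce the answer. The snippet below is a minimal sketch of that mechanism using the plain flan-t5-base backbone and the standard Hugging Face `transformers` API; the prompt format, passage texts, and generation settings are illustrative assumptions, and the project's own FiD code (see the repository linked below) remains the reference implementation.

```python
# Minimal fusion-in-decoder sketch on top of flan-t5-base (illustrative only;
# the fine-tuned MedGENIE checkpoint and its exact input format are not shown here).
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
from transformers.modeling_outputs import BaseModelOutput

backbone = "google/flan-t5-base"
tokenizer = AutoTokenizer.from_pretrained(backbone)
model = AutoModelForSeq2SeqLM.from_pretrained(backbone)

question = "Which vitamin deficiency causes scurvy? (A) Vitamin A (B) Vitamin B12 (C) Vitamin C (D) Vitamin D"
contexts = [
    "Scurvy results from a prolonged lack of ascorbic acid ...",   # e.g. generated by PMC-LLaMA-13B
    "Collagen synthesis requires vitamin C as a cofactor ...",
]

# 1) Encode each (question, context) pair independently.
encoder = model.get_encoder()
encoded = [
    tokenizer(f"question: {question} context: {c}", return_tensors="pt", truncation=True, max_length=512)
    for c in contexts
]
with torch.no_grad():
    states = [encoder(**batch).last_hidden_state for batch in encoded]
masks = [batch["attention_mask"] for batch in encoded]

# 2) Fuse: concatenate the encoder outputs along the sequence dimension so the
#    decoder can jointly attend over all passages.
fused = BaseModelOutput(last_hidden_state=torch.cat(states, dim=1))
fused_mask = torch.cat(masks, dim=1)

# 3) Decode once over the fused representation.
output_ids = model.generate(encoder_outputs=fused, attention_mask=fused_mask, max_new_tokens=8)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```
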

## Model description

- **Language(s) (NLP):** English
- **License:** MIT
- **Finetuned from model:** [google/flan-t5-base](https://huggingface.co/google/flan-t5-base)
- **Repository:** https://github.com/disi-unibo-nlp/medgenie

## Performance

At the time of release, MedGENIE-fid-flan-t5-base-medqa is a new lightweight state-of-the-art model on the MedQA-USMLE benchmark:

| Model                    | Ground (Source) | Learning   | Params | Accuracy (%) |
|--------------------------|-----------------|------------|--------|--------------|
| **MedGENIE-FID-Flan-T5** | G (PMC-LLaMA)   | Fine-tuned | 250M   | **53.1**     |
| Codex                    | ∅               | 0-shot     | 175B   | 52.5         |
| Codex                    | R (Wikipedia)   | 0-shot     | 175B   | 52.5         |
| GPT-3.5-Turbo            | R (Wikipedia)   | k-shot     | --     | 52.3         |
| MEDITRON                 | ∅               | Fine-tuned | 7B     | 52.0         |
| Zephyr-β                 | R (MedWiki)     | 2-shot     | 7B     | 50.4         |
| BioMedGPT                | ∅               | k-shot     | 10B    | 50.4         |
| BioMedLM                 | ∅               | Fine-tuned | 2.7B   | 50.3         |
| PMC-LLaMA                | ∅               | Fine-tuned | 13B    | 50.2         |
| LLaMA-2                  | ∅               | Fine-tuned | 7B     | 49.6         |
| Zephyr-β                 | ∅               | 2-shot     | 7B     | 49.6         |
| Zephyr-β                 | ∅               | 3-shot     | 7B     | 49.2         |
| PMC-LLaMA                | ∅               | Fine-tuned | 7B     | 49.2         |
| DRAGON                   | R (UMLS)        | Fine-tuned | 360M   | 47.5         |
| InstructGPT              | R (Wikipedia)   | 0-shot     | 175B   | 47.3         |
| Flan-PaLM                | ∅               | 5-shot     | 62B    | 46.1         |
| InstructGPT              | ∅               | 0-shot     | 175B   | 46.0         |
| VOD                      | R (MedWiki)     | Fine-tuned | 220M   | 45.8         |
| Vicuna 1.3               | ∅               | 0-shot     | 33B    | 45.2         |
| BioLinkBERT              | ∅               | Fine-tuned | 340M   | 45.1         |
| Mistral-Instruct         | R (MedWiki)     | 2-shot     | 7B     | 45.1         |
| Galactica                | ∅               | 0-shot     | 120B   | 44.4         |
| LLaMA-2                  | ∅               | 0-shot     | 70B    | 43.4         |
| BioReader                | R (PubMed-RCT)  | Fine-tuned | 230M   | 43.0         |
| Guanaco                  | ∅               | 0-shot     | 33B    | 42.9         |
| LLaMA-2-chat             | ∅               | 0-shot     | 70B    | 42.3         |
| Vicuna 1.5               | ∅               | 0-shot     | 65B    | 41.6         |
| Mistral-Instruct         | ∅               | 3-shot     | 7B     | 41.1         |
| PaLM                     | ∅               | 5-shot     | 62B    | 40.9         |
| Guanaco                  | ∅               | 0-shot     | 65B    | 40.8         |
| Falcon-Instruct          | ∅               | 0-shot     | 40B    | 39.0         |
| Vicuna 1.3               | ∅               | 0-shot     | 13B    | 38.7         |
| GreaseLM                 | R (UMLS)        | Fine-tuned | 359M   | 38.5         |
| PubMedBERT               | ∅               | Fine-tuned | 110M   | 38.1         |
| QA-GNN                   | R (UMLS)        | Fine-tuned | 360M   | 38.0         |
| LLaMA-2                  | R (Wikipedia)   | k-shot     | 13B    | 37.6         |
| LLaMA-2-chat             | R (MedWiki)     | 2-shot     | 7B     | 37.2         |
| LLaMA-2-chat             | ∅               | 2-shot     | 7B     | 37.2         |
| BioBERT                  | ∅               | Fine-tuned | 110M   | 36.7         |
| MPT-Instruct             | ∅               | 0-shot     | 30B    | 35.1         |
| GPT-Neo                  | ∅               | Fine-tuned | 2.5B   | 33.3         |
| LLaMA-2-chat             | ∅               | 0-shot     | 13B    | 32.2         |
| LLaMA-2                  | ∅               | 0-shot     | 13B    | 31.1         |
| GPT-NeoX                 | ∅               | 0-shot     | 20B    | 26.9         |

*Ground (Source)*: G = generated context (generator model in parentheses), R = retrieved context (knowledge source in parentheses), ∅ = no grounding context.
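
Accuracy is the percentage of MedQA-USMLE test questions whose gold option is predicted correctly. As a quick reference, a minimal sketch of that metric (with hypothetical prediction and label lists, not real model outputs) could look like this:

```python
# Minimal sketch of the multiple-choice accuracy metric reported above:
# exact match between predicted and gold option letters, as a percentage.
def multiple_choice_accuracy(predictions, references):
    assert len(predictions) == len(references) and references
    correct = sum(p.strip().upper() == r.strip().upper()
                  for p, r in zip(predictions, references))
    return 100.0 * correct / len(references)

preds = ["A", "C", "E", "B"]  # hypothetical model predictions
golds = ["A", "B", "E", "B"]  # hypothetical gold answers
print(f"{multiple_choice_accuracy(preds, golds):.1f}")  # 75.0
```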