Update README.md
README.md
@@ -53,50 +53,50 @@ MedGENIE comprises a collection of language models designed to utilize generated
At the time of release, MedGENIE-fid-flan-t5-base-medqa is a new lightweight SOTA model on the MedQA-USMLE benchmark:

| Model | Ground (Source) | Learning | Params | Accuracy (↓) |
|----------------------------------|--------------------|---------------------------|-----------------|-------------------------------|
| **MedGENIE-FID-Flan-T5** | G (PMC-LLaMA) | Fine-tuned | 250M | **53.1** |
| Codex <small>(Liévin et al. 2022)</small> | ∅ | 0-shot | 175B | 52.5 |
| Codex <small>(Liévin et al. 2022)</small> | R (Wikipedia) | 0-shot | 175B | 52.5 |
| GPT-3.5-Turbo <small>(Yang et al.)</small> | R (Wikipedia) | k-shot | -- | 52.3 |
| MEDITRON <small>(Chen et al.)</small> | ∅ | Fine-tuned | 7B | 52.0 |
| Zephyr-β | R (MedWiki) | 2-shot | 7B | 50.4 |
| BioMedGPT <small>(Luo et al.)</small> | ∅ | k-shot | 10B | 50.4 |
| BioMedLM <small>(Singhal et al.)</small> | ∅ | Fine-tuned | 2.7B | 50.3 |
| PMC-LLaMA (AWQ) | ∅ | Fine-tuned | 13B | 50.2 |
| LLaMA-2 <small>(Chen et al.)</small> | ∅ | Fine-tuned | 7B | 49.6 |
| Zephyr-β | ∅ | 2-shot | 7B | 49.6 |
| Zephyr-β <small>(Chen et al.)</small> | ∅ | 3-shot | 7B | 49.2 |
| PMC-LLaMA <small>(Chen et al.)</small> | ∅ | Fine-tuned | 7B | 49.2 |
| DRAGON <small>(Yasunaga et al.)</small> | R (UMLS) | Fine-tuned | 360M | 47.5 |
| InstructGPT <small>(Liévin et al.)</small> | R (Wikipedia) | 0-shot | 175B | 47.3 |
| Flan-PaLM <small>(Singhal et al.)</small> | ∅ | 5-shot | 62B | 46.1 |
| InstructGPT <small>(Liévin et al.)</small> | ∅ | 0-shot | 175B | 46.0 |
| VOD <small>(Liévin et al. 2023)</small> | R (MedWiki) | Fine-tuned | 220M | 45.8 |
| Vicuna 1.3 <small>(Liévin et al.)</small> | ∅ | 0-shot | 33B | 45.2 |
| BioLinkBERT <small>(Singhal et al.)</small> | ∅ | Fine-tuned | 340M | 45.1 |
| Mistral-Instruct | R (MedWiki) | 2-shot | 7B | 45.1 |
| Galactica | ∅ | 0-shot | 120B | 44.4 |
| LLaMA-2 <small>(Liévin et al.)</small> | ∅ | 0-shot | 70B | 43.4 |
| BioReader <small>(Frison et al.)</small> | R (PubMed-RCT) | Fine-tuned | 230M | 43.0 |
| Guanaco <small>(Liévin et al.)</small> | ∅ | 0-shot | 33B | 42.9 |
| LLaMA-2-chat <small>(Liévin et al.)</small> | ∅ | 0-shot | 70B | 42.3 |
| Vicuna 1.5 <small>(Liévin et al.)</small> | ∅ | 0-shot | 65B | 41.6 |
| Mistral-Instruct <small>(Chen et al.)</small> | ∅ | 3-shot | 7B | 41.1 |
| PaLM <small>(Singhal et al.)</small> | ∅ | 5-shot | 62B | 40.9 |
| Guanaco <small>(Liévin et al.)</small> | ∅ | 0-shot | 65B | 40.8 |
| Falcon-Instruct <small>(Liévin et al.)</small> | ∅ | 0-shot | 40B | 39.0 |
| Vicuna 1.3 <small>(Liévin et al.)</small> | ∅ | 0-shot | 13B | 38.7 |
| GreaseLM <small>(Zhang et al.)</small> | R (UMLS) | Fine-tuned | 359M | 38.5 |
| PubMedBERT <small>(Singhal et al.)</small> | ∅ | Fine-tuned | 110M | 38.1 |
| QA-GNN <small>(Yasunaga et al.)</small> | R (UMLS) | Fine-tuned | 360M | 38.0 |
| LLaMA-2 <small>(Yang et al.)</small> | R (Wikipedia) | k-shot | 13B | 37.6 |
| LLaMA-2-chat | R (MedWiki) | 2-shot | 7B | 37.2 |
| LLaMA-2-chat | ∅ | 2-shot | 7B | 37.2 |
| BioBERT <small>(Lee et al.)</small> | ∅ | Fine-tuned | 110M | 36.7 |
| MTP-Instruct <small>(Liévin et al.)</small> | ∅ | 0-shot | 30B | 35.1 |
| GPT-Neo <small>(Singhal et al.)</small> | ∅ | Fine-tuned | 2.5B | 33.3 |
| LLaMA-2-chat <small>(Liévin et al.)</small> | ∅ | 0-shot | 13B | 32.2 |
| LLaMA-2 <small>(Liévin et al.)</small> | ∅ | 0-shot | 13B | 31.1 |
| GPT-NeoX <small>(Liévin et al.)</small> | ∅ | 0-shot | 20B | 26.9 |
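As a rough usage sketch (not part of this README): the checkpoint name suggests a Flan-T5-base model trained in a Fusion-in-Decoder (FiD) setup, so it can in principle be loaded with Hugging Face Transformers. The repository ID below is a placeholder, and loading it as a plain seq2seq model is an assumption; full FiD inference (encoding each generated context separately and fusing them in the decoder) likely requires the project's own wrapper code.

```python
# Hypothetical usage sketch -- the repo ID and the plain seq2seq loading path are
# assumptions, not taken from this README; the FiD variant may need project code.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "disi-unibo-nlp/medgenie-fid-flan-t5-base-medqa"  # placeholder repo ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

question = (
    "A 27-year-old woman presents with ...? "          # example question stem
    "Options: (A) ... (B) ... (C) ... (D) ..."
)
generated_context = "..."  # context produced by the generator LLM (e.g. PMC-LLaMA)

# FiD would encode each (context, question) pair separately and fuse them in the
# decoder; here we approximate that with a single concatenated input.
inputs = tokenizer(
    f"context: {generated_context} question: {question}",
    return_tensors="pt",
    truncation=True,
)
answer_ids = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(answer_ids[0], skip_special_tokens=True))
```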