alecocc committed
Commit 793a2fb · verified · 1 Parent(s): 823ff56

Update README.md

Files changed (1):
  1. README.md (+34 -34)
README.md CHANGED
@@ -40,7 +40,7 @@ widget:
  ---
  # Model Card for MedGENIE-fid-flan-t5-base-medqa

- MedGENIE comprises a collection of language models designed to utilize generated contexts, rather than retrieved ones, for addressing multiple-choice open-domain questions in the medical domain. Specifically, MedGENIE-fid-flan-t5-base-medqa is a fusion-in-decoder model based on the flan-t5-base architecture, trained on the [MedQA-USMLE](https://huggingface.co/datasets/disi-unibo-nlp/medqa-5-opt-MedGENIE) dataset augmented with artificially generated contexts from PMC-LLaMA-13B. This model achieves a new state-of-the-art performance over the corresponding test set.
+ MedGENIE comprises a collection of language models designed to utilize generated contexts, rather than retrieved ones, for addressing multiple-choice open-domain questions in the medical domain. Specifically, MedGENIE-fid-flan-t5-base-medqa is a fusion-in-decoder model based on the flan-t5-base architecture, trained on the [MedQA-USMLE](https://huggingface.co/datasets/disi-unibo-nlp/medqa-5-opt-MedGENIE) dataset augmented with artificially generated contexts from PMC-LLaMA-13B. This model achieves a new state-of-the-art (SOTA) performance over the corresponding test set.

  ## Model description

@@ -56,47 +56,47 @@ At the time of release, MedGENIE-fid-flan-t5-base-medqa is a new lightweight SOT
  | Model | Ground (Source) | Learning | Params | Accuracy (&darr;) |
  |----------------------------------|--------------------|---------------------------|-----------------|-------------------------------|
  | **MedGENIE-FID-Flan-T5** | G (PMC-LLaMA) | Fine-tuned | 250M | **53.1** |
- | Codex <small>(Liévin et al. 2022)</small> | &empty; | 0-shot | 175B | 52.5 |
- | Codex <small>(Liévin et al. 2022)</small> | R (Wikipedia) | 0-shot | 175B | 52.5 |
- | GPT-3.5-Turbo <small>(Yang et al.)</small> | R (Wikipedia) | k-shot | -- | 52.3 |
- | MEDITRON <small>(Chen et al.)</small> | &empty; | Fine-tuned | 7B | 52.0 |
+ | Codex <small>([Liévin et al.](https://arxiv.org/abs/2207.08143))</small> | &empty; | 0-shot | 175B | 52.5 |
+ | Codex <small>([Liévin et al.](https://arxiv.org/abs/2207.08143))</small> | R (Wikipedia) | 0-shot | 175B | 52.5 |
+ | GPT-3.5-Turbo <small>([Yang et al.](https://arxiv.org/abs/2309.02233))</small> | R (Wikipedia) | k-shot | -- | 52.3 |
+ | MEDITRON <small>([Chen et al.](https://arxiv.org/abs/2311.16079))</small> | &empty; | Fine-tuned | 7B | 52.0 |
  | Zephyr-&beta; | R (MedWiki) | 2-shot | 7B | 50.4 |
- | BioMedGPT <small>(Luo et al.)</small> | &empty; | k-shot | 10B | 50.4 |
- | BioMedLM <small>(Singhal et al.)</small> | &empty; | Fine-tuned | 2.7B | 50.3 |
+ | BioMedGPT <small>([Luo et al.](https://arxiv.org/abs/2308.09442v2))</small> | &empty; | k-shot | 10B | 50.4 |
+ | BioMedLM <small>([Singhal et al.](https://arxiv.org/abs/2212.13138))</small> | &empty; | Fine-tuned | 2.7B | 50.3 |
  | PMC-LLaMA (AWQ) | &empty; | Fine-tuned | 13B | 50.2 |
- | LLaMA-2 <small>(Chen et al.)</small> | &empty; | Fine-tuned | 7B | 49.6 |
+ | LLaMA-2 <small>([Chen et al.](https://arxiv.org/abs/2311.16079))</small> | &empty; | Fine-tuned | 7B | 49.6 |
  | Zephyr-&beta; | &empty; | 2-shot | 7B | 49.6 |
- | Zephyr-&beta; <small>(Chen et al.)</small> | &empty; | 3-shot | 7B | 49.2 |
- | PMC-LLaMA <small>(Chen et al.)</small> | &empty; | Fine-tuned | 7B | 49.2 |
- | DRAGON <small>(Yasunaga et al.)</small> | R (UMLS) | Fine-tuned | 360M | 47.5 |
- | InstructGPT <small>(Liévin et al.)</small> | R (Wikipedia) | 0-shot | 175B | 47.3 |
- | Flan-PaLM <small>(Singhal et al.)</small> | &empty; | 5-shot | 62B | 46.1 |
- | InstructGPT <small>(Liévin et al.)</small> | &empty; | 0-shot | 175B | 46.0 |
- | VOD <small>(Liévin et al. 2023)</small> | R (MedWiki) | Fine-tuned | 220M | 45.8 |
- | Vicuna 1.3 <small>(Liévin et al.)</small> | &empty; | 0-shot | 33B | 45.2 |
- | BioLinkBERT <small>(Singhal et al.)</small> | &empty; | Fine-tuned | 340M | 45.1 |
+ | Zephyr-&beta; <small>([Chen et al.](https://arxiv.org/abs/2311.16079))</small> | &empty; | 3-shot | 7B | 49.2 |
+ | PMC-LLaMA <small>([Chen et al.](https://arxiv.org/abs/2311.16079))</small> | &empty; | Fine-tuned | 7B | 49.2 |
+ | DRAGON <small>([Yasunaga et al.](https://arxiv.org/abs/2210.09338))</small> | R (UMLS) | Fine-tuned | 360M | 47.5 |
+ | InstructGPT <small>([Liévin et al.](https://arxiv.org/abs/2207.08143))</small> | R (Wikipedia) | 0-shot | 175B | 47.3 |
+ | Flan-PaLM <small>([Singhal et al.](https://arxiv.org/abs/2212.13138))</small> | &empty; | 5-shot | 62B | 46.1 |
+ | InstructGPT <small>([Liévin et al.](https://arxiv.org/abs/2207.08143))</small> | &empty; | 0-shot | 175B | 46.0 |
+ | VOD <small>([Liévin et al. 2023](https://arxiv.org/abs/2210.06345))</small> | R (MedWiki) | Fine-tuned | 220M | 45.8 |
+ | Vicuna 1.3 <small>([Liévin et al.](https://arxiv.org/abs/2207.08143))</small> | &empty; | 0-shot | 33B | 45.2 |
+ | BioLinkBERT <small>([Singhal et al.](https://arxiv.org/abs/2212.13138))</small> | &empty; | Fine-tuned | 340M | 45.1 |
  | Mistral-Instruct | R (MedWiki) | 2-shot | 7B | 45.1 |
  | Galactica | &empty; | 0-shot | 120B | 44.4 |
- | LLaMA-2 <small>(Liévin et al.)</small> | &empty; | 0-shot | 70B | 43.4 |
- | BioReader <small>(Frison et al.)</small> | R (PubMed-RCT) | Fine-tuned | 230M | 43.0 |
- | Guanaco <small>(Liévin et al.)</small> | &empty; | 0-shot | 33B | 42.9 |
- | LLaMA-2-chat <small>(Liévin et al.)</small> | &empty; | 0-shot | 70B | 42.3 |
- | Vicuna 1.5 <small>(Liévin et al.)</small> | &empty; | 0-shot | 65B | 41.6 |
- | Mistral-Instruct <small>(Chen et al.)</small> | &empty; | 3-shot | 7B | 41.1 |
- | PaLM <small>(Singhal et al.)</small> | &empty; | 5-shot | 62B | 40.9 |
- | Guanaco <small>(Liévin et al.)</small> | &empty; | 0-shot | 65B | 40.8 |
- | Falcon-Instruct <small>(Liévin et al.)</small> | &empty; | 0-shot | 40B | 39.0 |
- | Vicuna 1.3 <small>(Liévin et al.)</small> | &empty; | 0-shot | 13B | 38.7 |
+ | LLaMA-2 <small>([Liévin et al.](https://arxiv.org/abs/2207.08143))</small> | &empty; | 0-shot | 70B | 43.4 |
+ | BioReader <small>(Frisoni et al.)</small> | R (PubMed-RCT) | Fine-tuned | 230M | 43.0 |
+ | Guanaco <small>([Liévin et al.](https://arxiv.org/abs/2207.08143))</small> | &empty; | 0-shot | 33B | 42.9 |
+ | LLaMA-2-chat <small>([Liévin et al.](https://arxiv.org/abs/2207.08143))</small> | &empty; | 0-shot | 70B | 42.3 |
+ | Vicuna 1.5 <small>([Liévin et al.](https://arxiv.org/abs/2207.08143))</small> | &empty; | 0-shot | 65B | 41.6 |
+ | Mistral-Instruct <small>([Chen et al.](https://arxiv.org/abs/2311.16079))</small> | &empty; | 3-shot | 7B | 41.1 |
+ | PaLM <small>([Singhal et al.](https://arxiv.org/abs/2212.13138))</small> | &empty; | 5-shot | 62B | 40.9 |
+ | Guanaco <small>([Liévin et al.](https://arxiv.org/abs/2207.08143))</small> | &empty; | 0-shot | 65B | 40.8 |
+ | Falcon-Instruct <small>([Liévin et al.](https://arxiv.org/abs/2207.08143))</small> | &empty; | 0-shot | 40B | 39.0 |
+ | Vicuna 1.3 <small>([Liévin et al.](https://arxiv.org/abs/2207.08143))</small> | &empty; | 0-shot | 13B | 38.7 |
  | GreaseLM <small>(Zhang et al.)</small> | R (UMLS) | Fine-tuned | 359M | 38.5 |
- | PubMedBERT <small>(Singhal et al.)</small> | &empty; | Fine-tuned | 110M | 38.1 |
+ | PubMedBERT <small>([Singhal et al.](https://arxiv.org/abs/2212.13138))</small> | &empty; | Fine-tuned | 110M | 38.1 |
  | QA-GNN <small>(Yasunaga et al.)</small> | R (UMLS) | Fine-tuned | 360M | 38.0 |
- | LLaMA-2 <small>(Yang et al.)</small> | R (Wikipedia) | k-shot | 13B | 37.6 |
+ | LLaMA-2 <small>([Yang et al.](https://arxiv.org/abs/2309.02233))</small> | R (Wikipedia) | k-shot | 13B | 37.6 |
  | LLaMA-2-chat | R (MedWiki) | 2-shot | 7B | 37.2 |
  | LLaMA-2-chat | &empty; | 2-shot | 7B | 37.2 |
  | BioBERT <small>(Lee et al.)</small> | &empty; | Fine-tuned | 110M | 36.7 |
- | MTP-Instruct <small>(Liévin et al.)</small> | &empty; | 0-shot | 30B | 35.1 |
- | GPT-Neo <small>(Singhal et al.)</small> | &empty; | Fine-tuned | 2.5B | 33.3 |
- | LLaMa-2-chat <small>(Liévin et al.)</small> | &empty; | 0-shot | 13B | 32.2 |
- | LLaMa-2 <small>(Liévin et al.)</small> | &empty; | 0-shot | 13B | 31.1 |
- | GPT-NeoX <small>(Liévin et al.)</small> | &empty; | 0-shot | 20B | 26.9 |
+ | MTP-Instruct <small>([Liévin et al.](https://arxiv.org/abs/2207.08143))</small> | &empty; | 0-shot | 30B | 35.1 |
+ | GPT-Neo <small>([Singhal et al.](https://arxiv.org/abs/2212.13138))</small> | &empty; | Fine-tuned | 2.5B | 33.3 |
+ | LLaMa-2-chat <small>([Liévin et al.](https://arxiv.org/abs/2207.08143))</small> | &empty; | 0-shot | 13B | 32.2 |
+ | LLaMa-2 <small>([Liévin et al.](https://arxiv.org/abs/2207.08143))</small> | &empty; | 0-shot | 13B | 31.1 |
+ | GPT-NeoX <small>([Liévin et al.](https://arxiv.org/abs/2207.08143))</small> | &empty; | 0-shot | 20B | 26.9 |

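The description touched by this commit refers to a fusion-in-decoder (FiD) setup: each artificially generated context is encoded together with the question, and the decoder attends over all encoded passages at once to produce the answer. As a minimal sketch of that pattern only, and not the repository's official inference code, the snippet below runs the FiD-style fusion on a plain flan-t5-base checkpoint with Hugging Face transformers; the checkpoint name, prompt format, and context strings are placeholder assumptions.

```python
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration
from transformers.modeling_outputs import BaseModelOutput

# Placeholder base checkpoint; the fine-tuned MedGENIE FiD weights are separate.
model_name = "google/flan-t5-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)

question = "Question: ... Options: (A) ... (B) ... (C) ... (D) ... (E) ..."
contexts = [
    "first generated context ...",   # e.g. produced by PMC-LLaMA-13B
    "second generated context ...",
]

# Encode each (question, context) passage independently, as in fusion-in-decoder.
batch = tokenizer(
    [f"question: {question} context: {c}" for c in contexts],
    return_tensors="pt", padding=True, truncation=True, max_length=512,
)
with torch.no_grad():
    enc = model.encoder(input_ids=batch.input_ids, attention_mask=batch.attention_mask)

# "Fuse" the passages: concatenate their encoder states along the sequence axis
# so the decoder can attend over all contexts jointly (single-question batch).
hidden = enc.last_hidden_state                      # (n_contexts, seq_len, d_model)
fused_states = hidden.reshape(1, -1, hidden.size(-1))
fused_mask = batch.attention_mask.reshape(1, -1)

answer_ids = model.generate(
    encoder_outputs=BaseModelOutput(last_hidden_state=fused_states),
    attention_mask=fused_mask,
    max_new_tokens=8,
)
print(tokenizer.decode(answer_ids[0], skip_special_tokens=True))
```

For actual use, the fine-tuned MedGENIE-fid-flan-t5-base-medqa weights and the project's own FiD reader code are the reference; details such as the prompt template, number of passages, and truncation lengths likely differ from this sketch.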