---
license: mit
datasets:
- disi-unibo-nlp/medqa-5-opt-MedGENIE
language:
- en
metrics:
- accuracy
tags:
- medical
- question-answering
- fusion-in-decoder
pipeline_tag: question-answering
widget:
- text: >-
A junior orthopaedic surgery resident is completing a carpal tunnel repair
with the department chairman as the attending physician. During the case,
the resident inadvertently cuts a flexor tendon. The tendon is repaired
without complication. The attending tells the resident that the patient will
do fine, and there is no need to report this minor complication that will
not harm the patient, as he does not want to make the patient worry
unnecessarily. He tells the resident to leave this complication out of the
operative report. Which of the following is the correct next action for the
resident to take? A. Disclose the error to the patient and put it in the
operative report B. Tell the attending that he cannot fail to disclose this
mistake C. Report the physician to the ethics committee D. Refuse to dictate
the operative report.
context: >-
Inadvertent cutting of a tendon is a complication, and it should be in the
operative report. The resident must put this complication in the operative
report and discuss it with the patient. If there was no harm to the patient
and the correction was done, then there is nothing major to worry about, but
disclosing it, as per ethical guidelines, is mandatory.
example_title: Example 1
---
# Model Card for MedGENIE-fid-flan-t5-base-medqa
MedGENIE is a collection of language models designed to use generated contexts, rather than retrieved ones, for answering multiple-choice open-domain questions in the medical domain. Specifically, **MedGENIE-fid-flan-t5-base-medqa** is a *fusion-in-decoder* (FID) model based on [flan-t5-base](https://huggingface.co/google/flan-t5-base), trained on the [MedQA-USMLE](https://huggingface.co/datasets/disi-unibo-nlp/medqa-5-opt-MedGENIE) dataset and grounded on artificial contexts generated by [PMC-LLaMA-13B](https://huggingface.co/axiong/PMC_LLaMA_13B). This model achieves new *state-of-the-art* (SOTA) performance on the corresponding test set.
## Model description
- **Language(s) (NLP):** English
- **License:** MIT
- **Finetuned from model:** [google/flan-t5-base](https://huggingface.co/google/flan-t5-base)
- **Repository:** https://github.com/disi-unibo-nlp/medgenie
- **Paper:** [To Generate or to Retrieve? On the Effectiveness of Artificial Contexts for Medical Open-Domain Question Answering](https://arxiv.org/abs/2403.01924)
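## Usage example
The snippet below is a minimal, illustrative sketch of how the model can be queried. It is not the project's official API: the `FiDT5` class, its import path, and the Hub model ID are assumptions based on the fusion-in-decoder implementation shipped with the [MedGENIE repository](https://github.com/disi-unibo-nlp/medgenie); a plain `T5ForConditionalGeneration` cannot fuse multiple contexts. Refer to the repository for the exact loading and prompting code.
```python
from transformers import T5Tokenizer

# Assumed import: the fusion-in-decoder wrapper class from the MedGENIE / FiD codebase.
# from src.model import FiDT5

tokenizer = T5Tokenizer.from_pretrained("google/flan-t5-base")
# model = FiDT5.from_pretrained("disi-unibo-nlp/MedGENIE-fid-flan-t5-base-medqa")  # Hub ID assumed

question = (
    "Which of the following is the correct next action for the resident to take? "
    "A. ... B. ... C. ... D. ..."
)
contexts = [
    "Inadvertent cutting of a tendon is a complication; it should be in the operative report ...",
    # further contexts generated by PMC-LLaMA-13B (n_context = 5 at training time)
]

# FiD encodes each question+context pair independently; the decoder then attends
# over the concatenation of all encoder outputs.
passages = [f"question: {question} context: {c}" for c in contexts]  # prompt format assumed
enc = tokenizer(passages, padding=True, truncation=True, max_length=1024, return_tensors="pt")

# FiD implementations typically expect inputs shaped (batch, n_passages, seq_len).
input_ids = enc["input_ids"].unsqueeze(0)
attention_mask = enc["attention_mask"].unsqueeze(0)

# Generation with the (assumed) FiD wrapper:
# output_ids = model.generate(input_ids=input_ids, attention_mask=attention_mask, max_length=32)
# print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```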
## Performance
At the time of release (February 2024), **MedGENIE-fid-flan-t5-base-medqa** is a new lightweight SOTA model on the MedQA-USMLE benchmark:
| Model | Grounding (Source) | Learning | Params | Accuracy (%) |
|----------------------------------|--------------------|---------------------------|-----------------|-------------------------------|
| **MedGENIE-FID-Flan-T5** | **G (PMC-LLaMA)** | **Fine-tuned** | **250M** | **53.1** |
| Codex <small>([Liévin et al.](https://arxiv.org/abs/2207.08143))</small> | ∅ | 0-shot | 175B | 52.5 |
| Codex <small>([Liévin et al.](https://arxiv.org/abs/2207.08143))</small> | R (Wikipedia) | 0-shot | 175B | 52.5 |
| GPT-3.5-Turbo <small>([Yang et al.](https://arxiv.org/abs/2309.02233))</small> | R (Wikipedia) | k-shot | -- | 52.3 |
| MEDITRON <small>([Chen et al.](https://arxiv.org/abs/2311.16079))</small> | ∅ | Fine-tuned | 7B | 52.0 |
| BioMistral DARE <small> ([Labrak et al.](https://arxiv.org/abs/2402.10373)) </small> | ∅ | Fine-tuned | 7B | 51.1 |
| BioMistral <small> ([Labrak et al.](https://arxiv.org/abs/2402.10373)) </small> | ∅ | Fine-tuned | 7B | 50.6 |
| Zephyr-β | R (MedWiki) | 2-shot | 7B | 50.4 |
| BioMedGPT <small>([Luo et al.](https://arxiv.org/abs/2308.09442v2))</small> | ∅ | k-shot | 10B | 50.4 |
| BioMedLM <small>([Singhal et al.](https://arxiv.org/abs/2212.13138))</small> | ∅ | Fine-tuned | 2.7B | 50.3 |
| PMC-LLaMA <small>(awq 4 bit)</small> | ∅ | Fine-tuned | 13B | 50.2 |
| LLaMA-2 <small>([Chen et al.](https://arxiv.org/abs/2311.16079))</small> | ∅ | Fine-tuned | 7B | 49.6 |
| Zephyr-β | ∅ | 2-shot | 7B | 49.6 |
| Zephyr-β <small>([Chen et al.](https://arxiv.org/abs/2311.16079))</small> | ∅ | 3-shot | 7B | 49.2 |
| PMC-LLaMA <small>([Chen et al.](https://arxiv.org/abs/2311.16079))</small> | ∅ | Fine-tuned | 7B | 49.2 |
| DRAGON <small>([Yasunaga et al.](https://arxiv.org/abs/2210.09338))</small> | R (UMLS) | Fine-tuned | 360M | 47.5 |
| InstructGPT <small>([Liévin et al.](https://arxiv.org/abs/2207.08143))</small> | R (Wikipedia) | 0-shot | 175B | 47.3 |
| BioMistral DARE <small> ([Labrak et al.](https://arxiv.org/abs/2402.10373)) </small> | ∅ | 3-shot | 7B | 47.0 |
| Flan-PaLM <small>([Singhal et al.](https://arxiv.org/abs/2212.13138))</small> | ∅ | 5-shot | 62B | 46.1 |
| InstructGPT <small>([Liévin et al.](https://arxiv.org/abs/2207.08143))</small> | ∅ | 0-shot | 175B | 46.0 |
| VOD <small>([Liévin et al. 2023](https://arxiv.org/abs/2210.06345))</small> | R (MedWiki) | Fine-tuned | 220M | 45.8 |
| Vicuna 1.3 <small>([Liévin et al.](https://arxiv.org/abs/2207.08143))</small> | ∅ | 0-shot | 33B | 45.2 |
| BioLinkBERT <small>([Singhal et al.](https://arxiv.org/abs/2212.13138))</small> | ∅ | Fine-tuned | 340M | 45.1 |
| Mistral-Instruct | R (MedWiki) | 2-shot | 7B | 45.1 |
| BioMistral <small> ([Labrak et al.](https://arxiv.org/abs/2402.10373)) </small> | ∅ | 3-shot | 7B | 44.4 |
| Galactica | ∅ | 0-shot | 120B | 44.4 |
| LLaMA-2 <small>([Liévin et al.](https://arxiv.org/abs/2207.08143))</small> | ∅ | 0-shot | 70B | 43.4 |
| BioReader <small>([Frisoni et al.](https://aclanthology.org/2022.emnlp-main.390/))</small> | R (PubMed-RCT) | Fine-tuned | 230M | 43.0 |
| Guanaco <small>([Liévin et al.](https://arxiv.org/abs/2207.08143))</small> | ∅ | 0-shot | 33B | 42.9 |
| LLaMA-2-chat <small>([Liévin et al.](https://arxiv.org/abs/2207.08143))</small> | ∅ | 0-shot | 70B | 42.3 |
| Vicuna 1.5 <small>([Liévin et al.](https://arxiv.org/abs/2207.08143))</small> | ∅ | 0-shot | 65B | 41.6 |
| Mistral-Instruct <small>([Chen et al.](https://arxiv.org/abs/2311.16079))</small> | ∅ | 3-shot | 7B | 41.1 |
| PaLM <small>([Singhal et al.](https://arxiv.org/abs/2212.13138))</small> | ∅ | 5-shot | 62B | 40.9 |
| Guanaco <small>([Liévin et al.](https://arxiv.org/abs/2207.08143))</small> | ∅ | 0-shot | 65B | 40.8 |
| Falcon-Instruct <small>([Liévin et al.](https://arxiv.org/abs/2207.08143))</small> | ∅ | 0-shot | 40B | 39.0 |
| Vicuna 1.3 <small>([Liévin et al.](https://arxiv.org/abs/2207.08143))</small> | ∅ | 0-shot | 13B | 38.7 |
| GreaseLM <small>([Zhang et al.](https://arxiv.org/abs/2201.08860))</small> | R (UMLS) | Fine-tuned | 359M | 38.5 |
| PubMedBERT <small>([Singhal et al.](https://arxiv.org/abs/2212.13138))</small> | ∅ | Fine-tuned | 110M | 38.1 |
| QA-GNN <small>([Yasunaga et al.](https://arxiv.org/abs/2104.06378))</small> | R (UMLS) | Fine-tuned | 360M | 38.0 |
| LLaMA-2 <small>([Yang et al.](https://arxiv.org/abs/2309.02233))</small> | R (Wikipedia) | k-shot | 13B | 37.6 |
| LLaMA-2-chat | R (MedWiki) | 2-shot | 7B | 37.2 |
| LLaMA-2-chat | ∅ | 2-shot | 7B | 37.2 |
| BioBERT <small>([Lee et al.](https://arxiv.org/abs/1901.08746))</small> | ∅ | Fine-tuned | 110M | 36.7 |
| MTP-Instruct <small>([Liévin et al.](https://arxiv.org/abs/2207.08143))</small> | ∅ | 0-shot | 30B | 35.1 |
| GPT-Neo <small>([Singhal et al.](https://arxiv.org/abs/2212.13138))</small> | ∅ | Fine-tuned | 2.5B | 33.3 |
| LLaMA-2-chat <small>([Liévin et al.](https://arxiv.org/abs/2207.08143))</small> | ∅ | 0-shot | 13B | 32.2 |
| LLaMA-2 <small>([Liévin et al.](https://arxiv.org/abs/2207.08143))</small> | ∅ | 0-shot | 13B | 31.1 |
| GPT-NeoX <small>([Liévin et al.](https://arxiv.org/abs/2207.08143))</small> | ∅ | 0-shot | 20B | 26.9 |
## Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- n_context: 5
- per_gpu_batch_size: 1
- accumulation_steps: 4
- total_steps: 40,712
- eval_freq: 10,178
- optimizer: AdamW
- scheduler: linear
- weight_decay: 0.01
- warmup_ratio: 0.1
- text_maxlength: 1024
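For reference, the sketch below shows one way these settings map onto a standard PyTorch/`transformers` optimizer and scheduler setup (AdamW with weight decay 0.01, linear decay with 10% warmup, and an effective batch size of per_gpu_batch_size × accumulation_steps = 4). It is illustrative only and not the project's training script; `model` and `train_loader` are placeholders for the FiD model and the MedQA data loader.
```python
import torch
from transformers import get_linear_schedule_with_warmup

# Hyperparameters taken from the list above.
learning_rate = 5e-5
weight_decay = 0.01
total_steps = 40_712
warmup_steps = int(0.1 * total_steps)  # warmup_ratio = 0.1
accumulation_steps = 4                 # per_gpu_batch_size = 1, so effective batch size = 4

# `model` and `train_loader` stand in for the FiD model and the MedQA dataloader.
optimizer = torch.optim.AdamW(model.parameters(), lr=learning_rate, weight_decay=weight_decay)
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=warmup_steps, num_training_steps=total_steps
)

for step, batch in enumerate(train_loader):
    loss = model(**batch).loss / accumulation_steps  # scale the loss for gradient accumulation
    loss.backward()
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()
        scheduler.step()
        optimizer.zero_grad()
```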
## Bias, Risks, and Limitations
Our model is trained on artificially generated contextual documents, which might inadvertently magnify inherent biases and depart from clinical and societal norms. This could lead to the spread of convincing medical misinformation. To mitigate this risk, we recommend a cautious approach: domain experts should manually review any output before real-world use. This ethical safeguard is crucial to prevent the dissemination of potentially erroneous or misleading information, particularly within clinical and scientific circles.
## Citation
If you find MedGENIE-fid-flan-t5-base-medqa useful in your work, please cite it as follows:
```bibtex
@misc{frisoni2024generate,
title={To Generate or to Retrieve? On the Effectiveness of Artificial Contexts for Medical Open-Domain Question Answering},
author={Giacomo Frisoni and Alessio Cocchieri and Alex Presepi and Gianluca Moro and Zaiqiao Meng},
year={2024},
eprint={2403.01924},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
```