|
--- |
|
license: mit |
|
datasets: |
|
- disi-unibo-nlp/medqa-5-opt-MedGENIE |
|
language: |
|
- en |
|
metrics: |
|
- accuracy |
|
tags: |
|
- medical |
|
pipeline_tag: question-answering |
|
widget: |
|
- text: >- |
|
A junior orthopaedic surgery resident is completing a carpal tunnel repair |
|
with the department chairman as the attending physician. During the case, |
|
the resident inadvertently cuts a flexor tendon. The tendon is repaired |
|
without complication. The attending tells the resident that the patient will |
|
do fine, and there is no need to report this minor complication that will |
|
not harm the patient, as he does not want to make the patient worry |
|
unnecessarily. He tells the resident to leave this complication out of the |
|
operative report. Which of the following is the correct next action for the |
|
resident to take? |
|
|
|
A. Disclose the error to the patient and put it in the operative report |
|
|
|
B. Tell the attending that he cannot fail to disclose this mistake |
|
|
|
C. Report the physician to the ethics committee |
|
|
|
D. Refuse to dictate the operative report
|
context: >- |
|
Inadvertent cutting of a tendon is a complication; it should be documented in

the operative report.



The resident must document this complication in the operative report and

discuss it with the patient. If the patient was not harmed and the tendon was

repaired successfully, there is no major cause for concern, but disclosure is

mandatory under ethical guidelines.
|
example_title: Example 1 |
|
--- |
|
# Model Card for MedGENIE-fid-flan-t5-base-medqa |
|
|
|
MedGENIE is a collection of language models designed to use generated contexts, rather than retrieved ones, to answer multiple-choice open-domain questions in the medical domain. Specifically, MedGENIE-fid-flan-t5-base-medqa is a fusion-in-decoder (FiD) model based on the flan-t5-base architecture, trained on the [MedQA-USMLE](https://huggingface.co/datasets/disi-unibo-nlp/medqa-5-opt-MedGENIE) dataset augmented with contexts generated by PMC-LLaMA-13B. At the time of release, it achieved state-of-the-art accuracy on the corresponding test set. A minimal sketch of the FiD mechanism is given under Model description below.
|
|
|
## Model description |
|
|
|
- **Language(s) (NLP):** English |
|
- **License:** MIT |
|
- **Finetuned from model:** [google/flan-t5-base](https://huggingface.co/google/flan-t5-base) |
|
- **Repository:** https://github.com/disi-unibo-nlp/medgenie |
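
FiD encodes each (question, context) pair independently and then concatenates the encoder states before decoding, so the decoder cross-attends over all generated contexts at once. The sketch below illustrates this mechanism with plain Hugging Face Transformers. It is a minimal illustration, not the official inference code: it loads the vanilla google/flan-t5-base weights, and the `question: ... context: ...` prompt format is an assumption; the actual MedGENIE checkpoint is expected to be loaded through the FiD wrapper in the [project repository](https://github.com/disi-unibo-nlp/medgenie).

```python
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration
from transformers.modeling_outputs import BaseModelOutput

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-base")
model = T5ForConditionalGeneration.from_pretrained("google/flan-t5-base")
model.eval()

question = (
    "Which of the following is the correct next action for the resident to take? "
    "A. Disclose the error to the patient and put it in the operative report "
    "B. Tell the attending that he cannot fail to disclose this mistake "
    "C. Report the physician to the ethics committee "
    "D. Refuse to dictate the operative report"
)
# Generated contexts (in MedGENIE these come from PMC-LLaMA-13B).
contexts = [
    "Inadvertent cutting of a tendon is a complication; it should be "
    "documented in the operative report.",
    "Ethical guidelines mandate disclosure of intraoperative complications "
    "to the patient.",
]

# Step 1: encode each (question, context) pair independently.
batch = tokenizer(
    [f"question: {question} context: {c}" for c in contexts],
    padding=True,
    truncation=True,
    max_length=512,
    return_tensors="pt",
)
with torch.no_grad():
    enc = model.encoder(
        input_ids=batch["input_ids"], attention_mask=batch["attention_mask"]
    )

# Step 2: fuse the contexts by concatenating the encoder states along the
# sequence axis; the decoder then attends over all of them jointly.
fused = BaseModelOutput(
    last_hidden_state=enc.last_hidden_state.reshape(1, -1, model.config.d_model)
)
fused_mask = batch["attention_mask"].reshape(1, -1)

out = model.generate(
    encoder_outputs=fused, attention_mask=fused_mask, max_new_tokens=8
)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

The same factorization applies at training time: the encoder only ever sees one context at a time, so compute scales linearly with the number of contexts, while the decoder still conditions on all of them when producing the answer.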
|
|
|
## Performance |
|
|
|
At the time of release, MedGENIE-fid-flan-t5-base-medqa set a new lightweight state of the art on the MedQA-USMLE benchmark (rows are sorted by descending accuracy; grounding: G = generated context, R = retrieved context, ∅ = no context):
|
|
|
| Model | Ground (Source) | Learning | Params | Accuracy (%) |
|
|----------------------------------|--------------------|---------------------------|-----------------|-------------------------------| |
|
| **MedGENIE-FID-Flan-T5** | G (PMC-LLaMA) | Fine-tuned | 250M | **53.1** | |
|
| Codex <small>(Liévin et al. 2022)</small> | ∅ | 0-shot | 175B | 52.5 |
|
| Codex <small>(Liévin et al. 2022)</small> | R (Wikipedia) | 0-shot | 175B | 52.5 | |
|
| GPT-3.5-Turbo <small>(Yang et al.)</small> | R (Wikipedia) | k-shot | -- | 52.3 | |
|
| MEDITRON <small>(Chen et al.)</small> | ∅ | Fine-tuned | 7B | 52.0 | |
|
| Zephyr-β | R (MedWiki) | 2-shot | 7B | 50.4 | |
|
| BioMedGPT <small>(Luo et al.)</small> | ∅ | k-shot | 10B | 50.4 | |
|
| BioMedLM <small>(Singhal et al.)</small> | ∅ | Fine-tuned | 2.7B | 50.3 | |
|
| PMC-LLaMA (AWQ) | ∅ | Fine-tuned | 13B | 50.2 | |
|
| LLaMA-2 <small>(Chen et al.)</small> | ∅ | Fine-tuned | 7B | 49.6 | |
|
| Zephyr-β | ∅ | 2-shot | 7B | 49.6 | |
|
| Zephyr-β <small>(Chen et al.)</small> | ∅ | 3-shot | 7B | 49.2 | |
|
| PMC-LLaMA <small>(Chen et al.)</small> | ∅ | Fine-tuned | 7B | 49.2 | |
|
| DRAGON <small>(Yasunaga et al.)</small> | R (UMLS) | Fine-tuned | 360M | 47.5 | |
|
| InstructGPT <small>(Liévin et al.)</small> | R (Wikipedia) | 0-shot | 175B | 47.3 | |
|
| Flan-PaLM <small>(Singhal et al.)</small> | ∅ | 5-shot | 62B | 46.1 | |
|
| InstructGPT <small>(Liévin et al.)</small> | ∅ | 0-shot | 175B | 46.0 | |
|
| VOD <small>(Liévin et al. 2023)</small> | R (MedWiki) | Fine-tuned | 220M | 45.8 | |
|
| Vicuna 1.3 <small>(Liévin et al.)</small> | ∅ | 0-shot | 33B | 45.2 | |
|
| BioLinkBERT <small>(Singhal et al.)</small> | ∅ | Fine-tuned | 340M | 45.1 | |
|
| Mistral-Instruct | R (MedWiki) | 2-shot | 7B | 45.1 | |
|
| Galactica | ∅ | 0-shot | 120B | 44.4 | |
|
| LLaMA-2 <small>(Liévin et al.)</small> | ∅ | 0-shot | 70B | 43.4 | |
|
| BioReader <small>(Frison et al.)</small> | R (PubMed-RCT) | Fine-tuned | 230M | 43.0 | |
|
| Guanaco <small>(Liévin et al.)</small> | ∅ | 0-shot | 33B | 42.9 | |
|
| LLaMA-2-chat <small>(Liévin et al.)</small> | ∅ | 0-shot | 70B | 42.3 | |
|
| Vicuna 1.5 <small>(Liévin et al.)</small> | ∅ | 0-shot | 65B | 41.6 | |
|
| Mistral-Instruct <small>(Chen et al.)</small> | ∅ | 3-shot | 7B | 41.1 | |
|
| PaLM <small>(Singhal et al.)</small> | ∅ | 5-shot | 62B | 40.9 | |
|
| Guanaco <small>(Liévin et al.)</small> | ∅ | 0-shot | 65B | 40.8 | |
|
| Falcon-Instruct <small>(Liévin et al.)</small> | ∅ | 0-shot | 40B | 39.0 | |
|
| Vicuna 1.3 <small>(Liévin et al.)</small> | ∅ | 0-shot | 13B | 38.7 | |
|
| GreaseLM <small>(Zhang et al.)</small> | R (UMLS) | Fine-tuned | 359M | 38.5 | |
|
| PubMedBERT <small>(Singhal et al.)</small> | ∅ | Fine-tuned | 110M | 38.1 | |
|
| QA-GNN <small>(Yasunaga et al.)</small> | R (UMLS) | Fine-tuned | 360M | 38.0 | |
|
| LLaMA-2 <small>(Yang et al.)</small> | R (Wikipedia) | k-shot | 13B | 37.6 | |
|
| LLaMA-2-chat | R (MedWiki) | 2-shot | 7B | 37.2 | |
|
| LLaMA-2-chat | ∅ | 2-shot | 7B | 37.2 | |
|
| BioBERT <small>(Lee et al.)</small> | ∅ | Fine-tuned | 110M | 36.7 | |
|
| MTP-Instruct <small>(Liévin et al.)</small> | ∅ | 0-shot | 30B | 35.1 | |
|
| GPT-Neo <small>(Singhal et al.)</small> | ∅ | Fine-tuned | 2.5B | 33.3 | |
|
| LLaMA-2-chat <small>(Liévin et al.)</small> | ∅ | 0-shot | 13B | 32.2 |

| LLaMA-2 <small>(Liévin et al.)</small> | ∅ | 0-shot | 13B | 31.1 |

| GPT-NeoX <small>(Liévin et al.)</small> | ∅ | 0-shot | 20B | 26.9 |
|
|
|
|