metadata

license: mit
datasets:
  - disi-unibo-nlp/medqa-5-opt-MedGENIE
language:
  - en
metrics:
  - accuracy
tags:
  - medical
pipeline_tag: question-answering
widget:
  - text: >-
      A junior orthopaedic surgery resident is completing a carpal tunnel repair
      with the department chairman as the attending physician. During the case,
      the resident inadvertently cuts a flexor tendon. The tendon is repaired
      without complication. The attending tells the resident that the patient
      will do fine, and there is no need to report this minor complication that
      will not harm the patient, as he does not want to make the patient worry
      unnecessarily. He tells the resident to leave this complication out of the
      operative report. Which of the following is the correct next action for
      the resident to take?

      A. Disclose the error to the patient and put it in the operative report

      B. Tell the attending that he cannot fail to disclose this mistake

      C. Report the physician to the ethics committee

      D. Refuse to dictate the operative report
    context: >-
      Inadvertent cutting of a tendon is a complication, so it should be
      documented in the operative report.

      The resident must include this complication in the operative report and
      discuss it with the patient. Because the tendon was repaired and the
      patient was not harmed, there is nothing major to worry about, but
      disclosing the error is mandatory under ethical guidelines.
    example_title: Example 1

Model Card for MedGENIE-fid-flan-t5-base-medqa

MedGENIE comprises a collection of language models designed to use generated contexts, rather than retrieved ones, to answer multiple-choice open-domain questions in the medical domain. Specifically, MedGENIE-fid-flan-t5-base-medqa is a fusion-in-decoder (FiD) model based on the flan-t5-base architecture, trained on the MedQA-USMLE dataset augmented with artificially generated contexts from PMC-LLaMA-13B. On the corresponding test set it reaches 53.1% accuracy, a new state of the art among the compared models, including models with up to 175B parameters.
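
The fusion-in-decoder idea can be illustrated with a minimal, standard-library-only sketch. It assumes a hypothetical prompt template modeled on the `question:` / `context:` prefixes of the original FiD code (MedGENIE's exact input format may differ), and the helpers `build_fid_inputs` and `fuse_encoder_outputs` are illustrative names, not part of the MedGENIE repository. The point it shows: each (question, context) pair is encoded independently, and the decoder then attends over the concatenation of all encoder outputs.

```python
import string
from typing import List, Sequence

Vector = List[float]


def build_fid_inputs(question: str, options: Sequence[str], contexts: Sequence[str]) -> List[str]:
    """Pair the question (and lettered options) with each generated context.

    Hypothetical template modeled on FiD's "question:" / "context:" prefixes;
    MedGENIE's actual input format may differ.
    """
    letters = string.ascii_uppercase
    opts = " ".join(f"{letters[i]}. {opt}" for i, opt in enumerate(options))
    # One encoder input per context: the encoder never sees two contexts at once.
    return [f"question: {question} options: {opts} context: {ctx}" for ctx in contexts]


def fuse_encoder_outputs(per_passage: Sequence[Sequence[Vector]]) -> List[Vector]:
    """Toy fusion step: concatenate per-passage encoder states along the token axis.

    N passages of L states each become a single sequence of N * L states,
    which the decoder cross-attends over to generate one answer.
    """
    fused: List[Vector] = []
    for states in per_passage:
        fused.extend(list(state) for state in states)
    return fused


if __name__ == "__main__":
    inputs = build_fid_inputs(
        "Which of the following is the correct next action for the resident to take?",
        ["Disclose the error to the patient and put it in the operative report",
         "Tell the attending that he cannot fail to disclose this mistake"],
        ["generated context 1", "generated context 2", "generated context 3"],
    )
    print(len(inputs))  # prints 3: one encoder pass per generated context
    # Stand-in encoder states: 3 passages x 4 tokens x 2 dimensions.
    toy_states = [[[float(p), float(t)] for t in range(4)] for p in range(3)]
    print(len(fuse_encoder_outputs(toy_states)))  # prints 12 = 3 * 4
```

With N generated contexts of L tokens each, every encoder pass sees only L tokens, so encoding cost grows linearly in N, while the decoder cross-attends over all N × L states; this is what lets a FiD model combine many contexts with a small backbone.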

Model description

Language(s) (NLP): English
License: MIT
Finetuned from model: google/flan-t5-base
Repository: https://github.com/disi-unibo-nlp/medgenie

Performance

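At the time of release, MedGENIE-fid-flan-t5-base-medqa is the new lightweight state-of-the-art model on the MedQA-USMLE benchmark. In the Ground (Source) column, G denotes grounding on generated contexts, R on retrieved contexts, and ∅ no external context; the generating model or retrieval corpus is given in parentheses. Accuracy is in percent, with rows sorted from highest to lowest: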

Model	Ground (Source)	Learning	Params	Accuracy (%)
MedGENIE-FID-Flan-T5	G (PMC-LLaMA)	Fine-tuned	250M	53.1
Codex (Liévin et al. 2022)	∅	0-shot	175B	52.5
Codex (Liévin et al. 2022)	R (Wikipedia)	0-shot	175B	52.5
GPT-3.5-Turbo (Yang et al.)	R (Wikipedia)	k-shot	--	52.3
MEDITRON (Chen et al.)	∅	Fine-tuned	7B	52.0
Zephyr-β	R (MedWiki)	2-shot	7B	50.4
BioMedGPT (Luo et al.)	∅	k-shot	10B	50.4
BioMedLM (Singhal et al.)	∅	Fine-tuned	2.7B	50.3
PMC-LLaMA (AWQ)	∅	Fine-tuned	13B	50.2
LLaMA-2 (Chen et al.)	∅	Fine-tuned	7B	49.6
Zephyr-β	∅	2-shot	7B	49.6
Zephyr-β (Chen et al.)	∅	3-shot	7B	49.2
PMC-LLaMA (Chen et al.)	∅	Fine-tuned	7B	49.2
DRAGON (Yasunaga et al.)	R (UMLS)	Fine-tuned	360M	47.5
InstructGPT (Liévin et al.)	R (Wikipedia)	0-shot	175B	47.3
Flan-PaLM (Singhal et al.)	∅	5-shot	62B	46.1
InstructGPT (Liévin et al.)	∅	0-shot	175B	46.0
VOD (Liévin et al. 2023)	R (MedWiki)	Fine-tuned	220M	45.8
Vicuna 1.3 (Liévin et al.)	∅	0-shot	33B	45.2
BioLinkBERT (Singhal et al.)	∅	Fine-tuned	340M	45.1
Mistral-Instruct	R (MedWiki)	2-shot	7B	45.1
Galactica	∅	0-shot	120B	44.4
LLaMA-2 (Liévin et al.)	∅	0-shot	70B	43.4
BioReader (Frison et al.)	R (PubMed-RCT)	Fine-tuned	230M	43.0
Guanaco (Liévin et al.)	∅	0-shot	33B	42.9
LLaMA-2-chat (Liévin et al.)	∅	0-shot	70B	42.3
Vicuna 1.5 (Liévin et al.)	∅	0-shot	65B	41.6
Mistral-Instruct (Chen et al.)	∅	3-shot	7B	41.1
PaLM (Singhal et al.)	∅	5-shot	62B	40.9
Guanaco (Liévin et al.)	∅	0-shot	65B	40.8
Falcon-Instruct (Liévin et al.)	∅	0-shot	40B	39.0
Vicuna 1.3 (Liévin et al.)	∅	0-shot	13B	38.7
GreaseLM (Zhang et al.)	R (UMLS)	Fine-tuned	359M	38.5
PubMedBERT (Singhal et al.)	∅	Fine-tuned	110M	38.1
QA-GNN (Yasunaga et al.)	R (UMLS)	Fine-tuned	360M	38.0
LLaMA-2 (Yang et al.)	R (Wikipedia)	k-shot	13B	37.6
LLaMA-2-chat	R (MedWiki)	2-shot	7B	37.2
LLaMA-2-chat	∅	2-shot	7B	37.2
BioBERT (Lee et al.)	∅	Fine-tuned	110M	36.7
MPT-Instruct (Liévin et al.)	∅	0-shot	30B	35.1
GPT-Neo (Singhal et al.)	∅	Fine-tuned	2.5B	33.3
LLaMA-2-chat (Liévin et al.)	∅	0-shot	13B	32.2
LLaMA-2 (Liévin et al.)	∅	0-shot	13B	31.1
GPT-NeoX (Liévin et al.)	∅	0-shot	20B	26.9
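
Accuracy in the table is the share of test questions answered with the correct option. The following is a minimal scoring sketch, assuming the model's generations can be mapped to an option letter A to E (the five-option setting of the training dataset). The letter-extraction rule and the helper names `extract_option` and `accuracy` are hypothetical, not MedGENIE's evaluation code.

```python
import re
from typing import Optional, Sequence

# Hypothetical heuristic: read a leading option letter such as "A", "(B)" or "C." from a generation.
_OPTION_RE = re.compile(r"\s*\(?([A-E])\b")


def extract_option(generated: str) -> Optional[str]:
    """Return the leading option letter of a generated answer, or None if there is none."""
    match = _OPTION_RE.match(generated)
    return match.group(1) if match else None


def accuracy(predictions: Sequence[Optional[str]], gold: Sequence[str]) -> float:
    """Percentage of predictions that exactly match the gold option letters."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        return 0.0
    correct = sum(pred == ref for pred, ref in zip(predictions, gold))
    return 100.0 * correct / len(gold)


if __name__ == "__main__":
    outputs = ["A. Disclose the error to the patient", "(C)", "no answer"]
    preds = [extract_option(text) for text in outputs]  # ["A", "C", None]
    print(f"{accuracy(preds, ['A', 'B', 'D']):.1f}")  # prints 33.3
```

The heuristic only reads a leading letter, so a generation such as "A patient should..." would be misread as option A; a more robust evaluation would also match generations against the option texts.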