|
--- |
|
license: mit |
|
datasets: |
|
- disi-unibo-nlp/medqa-5-opt-MedGENIE |
|
language: |
|
- en |
|
metrics: |
|
- accuracy |
|
tags: |
|
- medical |
|
pipeline_tag: question-answering |
|
widget: |
|
- text: >- |
|
A junior orthopaedic surgery resident is completing a carpal tunnel repair |
|
with the department chairman as the attending physician. During the case, |
|
the resident inadvertently cuts a flexor tendon. The tendon is repaired |
|
without complication. The attending tells the resident that the patient will |
|
do fine, and there is no need to report this minor complication that will |
|
not harm the patient, as he does not want to make the patient worry |
|
unnecessarily. He tells the resident to leave this complication out of the |
|
operative report. Which of the following is the correct next action for the |
|
resident to take? |
|
|
|
A. Disclose the error to the patient and put it in the operative report |
|
|
|
B. Tell the attending that he cannot fail to disclose this mistake |
|
|
|
C. Report the physician to the ethics committee |
|
|
|
D. Refuse to dictate the operative report
|
context: >- |
|
Inadvertent cutting of a tendon is a complication; it should be documented in

the operative report.



The resident must document this complication in the operative report and

discuss it with the patient. If the patient was not harmed and the tendon was

repaired successfully, there is no major cause for concern, but disclosure is

mandatory under ethical guidelines.
|
example_title: Example 1 |
|
--- |
|
# Model Card for MedGENIE-fid-flan-t5-base-medqa |
|
|
|
MedGENIE is a collection of language models designed to use generated contexts, rather than retrieved ones, to answer multiple-choice open-domain questions in the medical domain. Specifically, MedGENIE-fid-flan-t5-base-medqa is a fusion-in-decoder (FiD) model based on the flan-t5-base architecture, trained on the [MedQA-USMLE](https://huggingface.co/datasets/disi-unibo-nlp/medqa-5-opt-MedGENIE) dataset augmented with contexts generated by PMC-LLaMA-13B. At the time of release, it achieved state-of-the-art accuracy on the corresponding test set. A minimal sketch of the FiD mechanism is given under Model description below.
|
|
|
## Model description |
|
|
|
- **Language(s) (NLP):** English |
|
- **License:** MIT |
|
- **Finetuned from model:** [google/flan-t5-base](https://huggingface.co/google/flan-t5-base) |
|
- **Repository:** https://github.com/disi-unibo-nlp/medgenie |
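
FiD encodes each (question, context) pair independently and then concatenates the encoder states before decoding, so the decoder cross-attends over all generated contexts at once. The sketch below illustrates this mechanism with plain Hugging Face Transformers. It is a minimal illustration, not the official inference code: it loads the vanilla google/flan-t5-base weights, and the `question: ... context: ...` prompt format is an assumption; the actual MedGENIE checkpoint is expected to be loaded through the FiD wrapper in the [project repository](https://github.com/disi-unibo-nlp/medgenie).

```python
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration
from transformers.modeling_outputs import BaseModelOutput

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-base")
model = T5ForConditionalGeneration.from_pretrained("google/flan-t5-base")
model.eval()

question = (
    "Which of the following is the correct next action for the resident to take? "
    "A. Disclose the error to the patient and put it in the operative report "
    "B. Tell the attending that he cannot fail to disclose this mistake "
    "C. Report the physician to the ethics committee "
    "D. Refuse to dictate the operative report"
)
# Generated contexts (in MedGENIE these come from PMC-LLaMA-13B).
contexts = [
    "Inadvertent cutting of a tendon is a complication; it should be "
    "documented in the operative report.",
    "Ethical guidelines mandate disclosure of intraoperative complications "
    "to the patient.",
]

# Step 1: encode each (question, context) pair independently.
batch = tokenizer(
    [f"question: {question} context: {c}" for c in contexts],
    padding=True,
    truncation=True,
    max_length=512,
    return_tensors="pt",
)
with torch.no_grad():
    enc = model.encoder(
        input_ids=batch["input_ids"], attention_mask=batch["attention_mask"]
    )

# Step 2: fuse the contexts by concatenating the encoder states along the
# sequence axis; the decoder then attends over all of them jointly.
fused = BaseModelOutput(
    last_hidden_state=enc.last_hidden_state.reshape(1, -1, model.config.d_model)
)
fused_mask = batch["attention_mask"].reshape(1, -1)

out = model.generate(
    encoder_outputs=fused, attention_mask=fused_mask, max_new_tokens=8
)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

The same factorization applies at training time: the encoder only ever sees one context at a time, so compute scales linearly with the number of contexts, while the decoder still conditions on all of them when producing the answer.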
|
|
|
## Performance |
|
|
|
At the time of release, MedGENIE-fid-flan-t5-base-medqa set a new lightweight state of the art on the MedQA-USMLE benchmark (rows are sorted by descending accuracy; grounding: G = generated context, R = retrieved context, ∅ = no context):
|
|
|
| Model | Ground (Source) | Learning | Params | Accuracy (%) |
|
|----------------------------------|--------------------|---------------------------|-----------------|-------------------------------| |
|
| **MedGENIE-FID-Flan-T5** | G (PMC-LLaMA) | Fine-tuned | 250M | **53.1** | |
|
| Codex <small>(Liévin et al. 2022)</small> | ∅ | 0-shot | 175B | 52.5 |
|
| Codex <small>(Liévin et al. 2022)</small> | R (Wikipedia) | 0-shot | 175B | 52.5 | |
|
| GPT-3.5-Turbo <small>(Yang et al.)</small> | R (Wikipedia) | k-shot | -- | 52.3 | |
|
| MEDITRON <small>(Chen et al.)</small> | ∅ | Fine-tuned | 7B | 52.0 | |
|
| Zephyr-β | R (MedWiki) | 2-shot | 7B | 50.4 | |
|
| BioMedGPT <small>(Luo et al.)</small> | ∅ | k-shot | 10B | 50.4 | |
|
| BioMedLM <small>(Singhal et al.)</small> | ∅ | Fine-tuned | 2.7B | 50.3 | |
|
| PMC-LLaMA (AWQ) | ∅ | Fine-tuned | 13B | 50.2 | |
|
| LLaMA-2 <small>(Chen et al.)</small> | ∅ | Fine-tuned | 7B | 49.6 | |
|
| Zephyr-β | ∅ | 2-shot | 7B | 49.6 | |
|
| Zephyr-β <small>(Chen et al.)</small> | ∅ | 3-shot | 7B | 49.2 | |
|
| PMC-LLaMA <small>(Chen et al.)</small> | ∅ | Fine-tuned | 7B | 49.2 | |
|
| DRAGON <small>(Yasunaga et al.)</small> | R (UMLS) | Fine-tuned | 360M | 47.5 | |
|
| InstructGPT <small>(Liévin et al.)</small> | R (Wikipedia) | 0-shot | 175B | 47.3 | |
|
| Flan-PaLM <small>(Singhal et al.)</small> | ∅ | 5-shot | 62B | 46.1 | |
|
| InstructGPT <small>(Liévin et al.)</small> | ∅ | 0-shot | 175B | 46.0 | |
|
| VOD <small>(Liévin et al. 2023)</small> | R (MedWiki) | Fine-tuned | 220M | 45.8 | |
|
| Vicuna 1.3 <small>(Liévin et al.)</small> | ∅ | 0-shot | 33B | 45.2 | |
|
| BioLinkBERT <small>(Singhal et al.)</small> | ∅ | Fine-tuned | 340M | 45.1 | |
|
| Mistral-Instruct | R (MedWiki) | 2-shot | 7B | 45.1 | |
|
| Galactica | ∅ | 0-shot | 120B | 44.4 | |
|
| LLaMA-2 <small>(Liévin et al.)</small> | ∅ | 0-shot | 70B | 43.4 | |
|
| BioReader <small>(Frison et al.)</small> | R (PubMed-RCT) | Fine-tuned | 230M | 43.0 | |
|
| Guanaco <small>(Liévin et al.)</small> | ∅ | 0-shot | 33B | 42.9 | |
|
| LLaMA-2-chat <small>(Liévin et al.)</small> | ∅ | 0-shot | 70B | 42.3 | |
|
| Vicuna 1.5 <small>(Liévin et al.)</small> | ∅ | 0-shot | 65B | 41.6 | |
|
| Mistral-Instruct <small>(Chen et al.)</small> | ∅ | 3-shot | 7B | 41.1 | |
|
| PaLM <small>(Singhal et al.)</small> | ∅ | 5-shot | 62B | 40.9 | |
|
| Guanaco <small>(Liévin et al.)</small> | ∅ | 0-shot | 65B | 40.8 | |
|
| Falcon-Instruct <small>(Liévin et al.)</small> | ∅ | 0-shot | 40B | 39.0 | |
|
| Vicuna 1.3 <small>(Liévin et al.)</small> | ∅ | 0-shot | 13B | 38.7 | |
|
| GreaseLM <small>(Zhang et al.)</small> | R (UMLS) | Fine-tuned | 359M | 38.5 | |
|
| PubMedBERT <small>(Singhal et al.)</small> | ∅ | Fine-tuned | 110M | 38.1 | |
|
| QA-GNN <small>(Yasunaga et al.)</small> | R (UMLS) | Fine-tuned | 360M | 38.0 | |
|
| LLaMA-2 <small>(Yang et al.)</small> | R (Wikipedia) | k-shot | 13B | 37.6 | |
|
| LLaMA-2-chat | R (MedWiki) | 2-shot | 7B | 37.2 | |
|
| LLaMA-2-chat | ∅ | 2-shot | 7B | 37.2 | |
|
| BioBERT <small>(Lee et al.)</small> | ∅ | Fine-tuned | 110M | 36.7 | |
|
| MTP-Instruct <small>(Liévin et al.)</small> | ∅ | 0-shot | 30B | 35.1 | |
|
| GPT-Neo <small>(Singhal et al.)</small> | ∅ | Fine-tuned | 2.5B | 33.3 | |
|
| LLaMA-2-chat <small>(Liévin et al.)</small> | ∅ | 0-shot | 13B | 32.2 |

| LLaMA-2 <small>(Liévin et al.)</small> | ∅ | 0-shot | 13B | 31.1 |

| GPT-NeoX <small>(Liévin et al.)</small> | ∅ | 0-shot | 20B | 26.9 |
|
|
|
|