base_model: internistai/base-7b-v0.2
datasets:
- omi-health/medical-dialogue-to-soap-summary
language:
- en
license: apache-2.0
metrics:
- accuracy
tags:
- medical
- mlx
tag: text-generation
The model cogbuji/MrGrammaticaOntology-internistai-SCT-DRIFT-clinical-problem-0.6.5 was converted to MLX format from internistai/base-7b-v0.2 using mlx-lm version 0.16.0.
The model's name is a homage to Fela Kuti's song Mr Grammarticalogy-Lisationalism Is the Boss, released on the B-side of his 1976 LP Excuse O.
It is an experimental model for non-production environments, inspired by explorations into how large language models can be trained to be more conversant in medical terminology and concepts and then used in various medical informatics scenarios.
It is a LoRA fine-tune of internistai/base-7b-v0.2 using controlled natural language (CNL) phrases generated from the September 23rd release of SNOMED CT United States Edition. The general idea is described in Reference Domain Ontologies and Large Medical Language Models.
During training, LoRA was applied to all linear layers using a dataset comprising 318,798 SNOMED CT DRIFT phrases from the SNOMED CT concept hierarchies relevant to medical problems (findings, morphologic abnormalities, situations with explicit context, and disorders) and 7,400 records from the Synthetic Medical Dialogues and SOAP Summaries dataset. Training ran for two days, 13 hours, and 55 minutes using mlx-tuning-fork, a framework for parameterized large language model (Q)LoRA fine-tuning on Apple Metal.
Below is a snippet of the configuration used (the format has changed over time):
lora_parameters:
  keys: ["self_attn.q_proj","self_attn.v_proj","self_attn.k_proj","self_attn.o_proj"]
  rank: 32
  alpha: 32
  dropout: 0.3205
  scale: 10.0
epochs: 2
learning_schedule:
  type: "cosine_w_warmup"
  warmup_proportion: .1
  min_lr: 1e-7
  cycle_length: -1
  min_cos_lr: 7e-6
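As a rough illustration of what a cosine schedule with warmup does with values like those above, here is a minimal Python sketch. It is not the mlx-tuning-fork implementation. The function name `lr_at` and the peak learning rate `PEAK_LR` are hypothetical (the snippet does not show a peak rate), and reading `min_lr` as the warmup starting rate and `min_cos_lr` as the cosine floor is an assumption.

```python
import math

TOTAL_STEPS = 79_700      # total iterations reported for this run
WARMUP_PROPORTION = 0.1   # warmup_proportion from the config
MIN_LR = 1e-7             # min_lr: ASSUMED to be the warmup starting rate
MIN_COS_LR = 7e-6         # min_cos_lr: ASSUMED to be the cosine floor
PEAK_LR = 1e-5            # HYPOTHETICAL peak rate; not given in the snippet


def lr_at(step: int) -> float:
    """Learning rate at a 0-based step: linear warmup, then cosine decay."""
    warmup_steps = round(TOTAL_STEPS * WARMUP_PROPORTION)
    if step < warmup_steps:
        # Linear ramp from MIN_LR up to PEAK_LR
        return MIN_LR + (PEAK_LR - MIN_LR) * step / warmup_steps
    # Cosine decay from PEAK_LR down to MIN_COS_LR over the remaining steps
    progress = (step - warmup_steps) / max(1, TOTAL_STEPS - warmup_steps)
    return MIN_COS_LR + 0.5 * (PEAK_LR - MIN_COS_LR) * (1 + math.cos(math.pi * progress))


# Start of warmup, end of warmup, midpoint of decay, final step
for s in (0, 7_970, 43_835, 79_700):
    print(s, lr_at(s))
```

Under these assumptions the rate climbs for the first 7,970 iterations and then decays smoothly toward the floor by the final iteration.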
The wandb log is below:
Training ran for 79,700 iterations: 39,850 iterations per epoch over 2 epochs, on a dataset of 318,798 records with a batch size of 8.
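The iteration counts follow directly from the dataset size, batch size, and epoch count. The short Python check below reproduces them. The variable names are illustrative, and rounding the last partial batch up to a full iteration is an assumption about how the trainer counts steps.

```python
import math

records = 318_798   # dataset size quoted above
batch_size = 8      # records processed per iteration
epochs = 2          # from the configuration snippet

# ASSUMPTION: a final partial batch still counts as one iteration,
# hence the ceiling (318,798 / 8 = 39,849.75).
iters_per_epoch = math.ceil(records / batch_size)
total_iters = iters_per_epoch * epochs

print(iters_per_epoch, total_iters)  # 39850 79700
```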
MMLU-SR benchmarks
Below are MMLU-SR benchmark scores for the MMLU medical topics listed, measured before and after fine-tuning. MMLU-SR is a dataset used by the LM Evaluation Harness for rigorous benchmarking of true model comprehension.
Before (unquantized internistai lm-eval run on Apple Metal)
hf (pretrained=internistai/base-7b-v0.2,dtype=float), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 64
Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
---|---|---|---|---|---|---|---|---|
clinical knowledge | 0 | none | 0 | acc | ↑ | 0.5019 | ± | 0.0308 |
professional medicine | 0 | none | 0 | acc | ↑ | 0.5441 | ± | 0.0303 |
After (unquantized internistai lm-eval run on Apple Metal)
hf (pretrained=../raw_models/outbox/MrGrammaticaOntology-internistai-SCT-DRIFT-clinical-problem-0.6.5,dtype=float), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 64
Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
---|---|---|---|---|---|---|---|---|
clinical knowledge | 0 | none | 0 | acc | ↑ | 0.5208 | ± | 0.0307 |
professional medicine | 0 | none | 0 | acc | ↑ | 0.5625 | ± | 0.0301 |
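To make the before and after comparison concrete, the sketch below computes the change in accuracy for each task from the figures reported in this section. The helper `summarize` is hypothetical, and combining the two standard errors as if the runs were independent is a simplifying assumption (both runs score the same questions, so this is only a rough yardstick).

```python
import math

# (before, after, stderr_before, stderr_after) copied from the results above
scores = {
    "clinical knowledge": (0.5019, 0.5208, 0.0308, 0.0307),
    "professional medicine": (0.5441, 0.5625, 0.0303, 0.0301),
}


def summarize(before, after, se_before, se_after):
    """Return the accuracy change and a combined standard error."""
    delta = after - before
    # ASSUMPTION: treat the two runs as independent when combining errors
    combined_se = math.hypot(se_before, se_after)
    return delta, combined_se


for task, vals in scores.items():
    delta, se = summarize(*vals)
    print(f"{task}: delta={delta:+.4f}, combined stderr={se:.4f}")
```

Both tasks improve by a little under two accuracy points, which is smaller than the reported standard errors.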
Use with mlx
pip install mlx-lm
from mlx_lm import load, generate
model, tokenizer = load("cogbuji/MrGrammaticaOntology-internistai-SCT-DRIFT-clinical-problem-0.6.5")
response = generate(model, tokenizer, prompt="hello", verbose=True)