---
base_model:
- LiquidAI/LFM2-350M
library_name: transformers
license: other
license_name: lfm1.0
license_link: LICENSE
tags:
- mergekit
- merge
---

# lfm2-350M-med

**Small medical fine-tune on top of LiquidAI’s LFM2-350M.**

This checkpoint specializes the 350M LFM2 base for medical Q&A and tool-augmented search, using a lightweight recipe designed for laptops and edge boxes.

> ⚠️ **Medical safety**: This model is **not** a clinician. It may hallucinate and should **not** be used for diagnosis or treatment. Always seek qualified medical supervision.

---

## TL;DR

- **Base**: [LiquidAI/LFM2-350M](https://huggingface.co/LiquidAI/LFM2-350M)
- **Training**:
  1. SFT on **open-source medical data** + **tool-calling (search) traces**
  2. **DPO** preference alignment using **MedMCQA** as a preference signal
  3. Post-merge with the base via **Arcee Fusion** (MergeKit) for controlled weight fusion
- **Eval (author’s harness)**:
  - **MMLU-Pro**: **19.46** (vs **18.76** base in the same harness)
  - **IFEVAL**: **52.595** (vs **61.72** base in the same harness)

  _Note_: LFM2’s official IFEVAL score for the base (~65) comes from a different internal harness; numbers are **not directly comparable** across harnesses.

---

## What’s inside

### Base model: LFM2-350M

- Designed for **on-device** inference, with strong CPU latency and a **ChatML-like** template.
- Supports **tool use** with dedicated special tokens (e.g. `<|tool_call_start|>`, `<|tool_call_end|>`). See the base card for the full template and examples.

### Specialization steps

1. **Domain SFT (medical + tools)**
   - Instruction-style Q&A from open medical sources and synthetic conversions.
   - Tool-use (search) supervised traces to teach function-calling patterns.
2. **Preference alignment (DPO)**
   - Direct Preference Optimization with **MedMCQA-derived** preferences to bias the model toward clinically reasonable short answers (a sketch of how such pairs can be built follows below).
   - Rationale: DPO is simple, stable at small scale, and works well for short-form medical responses.
3. **Model fusion (Arcee Fusion)**
   - The final merge uses **Arcee Fusion** in MergeKit, which selectively fuses parameters to avoid over-averaging and is selected via `merge_method: arcee_fusion` (see the example config after this list).
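The exact preference data is not published with this card, but pair construction from MedMCQA generally looks like the following. A minimal sketch, assuming the `openlifescienceai/medmcqa` dataset on Hugging Face and its `question`/`opa`–`opd`/`cop` schema; the prompt format and distractor choice are illustrative, not the exact recipe used here:

```python
# Illustrative sketch: build DPO (prompt, chosen, rejected) triples from MedMCQA.
# Dataset name and schema are assumptions; this is not the released training recipe.
from datasets import load_dataset

ds = load_dataset("openlifescienceai/medmcqa", split="train")

def to_dpo_pair(row):
    options = [row["opa"], row["opb"], row["opc"], row["opd"]]
    correct = row["cop"]  # index (0-3) of the correct option
    # Take the first distractor as the rejected answer (an illustrative choice;
    # harder-negative selection is also common).
    rejected = next(i for i in range(4) if i != correct)
    prompt = row["question"] + "\n" + "\n".join(
        f"{letter}. {text}" for letter, text in zip("ABCD", options)
    )
    return {
        "prompt": prompt,
        "chosen": options[correct],
        "rejected": options[rejected],
    }

dpo_pairs = ds.map(to_dpo_pair, remove_columns=ds.column_names)
```

The resulting `prompt`/`chosen`/`rejected` columns match the format most DPO trainers expect for short-form answers.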
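For reference, an Arcee Fusion merge in MergeKit is configured roughly as follows. This is a sketch, not the exact config used: the fine-tuned checkpoint path and `dtype` are placeholders.

```yaml
# MergeKit config sketch for an Arcee Fusion merge (paths are placeholders).
merge_method: arcee_fusion
base_model: LiquidAI/LFM2-350M
models:
  - model: ./lfm2-350m-med-sft-dpo   # hypothetical SFT+DPO checkpoint
dtype: bfloat16
```

Such a config is run with MergeKit’s standard entry point, e.g. `mergekit-yaml config.yaml ./merged`.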
---

## Intended use & limitations

- **Use**: **education**, **research**.
- **Don’t use**: **any medical advice**, diagnosis, or treatment decisions.

---

## Evaluation

> All results below were run with the author’s harness; they **will differ** from LiquidAI’s internal suite and Open LLM Leaderboard settings.

| Benchmark | lfm2-350M-med | LFM2-350M (same harness) |
|-----------|--------------:|-------------------------:|
| MMLU-Pro  | **19.46**     | 18.76                    |
| IFEVAL    | **52.595**    | 61.72                    |

- **MMLU-Pro** raises difficulty with 10 answer choices and more reasoning-heavy items; small models typically score lower than on standard MMLU, so small absolute movements are meaningful.
- **IFEVAL** measures verifiable instruction following; scores depend heavily on prompt templates and verification scripts.

---

## Quickstart (Transformers)

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "mkurman/lfm2-350M-med"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="bfloat16")

messages = [
    {
        "role": "system",
        "content": (
            "You are a careful medical assistant. "
            "Cite sources and warn that outputs are not medical advice."
        ),
    },
    {
        "role": "user",
        "content": "Briefly explain the difference between cellulitis and erysipelas.",
    },
]

# Render the ChatML-like template, then generate a short answer.
prompt = tok.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
out = model.generate(**tok(prompt, return_tensors="pt"), max_new_tokens=256)
print(tok.decode(out[0], skip_special_tokens=True))
```
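Because the SFT stage includes search-style tool traces, tools can be passed through the chat template. A minimal sketch, assuming a hypothetical `search` function (Transformers converts its signature and docstring into the tool schema); see the base card for the exact tool-call token format:

```python
# Hypothetical tool for illustration; not shipped with the model.
def search(query: str):
    """Search medical literature for a query.

    Args:
        query: The search string.
    """
    ...

# Reuses `tok`, `model`, and `messages` from the quickstart above.
prompt = tok.apply_chat_template(
    messages,
    tools=[search],  # rendered into the model's tool list
    add_generation_prompt=True,
    tokenize=False,
)
out = model.generate(**tok(prompt, return_tensors="pt"), max_new_tokens=256)
print(tok.decode(out[0], skip_special_tokens=True))
```

If the model emits a tool call, execute it yourself, append the result as a tool-response message, and generate again.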