---
base_model:
- LiquidAI/LFM2-350M
library_name: transformers
license: other
license_name: lfm1.0
license_link: LICENSE
tags:
- mergekit
- merge
---

# lfm2-350M-med

**Small medical fine-tune on top of LiquidAI’s LFM2-350M.**

This checkpoint specializes the 350M LFM2 base for medical Q&A and tool-augmented search, using a lightweight recipe designed for laptops and edge boxes.

> ⚠️ **Medical safety**: This model is **not** a clinician. It may hallucinate and should **not** be used for diagnosis or treatment. Always seek qualified medical supervision.

---

## TL;DR

- **Base**: [LiquidAI/LFM2-350M](https://huggingface.co/LiquidAI/LFM2-350M)
- **Training**:
  1. SFT on **open-source medical data** + **tool-calling (search) traces**
  2. **DPO** preference alignment using **MedMCQA** as a preference signal
  3. Post-merge with the base via **Arcee Fusion** (MergeKit) for controlled weight fusion
- **Eval (author’s harness)**:
  - **MMLU-Pro**: **19.46** (vs **18.76** base in the same harness)
  - **IFEVAL**: **52.595** (vs **61.72** base in the same harness)

  _Note_: LFM2’s official IFEVAL score for the base (~65) comes from a different internal harness; numbers are **not directly comparable** across harnesses.

---

## What’s inside

### Base model: LFM2-350M

- Designed for **on-device** inference, with strong CPU latency and a **ChatML-like** template.
- Supports **tool use** with dedicated special tokens (e.g. `<|tool_call_start|>`, `<|tool_call_end|>`). See the base card for the full template and examples.

### Specialization steps

1. **Domain SFT (medical + tools)**
   - Instruction-style Q&A from open medical sources and synthetic conversions.
   - Tool-use (search) supervised traces to teach function-calling patterns.
2. **Preference alignment (DPO)**
   - Direct Preference Optimization with **MedMCQA-derived** preferences to bias the model toward clinically reasonable short answers (a sketch of how such pairs can be built follows below).
   - Rationale: DPO is simple, stable at small scale, and works well for short-form medical responses.
3. **Model fusion (Arcee Fusion)**
   - The final merge uses **Arcee Fusion** in MergeKit, which selectively fuses parameters to avoid over-averaging and is selected via `merge_method: arcee_fusion` (see the example config after this list).
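The exact preference data is not published with this card, but pair construction from MedMCQA generally looks like the following. A minimal sketch, assuming the `openlifescienceai/medmcqa` dataset on Hugging Face and its `question`/`opa`–`opd`/`cop` schema; the prompt format and distractor choice are illustrative, not the exact recipe used here:

```python
# Illustrative sketch: build DPO (prompt, chosen, rejected) triples from MedMCQA.
# Dataset name and schema are assumptions; this is not the released training recipe.
from datasets import load_dataset

ds = load_dataset("openlifescienceai/medmcqa", split="train")

def to_dpo_pair(row):
    options = [row["opa"], row["opb"], row["opc"], row["opd"]]
    correct = row["cop"]  # index (0-3) of the correct option
    # Take the first distractor as the rejected answer (an illustrative choice;
    # harder-negative selection is also common).
    rejected = next(i for i in range(4) if i != correct)
    prompt = row["question"] + "\n" + "\n".join(
        f"{letter}. {text}" for letter, text in zip("ABCD", options)
    )
    return {
        "prompt": prompt,
        "chosen": options[correct],
        "rejected": options[rejected],
    }

dpo_pairs = ds.map(to_dpo_pair, remove_columns=ds.column_names)
```

The resulting `prompt`/`chosen`/`rejected` columns match the format most DPO trainers expect for short-form answers.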
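For reference, an Arcee Fusion merge in MergeKit is configured roughly as follows. This is a sketch, not the exact config used: the fine-tuned checkpoint path and `dtype` are placeholders.

```yaml
# MergeKit config sketch for an Arcee Fusion merge (paths are placeholders).
merge_method: arcee_fusion
base_model: LiquidAI/LFM2-350M
models:
  - model: ./lfm2-350m-med-sft-dpo   # hypothetical SFT+DPO checkpoint
dtype: bfloat16
```

Such a config is run with MergeKit’s standard entry point, e.g. `mergekit-yaml config.yaml ./merged`.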
---

## Intended use & limitations

- **Use**: **education**, **research**.
- **Don’t use**: **any medical advice**, diagnosis, or treatment decisions.

---

## Evaluation

> All results below were run with the author’s harness; they **will differ** from LiquidAI’s internal suite and Open LLM Leaderboard settings.

| Benchmark | lfm2-350M-med | LFM2-350M (same harness) |
|-----------|--------------:|-------------------------:|
| MMLU-Pro  | **19.46**     | 18.76                    |
| IFEVAL    | **52.595**    | 61.72                    |

- **MMLU-Pro** raises difficulty with 10 answer choices and more reasoning-heavy items; small models typically score lower than on standard MMLU, so small absolute movements are meaningful.
- **IFEVAL** measures verifiable instruction following; scores depend heavily on prompt templates and verification scripts.

---

## Quickstart (Transformers)

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "mkurman/lfm2-350M-med"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="bfloat16")

messages = [
    {
        "role": "system",
        "content": (
            "You are a careful medical assistant. "
            "Cite sources and warn that outputs are not medical advice."
        ),
    },
    {
        "role": "user",
        "content": "Briefly explain the difference between cellulitis and erysipelas.",
    },
]

# Render the ChatML-like template, then generate a short answer.
prompt = tok.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
out = model.generate(**tok(prompt, return_tensors="pt"), max_new_tokens=256)
print(tok.decode(out[0], skip_special_tokens=True))
```
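Because the SFT stage includes search-style tool traces, tools can be passed through the chat template. A minimal sketch, assuming a hypothetical `search` function (Transformers converts its signature and docstring into the tool schema); see the base card for the exact tool-call token format:

```python
# Hypothetical tool for illustration; not shipped with the model.
def search(query: str):
    """Search medical literature for a query.

    Args:
        query: The search string.
    """
    ...

# Reuses `tok`, `model`, and `messages` from the quickstart above.
prompt = tok.apply_chat_template(
    messages,
    tools=[search],  # rendered into the model's tool list
    add_generation_prompt=True,
    tokenize=False,
)
out = model.generate(**tok(prompt, return_tensors="pt"), max_new_tokens=256)
print(tok.decode(out[0], skip_special_tokens=True))
```

If the model emits a tool call, execute it yourself, append the result as a tool-response message, and generate again.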