Gemma 4 E4B — Opus reasoning distill (LoRA)

LoRA adapter that teaches google/gemma-4-E4B-it to emit explicit step-by-step reasoning in the style of Claude Opus 4.6. It was trained by supervised distillation on a combined corpus of Opus and Sonnet reasoning traces.

Source: Claude Opus 4.6 + Sonnet 4.6 reasoning traces (~4.4k combined). Final eval loss: 0.9813.

Why

Prior to this release the only confirmed Gemma 4 Opus-reasoning LoRA on the hub was kai-os/gemma4-31b-Opus-4.6-reasoning at the 31B tier. This set fills in the smaller sizes (E2B, E4B) with the same Opus-derived recipe, so adapter hot-swapping also works on lighter hardware.

Training recipe

  • LoRA: rank 32, alpha 64, dropout 0.05, bias=none
  • Target modules: q/k/v/o/gate/up/down_proj — text tower only (vision + audio projections under Gemma4ClippableLinear are excluded so gradients flow to the layers that actually run during text inference)
  • Sequence length: 2048, no packing
  • Effective batch: 16 (micro-batch 8 × grad-accum 2 on E2B; 4 × 4 on E4B)
  • Optimizer: AdamW, lr 2e-4, cosine schedule with 3% warmup
  • Epochs: 2
  • Precision: bf16, gradient checkpointing with use_reentrant=False
  • Attention: SDPA (flash-attn 2 unavailable on ROCm for Gemma 4's head dim)
  • Hardware: 1× AMD MI300X (192 GB, ROCm 7.0)
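
The recipe above can be sketched in plain Python to make three of its pieces concrete: restricting LoRA targets to the text tower, the effective batch arithmetic, and a cosine schedule with 3% warmup. This is an illustrative sketch, not the training script. The module names and the "language_model" prefix are assumptions about how the model tree is laid out, and `lr_at` is one common formulation of warmup plus cosine decay rather than the exact scheduler used here.

```python
import math
import re

# Hypothetical module names for illustration only; the real Gemma 4 tree may differ.
EXAMPLE_MODULES = [
    "model.language_model.layers.0.self_attn.q_proj",
    "model.language_model.layers.0.mlp.down_proj",
    "model.vision_tower.encoder.layers.0.self_attn.q_proj",
    "model.audio_tower.layers.0.self_attn.k_proj",
]

# Assumption: text-tower layers sit under a "language_model" prefix.
TEXT_PROJ = re.compile(r".*language_model.*\.(q|k|v|o|gate|up|down)_proj")


def text_tower_targets(names):
    """Keep only the attention/MLP projections that belong to the text tower."""
    return [n for n in names if TEXT_PROJ.fullmatch(n)]


def effective_batch(micro_batch, grad_accum, num_devices=1):
    """Samples contributing to each optimizer step."""
    return micro_batch * grad_accum * num_devices


def lr_at(step, total_steps, peak_lr=2e-4, warmup_frac=0.03):
    """Linear warmup to peak_lr, then cosine decay to zero."""
    warmup = max(1, round(total_steps * warmup_frac))
    if step < warmup:
        return peak_lr * step / warmup
    progress = (step - warmup) / max(1, total_steps - warmup)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print(text_tower_targets(EXAMPLE_MODULES))
    print(effective_batch(8, 2), effective_batch(4, 4))  # E2B and E4B settings
    print([lr_at(s, 500) for s in (0, 15, 250, 500)])
```

PEFT's LoraConfig accepts a regex string for target_modules, so a pattern like the one above could be passed directly once it has been checked against the names printed by model.named_modules().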

Eval trajectory (held-out 2% split)

epoch    eval_loss
0.767    0.9930
1.533    0.9857
2.000    0.9813
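
To put the eval losses on a more intuitive scale, the sketch below converts them to perplexity and computes the relative drop between the first and last checkpoints. It assumes the reported eval_loss is the mean per-token cross-entropy in nats, which is the usual convention for causal LM training but is not stated in this card.

```python
import math

# (epoch, eval_loss) pairs copied from the table above.
EVAL = [(0.767, 0.9930), (1.533, 0.9857), (2.000, 0.9813)]


def perplexity(loss_nats):
    """Perplexity for a mean token cross-entropy given in nats (assumed)."""
    return math.exp(loss_nats)


def relative_drop(first, last):
    """Fractional reduction in loss from the first to the last checkpoint."""
    return (first - last) / first


if __name__ == "__main__":
    for epoch, loss in EVAL:
        print(f"epoch {epoch:.3f}  loss {loss:.4f}  ppl {perplexity(loss):.3f}")
    print(f"relative loss drop: {relative_drop(EVAL[0][1], EVAL[-1][1]):.2%}")
```

The change across checkpoints is small in absolute terms, so the numbers are best read as a sanity check that training did not overfit in the second epoch rather than as a quality metric.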

Usage

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "google/gemma-4-E4B-it"
adapter_id = "josuediazflores/gemma-4-e4b-opus-reasoning-lora"

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, dtype="bfloat16")
model = PeftModel.from_pretrained(model, adapter_id)

messages = [{"role": "user", "content": "Prove there are infinitely many primes."}]
# return_dict=True returns input_ids plus attention_mask, so generate() receives both
inputs = tokenizer.apply_chat_template(
    messages, return_tensors="pt", add_generation_prompt=True, return_dict=True
).to(model.device)
out = model.generate(**inputs, max_new_tokens=1024)
print(tokenizer.decode(out[0], skip_special_tokens=True))
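
A quick way to see what the adapter changes is to run the same prompt with and without it. The helper below is a minimal sketch, assuming `model` is the PeftModel and `tokenizer` the tokenizer loaded above. `generate_reply` is a hypothetical name introduced here. It relies on PeftModel.disable_adapter(), a context manager that temporarily bypasses the LoRA weights, and it slices off the prompt tokens so only the newly generated text is returned.

```python
from contextlib import nullcontext


def generate_reply(model, tokenizer, prompt, max_new_tokens=1024, use_adapter=True):
    """Return only the newly generated text, with or without the LoRA adapter."""
    messages = [{"role": "user", "content": prompt}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt", return_dict=True
    ).to(model.device)
    prompt_len = inputs["input_ids"].shape[-1]

    # With use_adapter=False, run the base weights by disabling the adapter temporarily.
    ctx = nullcontext() if use_adapter else model.disable_adapter()
    with ctx:
        out = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Drop the prompt tokens so the caller sees only the completion.
    return tokenizer.decode(out[0][prompt_len:], skip_special_tokens=True)
```

For example, calling generate_reply(model, tokenizer, "Prove there are infinitely many primes.") and then the same call with use_adapter=False gives a side-by-side view of the distilled reasoning style against the base model.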

Caveats

  • Trained on ~4.4k reasoning traces: useful for broad reasoning patterns, not a substitute for domain-specific math/code evals.
  • bf16 adapter; works with any bf16, 8-bit, or mxfp8 quant of the same base. 4-bit quants of the base will still load, but expect some quality drift.
  • No RLHF step: this is pure supervised distillation from the Claude Opus 4.6 + Sonnet 4.6 reasoning traces described above.

Citation

@dataset{sky_t1_2025,
  author = {NovaSky-AI},
  title = {Sky-T1 Reasoning Dataset},
  year = {2025},
  url = {https://huggingface.co/datasets/NovaSky-AI/Sky-T1_data_17k}
}