Kimi-K2.7-Code Eagle3-MLA Draft

Eagle3-MLA speculative-decoding draft model for Kimi-K2.7-Code, trained natively on K2.7-Code data. Pairs with the Kimi-K2.7-Code verifier under vLLM speculative decoding.

What this is

  • Algorithm: EAGLE-3 with MLA (multi-head latent attention), single draft decoder layer.
  • Verifier: Kimi-K2.7-Code (DeepSeek-V3-class architecture; arch is identical across K2.5 / K2.6 / K2.7). The draft reuses the verifier's frozen embedding / lm_head / norm.
  • Init: lightseek K2.6 Eagle3-MLA export, then fine-tuned on K2.7-native data.
  • Training data: real K2.7-Code serving traffic (agentic / coding / tool, oversampled 5x) mixed with kimi-mtp prompts re-answered by K2.7-Code.
  • Recipe: ttt_steps=4, ttt_step_loss_decay=1.0, off-policy tokens, l2sp_lambda=1e-4, cosine LR 2e-5, seq_length 8192.

Why K2.7-native

A K2.6-teacher draft over-fit the K2.6 distribution and lost to the lightseek init on real K2.7-Code traffic. Training on K2.7-native data reverses that: on held-out K2.7 traffic this draft matches or beats the lightseek init on accepted-token length.

Usage (vLLM)

vllm serve /path/to/Kimi-K2.7-Code \
  --tensor-parallel-size 8 \
  --speculative-config '{"model": "k-l-lambda/kimi-k2.7-code-eagle3-mla", "num_speculative_tokens": 3, "method": "eagle3"}'

Checkpoint

This is an intermediate checkpoint from an in-progress run (step 32400, the best by validation loss among retained checkpoints at upload time). It is published for evaluation; a final checkpoint will follow when the run reaches its step budget.

Downloads last month
7
Safetensors
Model size
3B params
Tensor type
BF16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for k-l-lambda/kimi-k2.7-code-eagle3-mla

Finetuned
(4)
this model