Kimi-K2.7-Code Eagle3-MLA Draft
Eagle3-MLA speculative-decoding draft model for Kimi-K2.7-Code, trained natively on K2.7-Code data. Pairs with the Kimi-K2.7-Code verifier under vLLM speculative decoding.
What this is
- Algorithm: EAGLE-3 with MLA (multi-head latent attention), single draft decoder layer.
- Verifier: Kimi-K2.7-Code (DeepSeek-V3-class architecture; the architecture is identical across K2.5 / K2.6 / K2.7). The draft reuses the verifier's frozen embedding / lm_head / norm.
- Init: lightseek K2.6 Eagle3-MLA export, then fine-tuned on K2.7-native data.
- Training data: real K2.7-Code serving traffic (agentic / coding / tool, oversampled 5x) mixed with kimi-mtp prompts re-answered by K2.7-Code.
- Recipe: ttt_steps=4, ttt_step_loss_decay=1.0, off-policy tokens, l2sp_lambda=1e-4, cosine LR 2e-5, seq_length 8192.
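The recipe knobs above can be read as a weighted multi-step loss plus a pull toward the init weights. The following is a minimal pure-Python sketch of that reading, not the training code: the function names are hypothetical, the interpretation of ttt_step_loss_decay as a per-step weight of decay**k is an assumption, the L2-SP form (lambda times squared distance from the init, with no 1/2 factor) is an assumption, and the cosine schedule is shown without warmup because the card does not mention one.

```python
import math


def ttt_step_weights(ttt_steps: int, decay: float) -> list[float]:
    """Per-step loss weights, assuming step k is weighted by decay**k."""
    return [decay ** k for k in range(ttt_steps)]


def l2sp_penalty(params: list[float], init_params: list[float], lam: float) -> float:
    """L2-SP regularizer: lam * squared distance from the starting point (assumed form)."""
    return lam * sum((p - p0) ** 2 for p, p0 in zip(params, init_params))


def cosine_lr(step: int, total_steps: int, peak_lr: float, min_lr: float = 0.0) -> float:
    """Cosine decay from peak_lr to min_lr over total_steps (no warmup assumed)."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def total_loss(
    step_losses: list[float],
    params: list[float],
    init_params: list[float],
    decay: float = 1.0,
    lam: float = 1e-4,
) -> float:
    """Weighted sum of per-step draft losses plus the L2-SP penalty."""
    weights = ttt_step_weights(len(step_losses), decay)
    weighted = sum(w * loss for w, loss in zip(weights, step_losses))
    return weighted + l2sp_penalty(params, init_params, lam)
```

Under this reading, ttt_step_loss_decay=1.0 weights all four training-time-test steps equally, so later draft positions count as much as the first one.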
Why K2.7-native
A draft trained against a K2.6 teacher overfit the K2.6 distribution and underperformed the lightseek init on real K2.7-Code traffic. Training on K2.7-native data reverses that: on held-out K2.7 traffic this draft matches or beats the lightseek init on accepted-token length.
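For readers comparing drafts on their own traffic, one common way to compute an accepted-token length is sketched below. The card does not define the metric precisely, so this is an assumption: the function name is hypothetical, the input is the number of draft tokens the verifier accepted at each verification step, and by default each step also counts the one token the verifier always emits.

```python
def mean_accepted_length(
    accepted_draft_tokens: list[int],
    include_bonus_token: bool = True,
) -> float:
    """Average tokens produced per verifier step.

    accepted_draft_tokens[i] is how many draft tokens were accepted at step i.
    With include_bonus_token=True, each step also counts the verifier's own
    token (an assumed convention; some reports exclude it).
    """
    if not accepted_draft_tokens:
        raise ValueError("need at least one verification step")
    bonus = 1 if include_bonus_token else 0
    total = sum(n + bonus for n in accepted_draft_tokens)
    return total / len(accepted_draft_tokens)
```

With num_speculative_tokens set to 3, each entry lies between 0 and 3, so the value with the bonus token ranges from 1.0 to 4.0.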
Usage (vLLM)
vllm serve /path/to/Kimi-K2.7-Code \
--tensor-parallel-size 8 \
--speculative-config '{"model": "k-l-lambda/kimi-k2.7-code-eagle3-mla", "num_speculative_tokens": 3, "method": "eagle3"}'
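Once the server is running, speculative decoding is transparent to clients: requests go to vLLM's OpenAI-compatible API as usual. The sketch below uses only the Python standard library and rests on assumptions: the server listens on vLLM's default http://localhost:8000, the served model name equals the path passed to vllm serve (the default when no served-model-name is set), and the helper names are hypothetical.

```python
import json
import urllib.request


def build_completion_request(
    prompt: str,
    model: str = "/path/to/Kimi-K2.7-Code",  # assumed: same path given to `vllm serve`
    base_url: str = "http://localhost:8000",  # assumed: vLLM default host and port
    max_tokens: int = 128,
) -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible /v1/completions endpoint."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": 0.0,
    }
    return urllib.request.Request(
        f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt: str, **kwargs) -> str:
    """Send the request and return the first completion's text."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

Calling complete("def fibonacci(n):") against a live server returns the generated continuation; the draft model only affects latency, not the request format.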
Checkpoint
This is an intermediate checkpoint from an in-progress run (step 32400, the best by validation loss among retained checkpoints at upload time). It is published for evaluation; a final checkpoint will follow when the run reaches its step budget.
Base model: moonshotai/Kimi-K2.7-Code