An extensible MTP paradigm with a leaping capability for both training and inference.
Large Language Models (LLMs) are typically trained and deployed using Next-Token Prediction (NTP), an inherently sequential process that limits contextual coverage and inference efficiency. To address this, we propose Leap Multi-Token Prediction (L-MTP), an innovative method that extends Multi-Token Prediction (MTP) by strategically skipping intermediate tokens to predict non-adjacent future tokens in a single forward pass. L-MTP enhances the model's ability to capture long-range dependencies and enables a specialized decoding strategy that significantly accelerates inference.
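To make the leaping idea concrete, below is a minimal sketch of how per-head training targets could be built so that head k predicts the token at offset t+1, t+3, t+5, … (a leap stride of 2) instead of the adjacent offsets used by standard MTP. The function name `build_leap_targets`, the default of 4 heads, and the stride value are illustrative assumptions, not the released implementation.

```python
import torch

def build_leap_targets(input_ids: torch.Tensor, num_heads: int = 4,
                       leap: int = 2, pad_id: int = -100) -> torch.Tensor:
    """Sketch of leap multi-token prediction targets.

    Head k at position t is trained to predict the token at offset
    1 + k * leap (with leap=2: t+1, t+3, t+5, t+7), skipping the
    intermediate tokens that adjacent-offset MTP heads would cover.
    Returns a tensor of shape (num_heads, batch, seq_len); positions whose
    target falls past the end of the sequence are filled with pad_id so the
    loss can ignore them.
    """
    batch, seq_len = input_ids.shape
    targets = torch.full((num_heads, batch, seq_len), pad_id,
                         dtype=input_ids.dtype)
    for k in range(num_heads):
        offset = 1 + k * leap          # leaped prediction offset for head k
        if offset < seq_len:
            targets[k, :, :seq_len - offset] = input_ids[:, offset:]
    return targets


if __name__ == "__main__":
    ids = torch.arange(10).unsqueeze(0)        # toy sequence 0..9
    tgt = build_leap_targets(ids, num_heads=3, leap=2)
    print(tgt[0, 0])  # head 0 -> token at t+1
    print(tgt[1, 0])  # head 1 -> token at t+3
    print(tgt[2, 0])  # head 2 -> token at t+5
```

At inference, successive forward passes can interleave: the leaped positions produced by one pass leave gaps that the next pass fills, which is what enables the accelerated decoding strategy described above.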
Models & Datasets
The paper evaluates L-MTP on a diverse set of base LLMs: Qwen 2.5 (3B, 7B), Llama 3.2 (3B), Llama 3.1 (8B), and Gemma 3 (4B, 12B).
Training datasets are curated from MATH (Hendrycks et al.), Evol-Instruct-Code (Luo et al.; Chaudhary), and Alpaca-GPT4 (Peng et al.). The dataset configuration is available in LLaMA-Factory/data/dataset_info.json (link); a sketch of how such entries are registered follows below.
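As a reference, here is a hedged sketch of appending entries to LLaMA-Factory's dataset_info.json registry. The entry keys and file names below are hypothetical placeholders; consult the actual dataset_info.json for the names used in our configuration.

```python
import json
from pathlib import Path

# Hypothetical entries mirroring the LLaMA-Factory dataset_info.json format;
# the real keys and file names may differ from these placeholders.
entries = {
    "math_hendrycks": {"file_name": "math_train.json"},
    "evol_instruct_code": {"file_name": "evol_instruct_code.json"},
    "alpaca_gpt4_en": {"file_name": "alpaca_gpt4_data_en.json"},
}

info_path = Path("LLaMA-Factory/data/dataset_info.json")
registered = json.loads(info_path.read_text())   # load existing registry
registered.update(entries)                       # add the training datasets
info_path.write_text(json.dumps(registered, indent=2))
```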
Citation
If you find this work useful, please cite our paper:
@article{lmtp,
  title={L-MTP: Leap Multi-Token Prediction Beyond Adjacent Context for Large Language Models},
  author={Liu, Xiaohao and Xia, Xiaobo and Zhao, Weixiang and Zhang, Manyi and Yu, Xianzhi and Su, Xiu and Yang, Shuo and Ng, See-Kiong and Chua, Tat-Seng},
  journal={arXiv preprint arXiv:2505.17505},
  year={2025}
}
