An extensible MTP paradigm with a leaping capability for both training and inference.
Large Language Models (LLMs) are typically trained and deployed using Next-Token Prediction (NTP), an inherently sequential process that limits contextual coverage and inference efficiency. To address this, we propose Leap Multi-Token Prediction (L-MTP), an innovative method that extends Multi-Token Prediction (MTP) by strategically skipping intermediate tokens to predict non-adjacent future tokens in a single forward pass. L-MTP enhances the model's ability to capture long-range dependencies and enables a specialized decoding strategy that significantly accelerates inference.
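To make the leaping idea concrete, below is a minimal sketch of how per-head training targets could be built so that head k predicts the token at offset t+1, t+3, t+5, … (a leap stride of 2) instead of the adjacent offsets used by standard MTP. The function name `build_leap_targets`, the default of 4 heads, and the stride value are illustrative assumptions, not the released implementation.

```python
import torch

def build_leap_targets(input_ids: torch.Tensor, num_heads: int = 4,
                       leap: int = 2, pad_id: int = -100) -> torch.Tensor:
    """Sketch of leap multi-token prediction targets.

    Head k at position t is trained to predict the token at offset
    1 + k * leap (with leap=2: t+1, t+3, t+5, t+7), skipping the
    intermediate tokens that adjacent-offset MTP heads would cover.
    Returns a tensor of shape (num_heads, batch, seq_len); positions whose
    target falls past the end of the sequence are filled with pad_id so the
    loss can ignore them.
    """
    batch, seq_len = input_ids.shape
    targets = torch.full((num_heads, batch, seq_len), pad_id,
                         dtype=input_ids.dtype)
    for k in range(num_heads):
        offset = 1 + k * leap          # leaped prediction offset for head k
        if offset < seq_len:
            targets[k, :, :seq_len - offset] = input_ids[:, offset:]
    return targets


if __name__ == "__main__":
    ids = torch.arange(10).unsqueeze(0)        # toy sequence 0..9
    tgt = build_leap_targets(ids, num_heads=3, leap=2)
    print(tgt[0, 0])  # head 0 -> token at t+1
    print(tgt[1, 0])  # head 1 -> token at t+3
    print(tgt[2, 0])  # head 2 -> token at t+5
```

At inference, successive forward passes can interleave: the leaped positions produced by one pass leave gaps that the next pass fills, which is what enables the accelerated decoding strategy described above.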
Models & Datasets
The paper evaluates L-MTP on a diverse set of base LLMs: Qwen 2.5 (3B, 7B), Llama 3.2 (3B), Llama 3.1 (8B), and Gemma 3 (4B, 12B).
Training datasets are curated from MATH (Hendrycks et al.), Evol-Instruct-Code (Luo et al.; Chaudhary), and Alpaca-GPT4 (Peng et al.). The dataset configuration is available in LLaMA-Factory/data/dataset_info.json (link); a sketch of how such entries are registered follows below.
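As a reference, here is a hedged sketch of appending entries to LLaMA-Factory's dataset_info.json registry. The entry keys and file names below are hypothetical placeholders; consult the actual dataset_info.json for the names used in our configuration.

```python
import json
from pathlib import Path

# Hypothetical entries mirroring the LLaMA-Factory dataset_info.json format;
# the real keys and file names may differ from these placeholders.
entries = {
    "math_hendrycks": {"file_name": "math_train.json"},
    "evol_instruct_code": {"file_name": "evol_instruct_code.json"},
    "alpaca_gpt4_en": {"file_name": "alpaca_gpt4_data_en.json"},
}

info_path = Path("LLaMA-Factory/data/dataset_info.json")
registered = json.loads(info_path.read_text())   # load existing registry
registered.update(entries)                       # add the training datasets
info_path.write_text(json.dumps(registered, indent=2))
```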
Citation
If you find this work useful, please cite our paper:
@article{lmtp,
  title={L-MTP: Leap Multi-Token Prediction Beyond Adjacent Context for Large Language Models},
  author={Liu, Xiaohao and Xia, Xiaobo and Zhao, Weixiang and Zhang, Manyi and Yu, Xianzhi and Su, Xiu and Yang, Shuo and Ng, See-Kiong and Chua, Tat-Seng},
  journal={arXiv preprint arXiv:2505.17505},
  year={2025}
}
