LightningRL

Diffusion Large Language Models with a SOTA Accuracy–Parallelism Trade-off

ICML 2026 · Paper on arXiv · Paper PDF · GitHub Code · Hugging Face Model

We introduce LightningRL, a reinforcement learning post-training framework for block-wise diffusion Large Language Models (dLLMs) that breaks the accuracy–parallelism trade-off. Applied to SDAR-8B, LightningRL achieves an average TPF of 7.32 and an AUP of 497.9, improving generation quality and inference speed simultaneously.

  • LightningRL-8B-32b-MATH500, LightningRL-8B-32b-GSM8K, LightningRL-8B-32b-MBPP, and LightningRL-8B-32b-HumanEval are task-specific variants fine-tuned with different reward weight configurations for targeted deployment.
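The TPF figure above plausibly denotes tokens decoded per forward pass, averaged over a generation; under that assumption (the card itself does not define the acronym), a minimal hypothetical sketch of computing the average from a per-step decoding trace would look like:

```python
def average_tpf(tokens_per_forward):
    """Average number of tokens committed per forward pass.

    ASSUMPTION: TPF = tokens decoded / forward passes, which is a
    guess at the metric's definition; the model card does not spell
    it out. `tokens_per_forward` is a hypothetical trace listing how
    many tokens each forward pass committed.
    """
    if not tokens_per_forward:
        raise ValueError("decoding trace must be non-empty")
    return sum(tokens_per_forward) / len(tokens_per_forward)


# Example: a trace where three forward passes commit 8, 8, and 6 tokens.
trace = [8, 8, 6]
print(average_tpf(trace))  # ≈ 7.33, i.e. ~7.3 tokens per forward pass
```

A higher average TPF means fewer forward passes per generated sequence, which is why the metric serves as a proxy for parallel-decoding speed.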

Citation

@article{hu2026lightningrl,
  title={LightningRL: Breaking the Accuracy--Parallelism Trade-off of Block-wise dLLMs via Reinforcement Learning},
  author={Hu, Yanzhe and Jin, Yijie and Liu, Pengfei and Yu, Kai and Deng, Zhijie},
  journal={arXiv preprint},
  year={2026},
  note={Coming soon}
}
Format: Safetensors · Model size: 8B params · Tensor type: F16
