ReVisual-R1 (7B): Open-Source Multimodal Reasoner
	
One cold-start, two RL stages, endless reasoning power.
	
		
	
	
Highlights
	
- SOTA on 9 tough benchmarks covering visual-math + text reasoning.
- Three-Stage SRO Training:
  - Text Cold-Start: seed deep reflection
  - Multimodal RL: align vision & logic
  - Text RL: polish fluency & brevity
- PAD (Prioritized Advantage Distillation) keeps gradients alive. 
- Efficient-Length Reward = concise, self-reflective CoT. 
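
To make the PAD bullet concrete, here is a minimal sketch of one way advantage-prioritized sampling could keep gradients from vanishing. It is not the authors' implementation. It assumes a GRPO-style setup in which rewards are standardized within each group of rollouts, so a group whose rollouts all score the same yields zero advantage and no learning signal. The function names, the `min_abs` threshold, and the softmax `temperature` are hypothetical choices made for illustration.

```python
import math
import random
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-6):
    """Standardize rewards within one prompt's group of rollouts (GRPO-style assumption)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def prioritized_subsample(advantages, k, min_abs=1e-3, temperature=1.0, rng=None):
    """Pick up to k rollout indices, favoring large |advantage|.

    Rollouts with |advantage| below min_abs contribute almost no gradient,
    so they are dropped before sampling. The remaining rollouts are drawn
    with replacement, weighted by exp(|advantage| / temperature).
    All thresholds here are illustrative assumptions.
    """
    rng = rng or random.Random(0)
    keep = [i for i, a in enumerate(advantages) if abs(a) >= min_abs]
    if not keep:
        return []  # the whole group is uninformative; skip it
    weights = [math.exp(abs(advantages[i]) / temperature) for i in keep]
    return rng.choices(keep, weights=weights, k=min(k, len(keep)))


if __name__ == "__main__":
    flat = group_advantages([1.0, 1.0, 1.0, 1.0])   # every rollout got the same reward
    mixed = group_advantages([1.0, 0.0, 0.0, 0.0])  # one rollout stands out
    print(prioritized_subsample(flat, k=2))
    print(prioritized_subsample(mixed, k=3))
```

In this sketch a group with identical rewards is skipped entirely, while a mixed group is resampled so that the most informative rollouts appear more often in the update batch.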
	
		
	
	
Resources
	
	
		
	
	
Citation
	
@article{chen2025advancing,
  title={Advancing Multimodal Reasoning: From Optimized Cold Start to Staged Reinforcement Learning},
  author={Chen, Shuang and Guo, Yue and Su, Zhaochen and Li, Yafu and Wu, Yulun and Chen, Jiacheng and Chen, Jiayu and Wang, Weijie and Qu, Xiaoye and Cheng, Yu},
  journal={arXiv preprint arXiv:2506.04207},
  year={2025}
}
Take ReVisual-R1 for a spin and let us know what you build!