ReinFlow: Fine-tuning Flow Matching Policy with Online Reinforcement Learning • Paper • 2505.22094 • Published May 28 • 3
PAROAttention: Pattern-Aware ReOrdering for Efficient Sparse and Quantized Attention in Visual Generation Models • Paper • 2506.16054 • Published Jun 19 • 60
Learning Strategic Language Agents in the Werewolf Game with Iterative Latent Space Policy Optimization • Paper • 2502.04686 • Published Feb 7 • 2
VS-Bench: Evaluating VLMs for Strategic Reasoning and Decision-Making in Multi-Agent Environments • Paper • 2506.02387 • Published Jun 3 • 58
Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study • Paper • 2404.10719 • Published Apr 16, 2024 • 6