Conditional Advantage Estimation for Reinforcement Learning in Large Reasoning Models Paper • 2509.23962 • Published 20 days ago • 5
LLMs Learn to Deceive Unintentionally: Emergent Misalignment in Dishonesty from Misaligned Samples to Biased Human-AI Interactions Paper • 2510.08211 • Published 9 days ago • 22
Taming Masked Diffusion Language Models via Consistency Trajectory Reinforcement Learning with Fewer Decoding Step Paper • 2509.23924 • Published 20 days ago • 7
RiOSWorld: Benchmarking the Risk of Multimodal Computer-Use Agents Paper • 2506.00618 • Published May 31 • 1
CapRL: Stimulating Dense Image Caption Capabilities via Reinforcement Learning Paper • 2509.22647 • Published 22 days ago • 31
Generalized Face Anti-spoofing via Finer Domain Partition and Disentangling Liveness-irrelevant Factors Paper • 2407.08243 • Published Jul 11, 2024 • 1
G^2V^2former: Graph Guided Video Vision Transformer for Face Anti-Spoofing Paper • 2408.07675 • Published Aug 14, 2024 • 1
Kronecker Mask and Interpretive Prompts are Language-Action Video Learners Paper • 2502.03549 • Published Feb 5 • 1
DADM: Dual Alignment of Domain and Modality for Face Anti-spoofing Paper • 2503.00429 • Published Mar 1 • 1