Submitted by xxzcc 10 ReLook: Vision-Grounded RL with a Multimodal LLM Critic for Agentic Web Coding Tencent 232 2
Submitted by Carlanlarkk 40 Low-probability Tokens Sustain Exploration in Reinforcement Learning with Verifiable Reward Tencent 12 2
Submitted by invokerliang 26 CLUE: Non-parametric Verification from Experience via Hidden-State Clustering Tencent 1
Submitted by lr10260 19 VOGUE: Guiding Exploration with Visual Uncertainty Improves Multimodal Reasoning Tencent 2
Submitted by zptu 3 BatonVoice: An Operationalist Framework for Enhancing Controllable Speech Synthesis with Linguistic Intelligence from LLMs Tencent 2
Submitted by xx18 31 Thinking-Free Policy Initialization Makes Distilled Reasoning Models More Effective and Efficient Reasoners Tencent 2
Submitted by zhongwenxu 3 Cogito, Ergo Ludo: An Agent that Learns to Play by Reasoning and Planning Tencent 2