Bugai's Collection - a BugaiL Collection

updated about 2 hours ago

Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning

Paper • 2508.20751 • Published Aug 28 • 89
TreePO: Bridging the Gap of Policy Optimization and Efficacy and Inference Efficiency with Heuristic Tree-based Modeling

Paper • 2508.17445 • Published Aug 24 • 80
VoxHammer: Training-Free Precise and Coherent 3D Editing in Native 3D Space

Paper • 2508.19247 • Published Aug 26 • 41
VibeVoice Technical Report

Paper • 2508.19205 • Published Aug 26 • 123
USO: Unified Style and Subject-Driven Generation via Disentangled and Reward Learning

Paper • 2508.18966 • Published Aug 26 • 56
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey

Paper • 2509.02547 • Published Sep 2 • 222
SimpleTIR: End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning

Paper • 2509.02479 • Published Sep 2 • 83
LLaVA-Critic-R1: Your Critic Model is Secretly a Strong Policy Model

Paper • 2509.00676 • Published Aug 31 • 83
VerlTool: Towards Holistic Agentic Reinforcement Learning with Tool Use

Paper • 2509.01055 • Published Sep 1 • 73
Gated Associative Memory: A Parallel O(N) Architecture for Efficient Sequence Modeling

Paper • 2509.00605 • Published Aug 30 • 42
Open Data Synthesis For Deep Research

Paper • 2509.00375 • Published Aug 30 • 68
DeepResearch Arena: The First Exam of LLMs' Research Abilities via Seminar-Grounded Tasks

Paper • 2509.01396 • Published Sep 1 • 56
Spatial Forcing: Implicit Spatial Representation Alignment for Vision-language-action Model

Paper • 2510.12276 • Published 29 days ago • 142
Agent Lightning: Train ANY AI Agents with Reinforcement Learning

Paper • 2508.03680 • Published Aug 5 • 114
Brain-IT: Image Reconstruction from fMRI via Brain-Interaction Transformer

Paper • 2510.25976 • Published 13 days ago • 11
Don't Blind Your VLA: Aligning Visual Representations for OOD Generalization

Paper • 2510.25616 • Published 13 days ago • 89
VCode: a Multimodal Coding Benchmark with SVG as Symbolic Visual Representation

Paper • 2511.02778 • Published 7 days ago • 96
When Visualizing is the First Step to Reasoning: MIRA, a Benchmark for Visual Chain-of-Thought

Paper • 2511.02779 • Published 7 days ago • 52
Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm

Paper • 2511.04570 • Published 5 days ago • 185
V-Thinker: Interactive Thinking with Images

Paper • 2511.04460 • Published 5 days ago • 92
Scaling Agent Learning via Experience Synthesis

Paper • 2511.03773 • Published 6 days ago • 69
The Strong Lottery Ticket Hypothesis for Multi-Head Attention Mechanisms

Paper • 2511.04217 • Published 6 days ago • 14
HaluMem: Evaluating Hallucinations in Memory Systems of Agents

Paper • 2511.03506 • Published 6 days ago • 71
IterResearch: Rethinking Long-Horizon Agents via Markovian State Reconstruction

Paper • 2511.07327 • Published 1 day ago • 57
SofT-GRPO: Surpassing Discrete-Token LLM Reinforcement Learning via Gumbel-Reparameterized Soft-Thinking Policy Optimization

Paper • 2511.06411 • Published 2 days ago • 13