GPT4Motion: Scripting Physical Motions in Text-to-Video Generation via Blender-Oriented GPT Planning Paper • 2311.12631 • Published Nov 21, 2023 • 15
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models Paper • 2401.06066 • Published Jan 11, 2024 • 56
VideoScene: Distilling Video Diffusion Model to Generate 3D Scenes in One Step Paper • 2504.01956 • Published Apr 2 • 41
UrbanLLaVA: A Multi-modal Large Language Model for Urban Intelligence with Spatial Reasoning and Understanding Paper • 2506.23219 • Published Jun 29 • 7
CriticLean: Critic-Guided Reinforcement Learning for Mathematical Formalization Paper • 2507.06181 • Published Jul 8 • 42
Disambiguation-Centric Finetuning Makes Enterprise Tool-Calling LLMs More Realistic and Less Risky Paper • 2507.03336 • Published Jul 4 • 5
LangSplatV2: High-dimensional 3D Language Gaussian Splatting with 450+ FPS Paper • 2507.07136 • Published Jul 9 • 36
Lumos-1: On Autoregressive Video Generation from a Unified Model Perspective Paper • 2507.08801 • Published Jul 11 • 30
Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation Paper • 2507.10524 • Published Jul 14 • 69
SWE-Perf: Can Language Models Optimize Code Performance on Real-World Repositories? Paper • 2507.12415 • Published Jul 16 • 41
OpenCodeReasoning-II: A Simple Test Time Scaling Approach via Self-Critique Paper • 2507.09075 • Published Jul 11 • 14
REST: Stress Testing Large Reasoning Models by Asking Multiple Problems at Once Paper • 2507.10541 • Published Jul 14 • 29
Lizard: An Efficient Linearization Framework for Large Language Models Paper • 2507.09025 • Published Jul 11 • 17
AgentsNet: Coordination and Collaborative Reasoning in Multi-Agent LLMs Paper • 2507.08616 • Published Jul 11 • 13
EXAONE 4.0: Unified Large Language Models Integrating Non-reasoning and Reasoning Modes Paper • 2507.11407 • Published Jul 15 • 57
The Imitation Game: Turing Machine Imitator is Length Generalizable Reasoner Paper • 2507.13332 • Published Jul 17 • 48
RiemannLoRA: A Unified Riemannian Framework for Ambiguity-Free LoRA Optimization Paper • 2507.12142 • Published Jul 16 • 36
FLEXITOKENS: Flexible Tokenization for Evolving Language Models Paper • 2507.12720 • Published Jul 17 • 9
Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities Paper • 2507.13158 • Published Jul 17 • 24
Upsample What Matters: Region-Adaptive Latent Sampling for Accelerated Diffusion Transformers Paper • 2507.08422 • Published Jul 11 • 35
WebShaper: Agentically Data Synthesizing via Information-Seeking Formalization Paper • 2507.15061 • Published Jul 20 • 53
Robust 3D-Masked Part-level Editing in 3D Gaussian Splatting with Regularized Score Distillation Sampling Paper • 2507.11061 • Published Jul 15 • 37
Gaussian Splatting with Discretized SDF for Relightable Assets Paper • 2507.15629 • Published Jul 21 • 22
Beyond Context Limits: Subconscious Threads for Long-Horizon Reasoning Paper • 2507.16784 • Published Jul 22 • 119
Ultra3D: Efficient and High-Fidelity 3D Generation with Part Attention Paper • 2507.17745 • Published Jul 23 • 33
ScreenCoder: Advancing Visual-to-Code Generation for Front-End Automation via Modular Multimodal Agents Paper • 2507.22827 • Published Jul 30 • 97
On the Expressiveness of Softmax Attention: A Recurrent Neural Network Perspective Paper • 2507.23632 • Published Jul 31 • 6
Beyond Fixed: Variable-Length Denoising for Diffusion Large Language Models Paper • 2508.00819 • Published Aug 1 • 62
Cognitive Kernel-Pro: A Framework for Deep Research Agents and Agent Foundation Models Training Paper • 2508.00414 • Published Aug 1 • 89
CompassVerifier: A Unified and Robust Verifier for LLMs Evaluation and Outcome Reward Paper • 2508.03686 • Published about 1 month ago • 35
Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens Paper • 2508.01191 • Published Aug 2 • 234
Efficient Agents: Building Effective Agents While Reducing Cost Paper • 2508.02694 • Published Jul 24 • 85
Agent Lightning: Train ANY AI Agents with Reinforcement Learning Paper • 2508.03680 • Published about 1 month ago • 65
CRINN: Contrastive Reinforcement Learning for Approximate Nearest Neighbor Search Paper • 2508.02091 • Published Aug 4 • 13
InfiGUI-G1: Advancing GUI Grounding with Adaptive Exploration Policy Optimization Paper • 2508.05731 • Published 28 days ago • 25
Klear-Reasoner: Advancing Reasoning Capability via Gradient-Preserving Clipping Policy Optimization Paper • 2508.07629 • Published 25 days ago • 39
Adapting Vision-Language Models Without Labels: A Comprehensive Survey Paper • 2508.05547 • Published 28 days ago • 11
GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models Paper • 2508.06471 • Published 27 days ago • 171
BrowseComp-Plus: A More Fair and Transparent Evaluation Benchmark of Deep-Research Agent Paper • 2508.06600 • Published 27 days ago • 36
Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning Paper • 2508.08221 • Published 24 days ago • 44
Seeing, Listening, Remembering, and Reasoning: A Multimodal Agent with Long-Term Memory Paper • 2508.09736 • Published 22 days ago • 54
SONAR-LLM: Autoregressive Transformer that Thinks in Sentence Embeddings and Speaks in Tokens Paper • 2508.05305 • Published 28 days ago • 45
Noise Hypernetworks: Amortizing Test-Time Compute in Diffusion Models Paper • 2508.09968 • Published 22 days ago • 15
Quantization Meets dLLMs: A Systematic Study of Post-training Quantization for Diffusion LLMs Paper • 2508.14896 • Published 15 days ago • 21
XQuant: Breaking the Memory Wall for LLM Inference with KV Cache Rematerialization Paper • 2508.10395 • Published 21 days ago • 40
NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model Paper • 2508.14444 • Published 16 days ago • 35
Tinker: Diffusion's Gift to 3D--Multi-View Consistent Editing From Sparse Inputs without Per-Scene Optimization Paper • 2508.14811 • Published 15 days ago • 39
T2I-ReasonBench: Benchmarking Reasoning-Informed Text-to-Image Generation Paper • 2508.17472 • Published 11 days ago • 26
ThinkDial: An Open Recipe for Controlling Reasoning Effort in Large Language Models Paper • 2508.18773 • Published 9 days ago • 14
Discrete Diffusion VLA: Bringing Discrete Diffusion to Action Decoding in Vision-Language-Action Policies Paper • 2508.20072 • Published 8 days ago • 28
Self-Rewarding Vision-Language Model via Reasoning Decomposition Paper • 2508.19652 • Published 8 days ago • 77
SpotEdit: Evaluating Visually-Guided Image Editing Methods Paper • 2508.18159 • Published 10 days ago • 2
ReportBench: Evaluating Deep Research Agents via Academic Survey Tasks Paper • 2508.15804 • Published 22 days ago • 14
Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning Paper • 2508.20751 • Published 7 days ago • 85
TreePO: Bridging the Gap of Policy Optimization and Efficacy and Inference Efficiency with Heuristic Tree-based Modeling Paper • 2508.17445 • Published 11 days ago • 76
CODA: Coordinating the Cerebrum and Cerebellum for a Dual-Brain Computer Use Agent with Decoupled Reinforcement Learning Paper • 2508.20096 • Published 8 days ago • 35
InMind: Evaluating LLMs in Capturing and Applying Individual Human Reasoning Styles Paper • 2508.16072 • Published 14 days ago • 3
EmbodiedOneVision: Interleaved Vision-Text-Action Pretraining for General Robot Control Paper • 2508.21112 • Published 7 days ago • 71
UItron: Foundational GUI Agent with Advanced Perception and Planning Paper • 2508.21767 • Published 6 days ago • 11
TiKMiX: Take Data Influence into Dynamic Mixture for Language Model Pre-training Paper • 2508.17677 • Published 11 days ago • 14
UI-TARS-2 Technical Report: Advancing GUI Agent with Multi-Turn Reinforcement Learning Paper • 2509.02544 • Published 2 days ago • 91