Youtu-LLM: Unlocking the Native Agentic Potential for Lightweight Large Language Models
Paper
•
2512.24618
•
Published
•
149
Let It Flow: Agentic Crafting on Rock and Roll, Building the ROME Model within an Open Agentic Learning Ecosystem
Paper
•
2512.24873
•
Published
•
104
AI Meets Brain: Memory Systems from Cognitive Neuroscience to Autonomous Agents
Paper
•
2512.23343
•
Published
•
29
Figure It Out: Improving the Frontier of Reasoning with Active Visual Thinking
Paper
•
2512.24297
•
Published
•
6
Valori: A Deterministic Memory Substrate for AI Systems
Paper
•
2512.22280
•
Published
•
5
Improving Multi-step RAG with Hypergraph-based Memory for Long-Context Complex Relational Modeling
Paper
•
2512.23959
•
Published
•
112
Youtu-Agent: Scaling Agent Productivity with Automated Generation and Hybrid Policy Optimization
Paper
•
2512.24615
•
Published
•
119
Nested Learning: The Illusion of Deep Learning Architectures
Paper
•
2512.24695
•
Published
•
43
SenseNova-MARS: Empowering Multimodal Agentic Reasoning and Search via Reinforcement Learning
Paper
•
2512.24330
•
Published
•
35
Fast-weight Product Key Memory
Paper
•
2601.00671
•
Published
•
6
SimpleMem: Efficient Lifelong Memory for LLM Agents
Paper
•
2601.02553
•
Published
•
36
Falcon-H1R: Pushing the Reasoning Frontiers with a Hybrid Model for Efficient Test-Time Scaling
Paper
•
2601.02346
•
Published
•
26
OpenNovelty: An LLM-powered Agentic System for Verifiable Scholarly Novelty Assessment
Paper
•
2601.01576
•
Published
•
18
UniCorn: Towards Self-Improving Unified Multimodal Models through Self-Generated Supervision
Paper
•
2601.03193
•
Published
•
47
NitroGen: An Open Foundation Model for Generalist Gaming Agents
Paper
•
2601.02427
•
Published
•
44
MindWatcher: Toward Smarter Multimodal Tool-Integrated Reasoning
Paper
•
2512.23412
•
Published
•
40
Token-Level LLM Collaboration via FusionRoute
Paper
•
2601.05106
•
Published
•
40
AT^2PO: Agentic Turn-based Policy Optimization via Tree Search
Paper
•
2601.04767
•
Published
•
28
Paper
•
2601.05111
•
Published
•
19
WebGym: Scaling Training Environments for Visual Web Agents with Realistic Tasks
Paper
•
2601.02439
•
Published
•
16
The Illusion of Specialization: Unveiling the Domain-Invariant "Standing Committee" in Mixture-of-Experts Models
Paper
•
2601.03425
•
Published
•
16
Scaling Behavior Cloning Improves Causal Reasoning: An Open Model for Real-Time Video Game Playing
Paper
•
2601.04575
•
Published
•
9
DocDancer: Towards Agentic Document-Grounded Information Seeking
Paper
•
2601.05163
•
Published
•
5
Entropy-Adaptive Fine-Tuning: Resolving Confident Conflicts to Mitigate Forgetting
Paper
•
2601.02151
•
Published
•
108
AgentDevel: Reframing Self-Evolving LLM Agents as Release Engineering
Paper
•
2601.04620
•
Published
•
3
Evolving Programmatic Skill Networks
Paper
•
2601.03509
•
Published
•
84
Atlas: Orchestrating Heterogeneous Models and Tools for Multi-Domain Complex Reasoning
Paper
•
2601.03872
•
Published
•
42
Thinking with Map: Reinforced Parallel Map-Augmented Agent for Geolocalization
Paper
•
2601.05432
•
Published
•
166
The Molecular Structure of Thought: Mapping the Topology of Long Chain-of-Thought Reasoning
Paper
•
2601.06002
•
Published
•
52
Agentic Rubrics as Contextual Verifiers for SWE Agents
Paper
•
2601.04171
•
Published
•
11
Chaining the Evidence: Robust Reinforcement Learning for Deep Search Agents with Citation-Aware Rubric Rewards
Paper
•
2601.06021
•
Published
•
45
MDAgent2: Large Language Model for Code Generation and Knowledge Q&A in Molecular Dynamics
Paper
•
2601.02075
•
Published
•
8
EnvScaler: Scaling Tool-Interactive Environments for LLM Agent via Programmatic Synthesis
Paper
•
2601.05808
•
Published
•
36
Why LLMs Aren't Scientists Yet: Lessons from Four Autonomous Research Attempts
Paper
•
2601.03315
•
Published
•
6
AgentOCR: Reimagining Agent History via Optical Self-Compression
Paper
•
2601.04786
•
Published
•
29
MAGMA: A Multi-Graph based Agentic Memory Architecture for AI Agents
Paper
•
2601.03236
•
Published
•
3
Can We Predict Before Executing Machine Learning Agents?
Paper
•
2601.05930
•
Published
•
26
GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization
Paper
•
2601.05242
•
Published
•
222
An Empirical Study on Preference Tuning Generalization and Diversity Under Domain Shift
Paper
•
2601.05882
•
Published
•
20
Illusions of Confidence? Diagnosing LLM Truthfulness via Neighborhood Consistency
Paper
•
2601.05905
•
Published
•
18
SmartSearch: Process Reward-Guided Query Refinement for Search Agents
Paper
•
2601.04888
•
Published
•
10
Over-Searching in Search-Augmented Large Language Models
Paper
•
2601.05503
•
Published
•
6
DR-LoRA: Dynamic Rank LoRA for Mixture-of-Experts Adaptation
Paper
•
2601.04823
•
Published
•
6
Memory Matters More: Event-Centric Memory as a Logic Map for Agent Searching and Reasoning
Paper
•
2601.04726
•
Published
•
6
TCAndon-Router: Adaptive Reasoning Router for Multi-Agent Collaboration
Paper
•
2601.04544
•
Published
•
6
IIB-LPO: Latent Policy Optimization via Iterative Information Bottleneck
Paper
•
2601.05870
•
Published
•
3
Distilling Feedback into Memory-as-a-Tool
Paper
•
2601.05960
•
Published
•
2
BabyVision: Visual Reasoning Beyond Language
Paper
•
2601.06521
•
Published
•
196
PaCoRe: Learning to Scale Test-Time Compute with Parallel Coordinated Reasoning
Paper
•
2601.05593
•
Published
•
83
Lost in the Noise: How Reasoning Models Fail with Contextual Distractors
Paper
•
2601.07226
•
Published
•
32
Dr. Zero: Self-Evolving Search Agents without Training Data
Paper
•
2601.07055
•
Published
•
20
OS-Symphony: A Holistic Framework for Robust and Generalist Computer-Using Agent
Paper
•
2601.07779
•
Published
•
28
Controllable Memory Usage: Balancing Anchoring and Innovation in Long-Term Human-Agent Interaction
Paper
•
2601.05107
•
Published
•
24
ET-Agent: Incentivizing Effective Tool-Integrated Reasoning Agent via Behavior Calibration
Paper
•
2601.06860
•
Published
•
16
MegaFlow: Large-Scale Distributed Orchestration System for the Agentic Era
Paper
•
2601.07526
•
Published
•
23
Forest Before Trees: Latent Superposition for Efficient Visual Reasoning
Paper
•
2601.06803
•
Published
•
10
TourPlanner: A Competitive Consensus Framework with Constraint-Gated Reinforcement Learning for Travel Planning
Paper
•
2601.04698
•
Published
•
10
How Do Large Language Models Learn Concepts During Continual Pre-Training?
Paper
•
2601.03570
•
Published
•
4
OpenTinker: Separating Concerns in Agentic Reinforcement Learning
Paper
•
2601.07376
•
Published
•
6
ShowUI-Aloha: Human-Taught GUI Agent
Paper
•
2601.07181
•
Published
•
3
Are LLM Decisions Faithful to Verbal Confidence?
Paper
•
2601.07767
•
Published
•
4
Structured Episodic Event Memory
Paper
•
2601.06411
•
Published
•
4
Artificial Entanglement in the Fine-Tuning of Large Language Models
Paper
•
2601.06788
•
Published
•
3
User-Oriented Multi-Turn Dialogue Generation with Tool Use at scale
Paper
•
2601.08225
•
Published
•
52
ArenaRL: Scaling RL for Open-Ended Agents via Tournament-based Relative Ranking
Paper
•
2601.06487
•
Published
•
52
On the Non-decoupling of Supervised Fine-tuning and Reinforcement Learning in Post-training
Paper
•
2601.07389
•
Published
•
2
MemoBrain: Executive Memory as an Agentic Brain for Reasoning
Paper
•
2601.08079
•
Published
•
37
MemGovern: Enhancing Code Agents through Learning from Governed Human Experiences
Paper
•
2601.06789
•
Published
•
78
The Confidence Dichotomy: Analyzing and Mitigating Miscalibration in Tool-Use Agents
Paper
•
2601.07264
•
Published
•
24
Parallel Context-of-Experts Decoding for Retrieval Augmented Generation
Paper
•
2601.08670
•
Published
•
19
Aligning Text, Code, and Vision: A Multi-Objective Reinforcement Learning Framework for Text-to-Visualization
Paper
•
2601.04582
•
Published
•
10
JudgeRLVR: Judge First, Generate Second for Efficient Reasoning
Paper
•
2601.08468
•
Published
•
6
EpiCaR: Knowing What You Don't Know Matters for Better Reasoning in LLMs
Paper
•
2601.06786
•
Published
•
6
Controlled Self-Evolution for Algorithmic Code Optimization
Paper
•
2601.07348
•
Published
•
114
MAXS: Meta-Adaptive Exploration with LLM Agents
Paper
•
2601.09259
•
Published
•
95
EvoFSM: Controllable Self-Evolution for Deep Research with Finite State Machines
Paper
•
2601.09465
•
Published
•
41
OpenDecoder: Open Large Language Model Decoding to Incorporate Document Quality in RAG
Paper
•
2601.09028
•
Published
•
33
ExpSeek: Self-Triggered Experience Seeking for Web Agents
Paper
•
2601.08605
•
Published
•
16
Imagine-then-Plan: Agent Learning from Adaptive Lookahead with World Models
Paper
•
2601.08955
•
Published
•
13
No More Stale Feedback: Co-Evolving Critics for Open-World Agent Learning
Paper
•
2601.06794
•
Published
•
4
The AI Hippocampus: How Far are We From Human Memory?
Paper
•
2601.09113
•
Published
•
5
DPWriter: Reinforcement Learning with Diverse Planning Branching for Creative Writing
Paper
•
2601.09609
•
Published
•
3
Omni-R1: Towards the Unified Generative Paradigm for Multimodal Reasoning
Paper
•
2601.09536
•
Published
•
5
SCALER: Synthetic Scalable Adaptive Learning Environment for Reasoning
Paper
•
2601.04809
•
Published
•
3
Rewarding the Rare: Uniqueness-Aware RL for Creative Problem Solving in LLMs
Paper
•
2601.08763
•
Published
•
147
Collaborative Multi-Agent Test-Time Reinforcement Learning for Reasoning
Paper
•
2601.09667
•
Published
•
87
Beyond Static Tools: Test-Time Tool Evolution for Scientific Reasoning
Paper
•
2601.07641
•
Published
•
46
Toward Ultra-Long-Horizon Agentic Science: Cognitive Accumulation for Machine Learning Engineering
Paper
•
2601.10402
•
Published
•
36
MatchTIR: Fine-Grained Supervision for Tool-Integrated Reasoning via Bipartite Matching
Paper
•
2601.10712
•
Published
•
24
LaViT: Aligning Latent Visual Thoughts for Multi-modal Reasoning
Paper
•
2601.10129
•
Published
•
11
PACEvolve: Enabling Long-Horizon Progress-Aware Consistent Evolution
Paper
•
2601.10657
•
Published
•
20
LSRIF: Logic-Structured Reinforcement Learning for Instruction Following
Paper
•
2601.06431
•
Published
•
12
PRL: Process Reward Learning Improves LLMs' Reasoning Ability and Broadens the Reasoning Boundary
Paper
•
2601.10201
•
Published
•
8
Agent Skills in the Wild: An Empirical Study of Security Vulnerabilities at Scale
Paper
•
2601.10338
•
Published
•
5
Memory Bank Compression for Continual Adaptation of Large Language Models
Paper
•
2601.00756
•
Published
•
2
Distribution-Aligned Sequence Distillation for Superior Long-CoT Reasoning
Paper
•
2601.09088
•
Published
•
62
Your Group-Relative Advantage Is Biased
Paper
•
2601.08521
•
Published
•
150
The Poisoned Apple Effect: Strategic Manipulation of Mediated Markets via Technology Expansion of AI Agents
Paper
•
2601.11496
•
Published
•
47
Unlocking Implicit Experience: Synthesizing Tool-Use Trajectories from Text
Paper
•
2601.10355
•
Published
•
39
BAPO: Boundary-Aware Policy Optimization for Reliable Agentic Search
Paper
•
2601.11037
•
Published
•
17
ProFit: Leveraging High-Value Signals in SFT via Probability-Guided Token Selection
Paper
•
2601.09195
•
Published
•
15
Reasoning Models Generate Societies of Thought
Paper
•
2601.10825
•
Published
•
14
PersonalAlign: Hierarchical Implicit Intent Alignment for Personalized GUI Agent with Long-Term User-Centric Records
Paper
•
2601.09636
•
Published
•
8
Language of Thought Shapes Output Diversity in Large Language Models
Paper
•
2601.11227
•
Published
•
9
Multiplex Thinking: Reasoning via Token-wise Branch-and-Merge
Paper
•
2601.08808
•
Published
•
39
NAACL: Noise-AwAre Verbal Confidence Calibration for LLMs in RAG Systems
Paper
•
2601.11004
•
Published
•
30
Spurious Rewards Paradox: Mechanistically Understanding How RLVR Activates Memorization Shortcuts in LLMs
Paper
•
2601.11061
•
Published
•
7
YaPO: Learnable Sparse Activation Steering Vectors for Domain Adaptation
Paper
•
2601.08441
•
Published
•
7
CLARE: Continual Learning for Vision-Language-Action Models via Autonomous Adapter Routing and Expansion
Paper
•
2601.09512
•
Published
•
4
Think3D: Thinking with Space for Spatial Reasoning
Paper
•
2601.13029
•
Published
•
47
Toward Efficient Agents: Memory, Tool learning, and Planning
Paper
•
2601.14192
•
Published
•
54
DARC: Decoupled Asymmetric Reasoning Curriculum for LLM Evolution
Paper
•
2601.13761
•
Published
•
16
Aligning Agentic World Models via Knowledgeable Experience Learning
Paper
•
2601.13247
•
Published
•
15
Agentic-R: Learning to Retrieve for Agentic Search
Paper
•
2601.11888
•
Published
•
19
Which Reasoning Trajectories Teach Students to Reason Better? A Simple Metric of Informative Alignment
Paper
•
2601.14249
•
Published
•
10
InT: Self-Proposed Interventions Enable Credit Assignment in LLM Reasoning
Paper
•
2601.14209
•
Published
•
6
Uncertainty-Aware Gradient Signal-to-Noise Data Selection for Instruction Tuning
Paper
•
2601.13697
•
Published
•
4
Agentic Reasoning for Large Language Models
Paper
•
2601.12538
•
Published
•
194
Paper2Rebuttal: A Multi-Agent Framework for Transparent Author Response Assistance
Paper
•
2601.14171
•
Published
•
48
Behavior Knowledge Merge in Reinforced Agentic Models
Paper
•
2601.13572
•
Published
•
24
Render-of-Thought: Rendering Textual Chain-of-Thought as Images for Visual Latent Reasoning
Paper
•
2601.14750
•
Published
•
17
Numina-Lean-Agent: An Open and General Agentic Reasoning System for Formal Mathematics
Paper
•
2601.14027
•
Published
•
12
Lost in the Prompt Order: Revealing the Limitations of Causal Attention in Language Models
Paper
•
2601.14152
•
Published
•
5
The Responsibility Vacuum: Organizational Failure in Scaled Agent Systems
Paper
•
2601.15059
•
Published
•
3
Facilitating Proactive and Reactive Guidance for Decision Making on the Web: A Design Probe with WebSeek
Paper
•
2601.15100
•
Published
•
3
EvoCUA: Evolving Computer Use Agents via Learning from Scalable Synthetic Experience
Paper
•
2601.15876
•
Published
•
89
LLM-in-Sandbox Elicits General Agentic Intelligence
Paper
•
2601.16206
•
Published
•
84
PROGRESSLM: Towards Progress Reasoning in Vision-Language Models
Paper
•
2601.15224
•
Published
•
12
Agentic Uncertainty Quantification
Paper
•
2601.15703
•
Published
•
8
Agentic Confidence Calibration
Paper
•
2601.15778
•
Published
•
5
From Passive Metric to Active Signal: The Evolving Role of Uncertainty Quantification in Large Language Models
Paper
•
2601.15690
•
Published
•
4
SWE-Pruner: Self-Adaptive Context Pruning for Coding Agents
Paper
•
2601.16746
•
Published
•
89
VisGym: Diverse, Customizable, Scalable Environments for Multimodal Agents
Paper
•
2601.16973
•
Published
•
40
Inference-Time Scaling of Verification: Self-Evolving Deep Research Agents via Test-Time Rubric-Guided Verification
Paper
•
2601.15808
•
Published
•
20
Endless Terminals: Scaling RL Environments for Terminal Agents
Paper
•
2601.16443
•
Published
•
16
Dancing in Chains: Strategic Persuasion in Academic Rebuttal via Theory of Mind
Paper
•
2601.15715
•
Published
•
13
ChartVerse: Scaling Chart Reasoning via Reliable Programmatic Synthesis from Scratch
Paper
•
2601.13606
•
Published
•
11
MeepleLM: A Virtual Playtester Simulating Diverse Subjective Experiences
Paper
•
2601.07251
•
Published
•
11
Knowledge is Not Enough: Injecting RL Skills for Continual Adaptation
Paper
•
2601.11258
•
Published
•
8
Guidelines to Prompt Large Language Models for Code Generation: An Empirical Characterization
Paper
•
2601.13118
•
Published
•
1
daVinci-Dev: Agent-native Mid-training for Software Engineering
Paper
•
2601.18418
•
Published
•
124
Teaching Models to Teach Themselves: Reasoning at the Edge of Learnability
Paper
•
2601.18778
•
Published
•
40
Paying Less Generalization Tax: A Cross-Domain Generalization Study of RL Training for LLM Agents
Paper
•
2601.18217
•
Published
•
11
DRPG (Decompose, Retrieve, Plan, Generate): An Agentic Framework for Academic Rebuttal
Paper
•
2601.18081
•
Published
•
7
Least-Loaded Expert Parallelism: Load Balancing An Imbalanced Mixture-of-Experts
Paper
•
2601.17111
•
Published
•
5
Agentic Search in the Wild: Intents and Trajectory Dynamics from 14M+ Real Search Requests
Paper
•
2601.17617
•
Published
•
2
RouteMoA: Dynamic Routing without Pre-Inference Boosts Efficient Mixture-of-Agents
Paper
•
2601.18130
•
Published
•
1
AdaReasoner: Dynamic Tool Orchestration for Iterative Visual Reasoning
Paper
•
2601.18631
•
Published
•
47
Self-Distillation Enables Continual Learning
Paper
•
2601.19897
•
Published
•
24
Harder Is Better: Boosting Mathematical Reasoning via Difficulty-Aware GRPO and Multi-Aspect Question Reformulation
Paper
•
2601.20614
•
Published
•
116
Innovator-VL: A Multimodal Large Language Model for Scientific Discovery
Paper
•
2601.19325
•
Published
•
78
Reinforcement Learning via Self-Distillation
Paper
•
2601.20802
•
Published
•
39
Spark: Strategic Policy-Aware Exploration via Dynamic Branching for Long-Horizon Agentic Learning
Paper
•
2601.20209
•
Published
•
22
Linear representations in language models can change dramatically over a conversation
Paper
•
2601.20834
•
Published
•
21
SERA: Soft-Verified Efficient Repository Agents
Paper
•
2601.20789
•
Published
•
11
Group Distributionally Robust Optimization-Driven Reinforcement Learning for LLM Reasoning
Paper
•
2601.19280
•
Published
•
9
OmegaUse: Building a General-Purpose GUI Agent for Autonomous Task Execution
Paper
•
2601.20380
•
Published
•
8
How AI Impacts Skill Formation
Paper
•
2601.20245
•
Published
•
8
VERGE: Formal Refinement and Guidance Engine for Verifiable LLM Reasoning
Paper
•
2601.20055
•
Published
•
6
Training Reasoning Models on Saturated Problems via Failure-Prefix Conditioning
Paper
•
2601.20829
•
Published
•
5
Idea2Story: An Automated Pipeline for Transforming Research Concepts into Complete Scientific Narratives
Paper
•
2601.20833
•
Published
•
173
Scaling Embeddings Outperforms Scaling Experts in Language Models
Paper
•
2601.21204
•
Published
•
98
ConceptMoE: Adaptive Token-to-Concept Compression for Implicit Compute Allocation
Paper
•
2601.21420
•
Published
•
42
Exploring Reasoning Reward Model for Agents
Paper
•
2601.22154
•
Published
•
22
Language-based Trial and Error Falls Behind in the Era of Experience
Paper
•
2601.21754
•
Published
•
16
Self-Improving Pretraining: using post-trained models to pretrain better models
Paper
•
2601.21343
•
Published
•
16
Scalable Power Sampling: Unlocking Efficient, Training-Free Reasoning for LLMs via Distribution Sharpening
Paper
•
2601.21590
•
Published
•
12
Beyond Imitation: Reinforcement Learning for Active Latent Planning
Paper
•
2601.21598
•
Published
•
9
DeepSearchQA: Bridging the Comprehensiveness Gap for Deep Research Agents
Paper
•
2601.20975
•
Published
•
9
VTC-R1: Vision-Text Compression for Efficient Long-Context Reasoning
Paper
•
2601.22069
•
Published
•
7
Reinforcement Learning from Meta-Evaluation: Aligning Language Models Without Ground-Truth Labels
Paper
•
2601.21268
•
Published
•
4
BMAM: Brain-inspired Multi-Agent Memory Framework
Paper
•
2601.20465
•
Published
•
4
FROST: Filtering Reasoning Outliers with Attention for Efficient Reasoning
Paper
•
2601.19001
•
Published
•
4
WebArbiter: A Principle-Guided Reasoning Process Reward Model for Web Agents
Paper
•
2601.21872
•
Published