Library - a JuanRafap Collection

lusxvr/nanoVLM-222M

Image-Text-to-Text • 0.2B • Updated May 8 • 402 • 98

Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning

Paper • 2503.09516 • Published Mar 12 • 36

AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time

Paper • 2505.24863 • Published May 30 • 97

QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning

Paper • 2505.17667 • Published May 23 • 88

ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models

Paper • 2505.24864 • Published May 30 • 141

AReaL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning

Paper • 2505.24298 • Published May 30 • 28

Enigmata: Scaling Logical Reasoning in Large Language Models with Synthetic Verifiable Puzzles

Paper • 2505.19914 • Published May 26 • 43

One RL to See Them All: Visual Triple Unified Reinforcement Learning

Paper • 2505.18129 • Published May 23 • 60

Scaling Reasoning, Losing Control: Evaluating Instruction Following in Large Reasoning Models

Paper • 2505.14810 • Published May 20 • 62

Tool-Star: Empowering LLM-Brained Multi-Tool Reasoner via Reinforcement Learning

Paper • 2505.16410 • Published May 22 • 58

JULI: Jailbreak Large Language Models by Self-Introspection

Paper • 2505.11790 • Published May 17

Optimizing Anytime Reasoning via Budget Relative Policy Optimization

Paper • 2505.13438 • Published May 19 • 36

Reward Reasoning Model

Paper • 2505.14674 • Published May 20 • 38

RM-R1: Reward Modeling as Reasoning

Paper • 2505.02387 • Published May 5 • 80

CPGD: Toward Stable Rule-based Reinforcement Learning for Language Models

Paper • 2505.12504 • Published May 18 • 24

Neuro-Symbolic Query Compiler

Paper • 2505.11932 • Published May 17 • 18

Ψ-Sampler: Initial Particle Sampling for SMC-Based Inference-Time Reward Alignment in Score Models

Paper • 2506.01320 • Published Jun 2 • 16

Aligning Latent Spaces with Flow Priors

Paper • 2506.05240 • Published Jun 5 • 27

Robot-R1: Reinforcement Learning for Enhanced Embodied Reasoning in Robotics

Paper • 2506.00070 • Published May 29 • 29

A Controllable Examination for Long-Context Language Models

Paper • 2506.02921 • Published Jun 3 • 33

MotionSight: Boosting Fine-Grained Motion Understanding in Multimodal LLMs

Paper • 2506.01674 • Published Jun 2 • 28

CodeContests+: High-Quality Test Case Generation for Competitive Programming

Paper • 2506.05817 • Published Jun 6 • 9

FusionAudio-1.2M: Towards Fine-grained Audio Captioning with Multimodal Contextual Fusion

Paper • 2506.01111 • Published Jun 1 • 30

Reinforcement Pre-Training

Paper • 2506.08007 • Published Jun 9 • 262

GUI-Reflection: Empowering Multimodal GUI Models with Self-Reflection Behavior

Paper • 2506.08012 • Published Jun 9 • 7

Dreamland: Controllable World Creation with Simulator and Generative Models

Paper • 2506.08006 • Published Jun 9 • 7

Saffron-1: Towards an Inference Scaling Paradigm for LLM Safety Assurance

Paper • 2506.06444 • Published Jun 6 • 73

BitVLA: 1-bit Vision-Language-Action Models for Robotics Manipulation

Paper • 2506.07530 • Published Jun 9 • 20

Solving Inequality Proofs with Large Language Models

Paper • 2506.07927 • Published Jun 9 • 20

Autoregressive Semantic Visual Reconstruction Helps VLMs Understand Better

Paper • 2506.09040 • Published Jun 10 • 34

Through the Valley: Path to Effective Long CoT Training for Small Language Models

Paper • 2506.07712 • Published Jun 9 • 18

Multimodal DeepResearcher: Generating Text-Chart Interleaved Reports From Scratch with Agentic Framework

Paper • 2506.02454 • Published Jun 3 • 7

Look Before You Leap: A GUI-Critic-R1 Model for Pre-Operative Error Diagnosis in GUI Automation

Paper • 2506.04614 • Published Jun 5 • 19

Astra: Toward General-Purpose Mobile Robots via Hierarchical Multimodal Learning

Paper • 2506.06205 • Published Jun 6 • 30

Magistral

Paper • 2506.10910 • Published Jun 12 • 65

Xolver: Multi-Agent Reasoning with Holistic Experience Learning Just Like an Olympiad Team

Paper • 2506.14234 • Published Jun 17 • 41

Treasure Hunt: Real-time Targeting of the Long Tail using Training-Time Markers

Paper • 2506.14702 • Published Jun 17 • 3

AR-RAG: Autoregressive Retrieval Augmentation for Image Generation

Paper • 2506.06962 • Published Jun 8 • 28

DoTA-RAG: Dynamic of Thought Aggregation RAG

Paper • 2506.12571 • Published Jun 14 • 50

syftr: Pareto-Optimal Generative AI

Paper • 2505.20266 • Published May 26

Scaling Test-time Compute for LLM Agents

Paper • 2506.12928 • Published Jun 15 • 63

LoRA-Edit: Controllable First-Frame-Guided Video Editing via Mask-Aware LoRA Fine-Tuning

Paper • 2506.10082 • Published Jun 11 • 8

General-Reasoner: Advancing LLM Reasoning Across All Domains

Paper • 2505.14652 • Published May 20 • 24

Optimizing Length Compression in Large Reasoning Models

Paper • 2506.14755 • Published Jun 17 • 10

UniFork: Exploring Modality Alignment for Unified Multimodal Understanding and Generation

Paper • 2506.17202 • Published Jun 20 • 10

ReasonFlux-PRM: Trajectory-Aware PRMs for Long Chain-of-Thought Reasoning in LLMs

Paper • 2506.18896 • Published Jun 23 • 29

Vision-Guided Chunking Is All You Need: Enhancing RAG with Multimodal Document Understanding

Paper • 2506.16035 • Published Jun 19 • 88

Robust Reward Modeling via Causal Rubrics

Paper • 2506.16507 • Published Jun 19 • 9

LangScene-X: Reconstruct Generalizable 3D Language-Embedded Scenes with TriMap Video Diffusion

Paper • 2507.02813 • Published Jul 3 • 60

FreeMorph: Tuning-Free Generalized Image Morphing with Diffusion Model

Paper • 2507.01953 • Published Jul 2 • 19

Evolving Prompts In-Context: An Open-ended, Self-replicating Perspective

Paper • 2506.17930 • Published Jun 22 • 19

SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning

Paper • 2506.24119 • Published Jun 30 • 50

katanemo/Arch-Router-1.5B

Text Generation • 2B • Updated 17 days ago • 2.83k • 228

Disambiguation-Centric Finetuning Makes Enterprise Tool-Calling LLMs More Realistic and Less Risky

Paper • 2507.03336 • Published Jul 4 • 6

SingLoRA: Low Rank Adaptation Using a Single Matrix

Paper • 2507.05566 • Published Jul 8 • 112

CriticLean: Critic-Guided Reinforcement Learning for Mathematical Formalization

Paper • 2507.06181 • Published Jul 8 • 43

AutoTriton: Automatic Triton Programming with Reinforcement Learning in LLMs

Paper • 2507.05687 • Published Jul 8 • 27

Coding Triangle: How Does Large Language Model Understand Code?

Paper • 2507.06138 • Published Jul 8 • 21

High-Resolution Visual Reasoning via Multi-Turn Grounding-Based Reinforcement Learning

Paper • 2507.05920 • Published Jul 8 • 11

RefineX: Learning to Refine Pre-training Data at Scale from Expert-Guided Programs

Paper • 2507.03253 • Published Jul 4 • 18

Skip a Layer or Loop it? Test-Time Depth Adaptation of Pretrained LLMs

Paper • 2507.07996 • Published Jul 10 • 34

Lumos-1: On Autoregressive Video Generation from a Unified Model Perspective

Paper • 2507.08801 • Published Jul 11 • 30

A Survey of Context Engineering for Large Language Models

Paper • 2507.13334 • Published Jul 17 • 259

WebShaper: Agentically Data Synthesizing via Information-Seeking Formalization

Paper • 2507.15061 • Published Jul 20 • 60

AnyCap Project: A Unified Framework, Dataset, and Benchmark for Controllable Omni-modal Captioning

Paper • 2507.12841 • Published Jul 17 • 41

Beyond Context Limits: Subconscious Threads for Long-Horizon Reasoning

Paper • 2507.16784 • Published Jul 22 • 121

MUR: Momentum Uncertainty guided Reasoning for Large Language Models

Paper • 2507.14958 • Published Jul 20 • 46

Does More Inference-Time Compute Really Help Robustness?

Paper • 2507.15974 • Published Jul 21 • 7

RefCritic: Training Long Chain-of-Thought Critic Models with Refinement Feedback

Paper • 2507.15024 • Published Jul 20 • 14

ObjectGS: Object-aware Scene Reconstruction and Scene Understanding via Gaussian Splatting

Paper • 2507.15454 • Published Jul 21 • 7

Promptomatix: An Automatic Prompt Optimization Framework for Large Language Models

Paper • 2507.14241 • Published Jul 17 • 17

TTS-VAR: A Test-Time Scaling Framework for Visual Auto-Regressive Generation

Paper • 2507.18537 • Published Jul 24 • 17

Being-H0: Vision-Language-Action Pretraining from Large-Scale Human Videos

Paper • 2507.15597 • Published Jul 21 • 34

A Simple "Try Again" Can Elicit Multi-Turn LLM Reasoning

Paper • 2507.14295 • Published Jul 18 • 13

SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction

Paper • 2507.15852 • Published Jul 21 • 38

FLEXITOKENS: Flexible Tokenization for Evolving Language Models

Paper • 2507.12720 • Published Jul 17 • 9

RiemannLoRA: A Unified Riemannian Framework for Ambiguity-Free LoRA Optimization

Paper • 2507.12142 • Published Jul 16 • 36

Replacing thinking with tool usage enables reasoning in small language models

Paper • 2507.05065 • Published Jul 7 • 15

Lizard: An Efficient Linearization Framework for Large Language Models

Paper • 2507.09025 • Published Jul 11 • 18

MemOS: A Memory OS for AI System

Paper • 2507.03724 • Published Jul 4 • 156

Agentic Reinforced Policy Optimization

Paper • 2507.19849 • Published Jul 26 • 156

Deep Researcher with Test-Time Diffusion

Paper • 2507.16075 • Published Jul 21 • 67

SmallThinker: A Family of Efficient Large Language Models Natively Trained for Local Deployment

Paper • 2507.20984 • Published Jul 28 • 56

MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents

Paper • 2507.19478 • Published Jul 25 • 31

Geometric-Mean Policy Optimization

Paper • 2507.20673 • Published Jul 28 • 31

UloRL: An Ultra-Long Output Reinforcement Learning Approach for Advancing Large Language Models' Reasoning Abilities

Paper • 2507.19766 • Published Jul 26 • 14

VL-Cogito: Progressive Curriculum Reinforcement Learning for Advanced Multimodal Reasoning

Paper • 2507.22607 • Published Jul 30 • 46

Beyond Fixed: Variable-Length Denoising for Diffusion Large Language Models

Paper • 2508.00819 • Published Aug 1 • 62

Beyond the Trade-off: Self-Supervised Reinforcement Learning for Reasoning Models' Instruction Following

Paper • 2508.02150 • Published Aug 4 • 36

Cognitive Kernel-Pro: A Framework for Deep Research Agents and Agent Foundation Models Training

Paper • 2508.00414 • Published Aug 1 • 92

On the Expressiveness of Softmax Attention: A Recurrent Neural Network Perspective

Paper • 2507.23632 • Published Jul 31 • 6

Seed-Prover: Deep and Broad Reasoning for Automated Theorem Proving

Paper • 2507.23726 • Published Jul 31 • 113

SitEmb-v1.5: Improved Context-Aware Dense Retrieval for Semantic Association and Long Story Comprehension

Paper • 2508.01959 • Published Aug 3 • 56

Tool-integrated Reinforcement Learning for Repo Deep Search

Paper • 2508.03012 • Published Aug 5 • 20

InfiGUI-G1: Advancing GUI Grounding with Adaptive Exploration Policy Optimization

Paper • 2508.05731 • Published Aug 7 • 25

MeshLLM: Empowering Large Language Models to Progressively Understand and Generate 3D Mesh

Paper • 2508.01242 • Published Aug 2 • 11

Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning

Paper • 2508.08221 • Published Aug 11 • 49

Reinforcement Learning in Vision: A Survey

Paper • 2508.08189 • Published Aug 11 • 29

Bifrost-1: Bridging Multimodal LLMs and Diffusion Models with Patch-level CLIP Latents

Paper • 2508.05954 • Published Aug 8 • 6

Feedback-Driven Tool-Use Improvements in Large Language Models via Automated Build Environments

Paper • 2508.08791 • Published Aug 12 • 16

Training Long-Context, Multi-Turn Software Engineering Agents with Reinforcement Learning

Paper • 2508.03501 • Published Aug 5 • 59

Complex Logical Instruction Generation

Paper • 2508.09125 • Published Aug 12 • 40

Mol-R1: Towards Explicit Long-CoT Reasoning in Molecule Discovery

Paper • 2508.08401 • Published Aug 11 • 42

Diffusion LLMs Can Do Faster-Than-AR Inference via Discrete Diffusion Forcing

Paper • 2508.09192 • Published Aug 8 • 30

Inverse-LLaVA: Eliminating Alignment Pre-training Through Text-to-Vision Mapping

Paper • 2508.12466 • Published Aug 17 • 8

Has GPT-5 Achieved Spatial Intelligence? An Empirical Study

Paper • 2508.13142 • Published Aug 18 • 34

VertexRegen: Mesh Generation with Continuous Level of Detail

Paper • 2508.09062 • Published Aug 12 • 38

XQuant: Breaking the Memory Wall for LLM Inference with KV Cache Rematerialization

Paper • 2508.10395 • Published Aug 14 • 42

STream3R: Scalable Sequential 3D Reconstruction with Causal Transformer

Paper • 2508.10893 • Published Aug 14 • 31

FutureX: An Advanced Live Benchmark for LLM Agents in Future Prediction

Paper • 2508.11987 • Published Aug 16 • 71

MM-BrowseComp: A Comprehensive Benchmark for Multimodal Browsing Agents

Paper • 2508.13186 • Published Aug 14 • 18

Pass@k Training for Adaptively Balancing Exploration and Exploitation of Large Reasoning Models

Paper • 2508.10751 • Published Aug 14 • 28

UI-Venus Technical Report: Building High-performance UI Agents with RFT

Paper • 2508.10833 • Published Aug 14 • 44

Noise Hypernetworks: Amortizing Test-Time Compute in Diffusion Models

Paper • 2508.09968 • Published Aug 13 • 15

CRINN: Contrastive Reinforcement Learning for Approximate Nearest Neighbor Search

Paper • 2508.02091 • Published Aug 4 • 12

LiveMCP-101: Stress Testing and Diagnosing MCP-enabled Agents on Challenging Queries

Paper • 2508.15760 • Published Aug 21 • 46

Deep Think with Confidence

Paper • 2508.15260 • Published Aug 21 • 88

DuPO: Enabling Reliable LLM Self-Verification via Dual Preference Optimization

Paper • 2508.14460 • Published Aug 20 • 84

Quantization Meets dLLMs: A Systematic Study of Post-training Quantization for Diffusion LLMs

Paper • 2508.14896 • Published Aug 20 • 22

PosterGen: Aesthetic-Aware Paper-to-Poster Generation via Multi-Agent LLMs

Paper • 2508.17188 • Published Aug 24 • 17

Breaking the Exploration Bottleneck: Rubric-Scaffolded Reinforcement Learning for General LLM Reasoning

Paper • 2508.16949 • Published Aug 23 • 22

Visual-CoG: Stage-Aware Reinforcement Learning with Chain of Guidance for Text-to-Image Generation

Paper • 2508.18032 • Published Aug 25 • 42

Beyond Memorization: Extending Reasoning Depth with Recurrence, Memory and Test-Time Compute Scaling

Paper • 2508.16745 • Published Aug 22 • 29

UltraMemV2: Memory Networks Scaling to 120B Parameters with Superior Long-Context Learning

Paper • 2508.18756 • Published Aug 26 • 36

Do What? Teaching Vision-Language-Action Models to Reject the Impossible

Paper • 2508.16292 • Published Aug 22 • 9

MeshSplat: Generalizable Sparse-View Surface Reconstruction via Gaussian Splatting

Paper • 2508.17811 • Published Aug 25 • 6

FastMesh: Efficient Artistic Mesh Generation via Component Decoupling

Paper • 2508.19188 • Published Aug 26 • 17

Spacer: Towards Engineered Scientific Inspiration

Paper • 2508.17661 • Published Aug 25 • 32

ThinkDial: An Open Recipe for Controlling Reasoning Effort in Large Language Models

Paper • 2508.18773 • Published Aug 26 • 15

VoxHammer: Training-Free Precise and Coherent 3D Editing in Native 3D Space

Paper • 2508.19247 • Published Aug 26 • 42

TreePO: Bridging the Gap of Policy Optimization and Efficacy and Inference Efficiency with Heuristic Tree-based Modeling

Paper • 2508.17445 • Published Aug 24 • 80

Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning

Paper • 2508.20751 • Published Aug 28 • 89

Provable Benefits of In-Tool Learning for Large Language Models

Paper • 2508.20755 • Published Aug 28 • 11

Think in Games: Learning to Reason in Games via Reinforcement Learning with Large Language Models

Paper • 2508.21365 • Published Aug 29 • 29

Efficient Code Embeddings from Code Generation Models

Paper • 2508.21290 • Published Aug 29 • 19

CLIPSym: Delving into Symmetry Detection with CLIP

Paper • 2508.14197 • Published Aug 19 • 8

Implicit Actor Critic Coupling via a Supervised Learning Framework for RLVR

Paper • 2509.02522 • Published Sep 2 • 25

SimpleTIR: End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning

Paper • 2509.02479 • Published Sep 2 • 83

Universal Deep Research: Bring Your Own Model and Strategy

Paper • 2509.00244 • Published Aug 29 • 13

LMEnt: A Suite for Analyzing Knowledge in Language Models from Pretraining Data to Representations

Paper • 2509.03405 • Published Sep 3 • 23

Open Data Synthesis For Deep Research

Paper • 2509.00375 • Published Aug 30 • 69

Inverse IFEval: Can LLMs Unlearn Stubborn Training Conventions to Follow Real Instructions?

Paper • 2509.04292 • Published Sep 4 • 57

Towards a Unified View of Large Language Model Post-Training

Paper • 2509.04419 • Published Sep 4 • 74

How Can Input Reformulation Improve Tool Usage Accuracy in a Complex Dynamic Environment? A Study on τ-bench

Paper • 2508.20931 • Published Aug 28 • 15

Loong: Synthesize Long Chain-of-Thoughts at Scale through Verifiers

Paper • 2509.03059 • Published Sep 3 • 24

NER Retriever: Zero-Shot Named Entity Retrieval with Type-Aware Embeddings

Paper • 2509.04011 • Published Sep 4 • 28

Symbolic Graphics Programming with Large Language Models

Paper • 2509.05208 • Published Sep 5 • 46

Bootstrapping Task Spaces for Self-Improvement

Paper • 2509.04575 • Published Sep 4 • 5

Behavioral Fingerprinting of Large Language Models

Paper • 2509.04504 • Published Sep 2 • 5

Scaling up Multi-Turn Off-Policy RL and Multi-Agent Tree Search for LLM Step-Provers

Paper • 2509.06493 • Published Sep 8 • 11

Reinforcement Learning Foundations for Deep Research Systems: A Survey

Paper • 2509.06733 • Published Sep 8 • 32

Reconstruction Alignment Improves Unified Multimodal Models

Paper • 2509.07295 • Published Sep 8 • 40

Visual Representation Alignment for Multimodal Large Language Models

Paper • 2509.07979 • Published Sep 9 • 83

Revolutionizing Reinforcement Learning Framework for Diffusion Large Language Models

Paper • 2509.06949 • Published Sep 8 • 55

Parallel-R1: Towards Parallel Thinking via Reinforcement Learning

Paper • 2509.07980 • Published Sep 9 • 100

Towards General Agentic Intelligence via Environment Scaling

Paper • 2509.13311 • Published Sep 16 • 70

WebSailor-V2: Bridging the Chasm to Proprietary Agents via Synthetic Data and Scalable Reinforcement Learning

Paper • 2509.13305 • Published Sep 16 • 90

SearchInstruct: Enhancing Domain Adaptation via Retrieval-Based Instruction Dataset Creation

Paper • 2509.10708 • Published Sep 12 • 17

HANRAG: Heuristic Accurate Noise-resistant Retrieval-Augmented Generation for Multi-hop Question Answering

Paper • 2509.09713 • Published Sep 8 • 24

FlowRL: Matching Reward Distributions for LLM Reasoning

Paper • 2509.15207 • Published Sep 18 • 113

Single-stream Policy Optimization

Paper • 2509.13232 • Published Sep 16 • 33

World Modeling with Probabilistic Structure Integration

Paper • 2509.09737 • Published Sep 10 • 13

Scrub It Out! Erasing Sensitive Memorization in Code Language Models via Machine Unlearning

Paper • 2509.13755 • Published Sep 17 • 19

C2-Evo: Co-Evolving Multimodal Data and Model for Self-Improving Reasoning

Paper • 2507.16518 • Published Jul 22 • 2

WorldForge: Unlocking Emergent 3D/4D Generation in Video Diffusion Model via Training-Free Guidance

Paper • 2509.15130 • Published Sep 18 • 30

Evolving Language Models without Labels: Majority Drives Selection, Novelty Promotes Variation

Paper • 2509.15194 • Published Sep 18 • 33

THOR: Tool-Integrated Hierarchical Optimization via RL for Mathematical Reasoning

Paper • 2509.13761 • Published Sep 17 • 16

A Vision-Language-Action-Critic Model for Robotic Real-World Reinforcement Learning

Paper • 2509.15937 • Published Sep 19 • 20

BaseReward: A Strong Baseline for Multimodal Reward Model

Paper • 2509.16127 • Published Sep 19 • 21

MultiEdit: Advancing Instruction-based Image Editing on Diverse and Challenging Tasks

Paper • 2509.14638 • Published Sep 18 • 11

Video2Roleplay: A Multimodal Dataset and Framework for Video-Guided Role-playing Agents

Paper • 2509.15233 • Published Sep 17 • 2

MANZANO: A Simple and Scalable Unified Multimodal Model with a Hybrid Vision Tokenizer

Paper • 2509.16197 • Published Sep 19 • 56

Latent Zoning Network: A Unified Principle for Generative Modeling, Representation Learning, and Classification

Paper • 2509.15591 • Published Sep 19 • 45

BTL-UI: Blink-Think-Link Reasoning Model for GUI Agent

Paper • 2509.15566 • Published Sep 19 • 14

Mano Report

Paper • 2509.17336 • Published Sep 22 • 10

Step-KTO: Optimizing Mathematical Reasoning through Stepwise Binary Feedback

Paper • 2501.10799 • Published Jan 18 • 15

Table as Thought: Exploring Structured Thoughts in LLM Reasoning

Paper • 2501.02152 • Published Jan 4

Forest-of-Thought: Scaling Test-Time Compute for Enhancing LLM Reasoning

Paper • 2412.09078 • Published Dec 12, 2024

TinyThinker: Distilling Reasoning through Coarse-to-Fine Knowledge Internalization with Self-Reflection

Paper • 2412.08024 • Published Dec 11, 2024 • 1

LLM2: Let Large Language Models Harness System 2 Reasoning

Paper • 2412.20372 • Published Dec 29, 2024

Chain-of-Reasoning: Towards Unified Mathematical Reasoning in Large Language Models via a Multi-Paradigm Perspective

Paper • 2501.11110 • Published Jan 19 • 4

Ensembling Large Language Models with Process Reward-Guided Tree Search for Better Complex Reasoning

Paper • 2412.15797 • Published Dec 20, 2024 • 18

RAG-Star: Enhancing Deliberative Reasoning with Retrieval Augmented Verification and Refinement

Paper • 2412.12881 • Published Dec 17, 2024 • 2

OmniInsert: Mask-Free Video Insertion of Any Reference via Diffusion Transformer Models

Paper • 2509.17627 • Published Sep 22 • 66

Hyper-Bagel: A Unified Acceleration Framework for Multimodal Understanding and Generation

Paper • 2509.18824 • Published Sep 23 • 22

Understanding the Thinking Process of Reasoning Models: A Perspective from Schoenfeld's Episode Theory

Paper • 2509.14662 • Published Sep 18 • 13

VCRL: Variance-based Curriculum Reinforcement Learning for Large Language Models

Paper • 2509.19803 • Published Sep 24 • 118

Tree Search for LLM Agent Reinforcement Learning

Paper • 2509.21240 • Published Sep 25 • 87

SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse-Linear Attention

Paper • 2509.24006 • Published Sep 28 • 116

Fine-tuning Done Right in Model Editing

Paper • 2509.22072 • Published Sep 26 • 28

No Prompt Left Behind: Exploiting Zero-Variance Prompts in LLM Reinforcement Learning via Entropy-Guided Advantage Shaping

Paper • 2509.21880 • Published Sep 26 • 52

LucidFlux: Caption-Free Universal Image Restoration via a Large-Scale Diffusion Transformer

Paper • 2509.22414 • Published Sep 26 • 21

Learn the Ropes, Then Trust the Wins: Self-imitation with Progressive Exploration for Agentic Reinforcement Learning

Paper • 2509.22601 • Published Sep 26 • 29

EPO: Entropy-regularized Policy Optimization for LLM Agents Reinforcement Learning

Paper • 2509.22576 • Published Sep 26 • 133

Variational Reasoning for Language Models

Paper • 2509.22637 • Published Sep 26 • 68

AutoIntent: AutoML for Text Classification

Paper • 2509.21138 • Published Sep 25 • 35

TruthRL: Incentivizing Truthful LLMs via Reinforcement Learning

Paper • 2509.25760 • Published Sep 30 • 55

Attention as a Compass: Efficient Exploration for Process-Supervised RL in Reasoning Models

Paper • 2509.26628 • Published Sep 30 • 15

Sequential Diffusion Language Models

Paper • 2509.24007 • Published Sep 28 • 45

ReviewScore: Misinformed Peer Review Detection with Large Language Models

Paper • 2509.21679 • Published Sep 25 • 63

ReviewRL: Towards Automated Scientific Review with RL

Paper • 2508.10308 • Published Aug 14

ReportBench: Evaluating Deep Research Agents via Academic Survey Tasks

Paper • 2508.15804 • Published Aug 14 • 15

DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search

Paper • 2509.25454 • Published Sep 29 • 138

Knapsack RL: Unlocking Exploration of LLMs via Optimizing Budget Allocation

Paper • 2509.25849 • Published Sep 30 • 47

BroRL: Scaling Reinforcement Learning via Broadened Exploration

Paper • 2510.01180 • Published Oct 1 • 17

GEM: A Gym for Agentic LLMs

Paper • 2510.01051 • Published Oct 1 • 88

Interactive Training: Feedback-Driven Neural Network Optimization

Paper • 2510.02297 • Published Oct 2 • 42

More Thought, Less Accuracy? On the Dual Nature of Reasoning in Vision-Language Models

Paper • 2509.25848 • Published Sep 30 • 79

CLUE: Non-parametric Verification from Experience via Hidden-State Clustering

Paper • 2510.01591 • Published Oct 2 • 26

LongCodeZip: Compress Long Context for Code Language Models

Paper • 2510.00446 • Published Oct 1 • 107

Efficient Multi-modal Large Language Models via Progressive Consistency Distillation

Paper • 2510.00515 • Published Oct 1 • 39

Reactive Transformer (RxT) -- Stateful Real-Time Processing for Event-Driven Reactive Language Models

Paper • 2510.03561 • Published Oct 3 • 24

Large Language Models as Optimizers

Paper • 2309.03409 • Published Sep 7, 2023 • 77

Connecting Large Language Models with Evolutionary Algorithms Yields Powerful Prompt Optimizers

Paper • 2309.08532 • Published Sep 15, 2023 • 53

PanGu-Coder2: Boosting Large Language Models for Code with Ranking Feedback

Paper • 2307.14936 • Published Jul 27, 2023 • 41

Factuality Matters: When Image Generation and Editing Meet Structured Visuals

Paper • 2510.05091 • Published Oct 6 • 18

Reinforce-Ada: An Adaptive Sampling Framework for Reinforce-Style LLM Training

Paper • 2510.04996 • Published Oct 6 • 15

Paper2Video: Automatic Video Generation from Scientific Papers

Paper • 2510.05096 • Published Oct 6 • 115

SwiReasoning: Switch-Thinking in Latent and Explicit for Pareto-Superior Reasoning LLMs

Paper • 2510.05069 • Published Oct 6 • 12

MITS: Enhanced Tree Search Reasoning for LLMs via Pointwise Mutual Information

Paper • 2510.03632 • Published Oct 4 • 42

Large Reasoning Models Learn Better Alignment from Flawed Thinking

Paper • 2510.00938 • Published Oct 1 • 58

Less is More: Recursive Reasoning with Tiny Networks

Paper • 2510.04871 • Published Oct 6 • 489

Multi-Agent Tool-Integrated Policy Optimization

Paper • 2510.04678 • Published Oct 6 • 30

Agent Learning via Early Experience

Paper • 2510.08558 • Published Oct 9 • 265

Video-LMM Post-Training: A Deep Dive into Video Reasoning with Large Multimodal Models

Paper • 2510.05034 • Published Oct 6 • 48

Low-probability Tokens Sustain Exploration in Reinforcement Learning with Verifiable Reward

Paper • 2510.03222 • Published Oct 3 • 74

QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs

Paper • 2510.11696 • Published Oct 13 • 175

PhysToolBench: Benchmarking Physical Tool Understanding for MLLMs

Paper • 2510.09507 • Published Oct 10 • 10

Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models

Paper • 2510.04618 • Published Oct 6 • 121

Better Together: Leveraging Unpaired Multimodal Data for Stronger Unimodal Models

Paper • 2510.08492 • Published Oct 9 • 8

Dyna-Mind: Learning to Simulate from Experience for Better AI Agents

Paper • 2510.09577 • Published Oct 10 • 6

BigCodeArena: Unveiling More Reliable Human Preferences in Code Generation via Execution

Paper • 2510.08697 • Published Oct 9 • 35

Multimodal Prompt Optimization: Why Not Leverage Multiple Modalities for MLLMs

Paper • 2510.09201 • Published Oct 10 • 49

Diffusion Transformers with Representation Autoencoders

Paper • 2510.11690 • Published Oct 13 • 163

UniME-V2: MLLM-as-a-Judge for Universal Multimodal Embedding Learning

Paper • 2510.13515 • Published Oct 15 • 11

Advancing End-to-End Pixel Space Generative Modeling via Self-supervised Pre-training

Paper • 2510.12586 • Published Oct 14 • 107

Understanding DeepResearch via Reports

Paper • 2510.07861 • Published Oct 9 • 6

RAG-Anything: All-in-One RAG Framework

Paper • 2510.12323 • Published Oct 14 • 49

The Art of Scaling Reinforcement Learning Compute for LLMs

Paper • 2510.13786 • Published Oct 15 • 30

Glyph: Scaling Context Windows via Visual-Text Compression

Paper • 2510.17800 • Published Oct 20 • 67

LoongRL: Reinforcement Learning for Advanced Reasoning over Long Contexts

Paper • 2510.19363 • Published Oct 22 • 61

Unified Reinforcement and Imitation Learning for Vision-Language Models

Paper • 2510.19307 • Published Oct 22 • 28

Attention Is All You Need for KV Cache in Diffusion LLMs

Paper • 2510.14973 • Published Oct 16 • 39

Information Gain-based Policy Optimization: A Simple and Effective Approach for Multi-Turn LLM Agents

Paper • 2510.14967 • Published Oct 16 • 33

Video Reasoning without Training

Paper • 2510.17045 • Published Oct 19 • 7

AdaSPEC: Selective Knowledge Distillation for Efficient Speculative Decoders

Paper • 2510.19779 • Published Oct 22 • 59

Loopholing Discrete Diffusion: Deterministic Bypass of the Sampling Wall

Paper • 2510.19304 • Published Oct 22 • 23

Every Question Has Its Own Value: Reinforcement Learning with Explicit Human Values

Paper • 2510.20187 • Published Oct 23 • 18

ReCode: Unify Plan and Action for Universal Granularity Control

Paper • 2510.23564 • Published Oct 27 • 119

Reasoning with Sampling: Your Base Model is Smarter Than You Think

Paper • 2510.14901 • Published Oct 16 • 47

Video-Thinker: Sparking "Thinking with Videos" via Reinforcement Learning

Paper • 2510.23473 • Published Oct 27 • 83

World Simulation with Video Foundation Models for Physical AI

Paper • 2511.00062 • Published Oct 28 • 40

OS-Sentinel: Towards Safety-Enhanced Mobile GUI Agents via Hybrid Validation in Realistic Workflows

Paper • 2510.24411 • Published Oct 28 • 70

The End of Manual Decoding: Towards Truly End-to-End Language Models

Paper • 2510.26697 • Published Oct 30 • 114

The Strong Lottery Ticket Hypothesis for Multi-Head Attention Mechanisms

Paper • 2511.04217 • Published 27 days ago • 15

Diffusion Language Models are Super Data Learners

Paper • 2511.03276 • Published 28 days ago • 121

Scaling Latent Reasoning via Looped Language Models

Paper • 2510.25741 • Published Oct 29 • 218

DRIVE: Data Curation Best Practices for Reinforcement Learning with Verifiable Reward in Competitive Code Generation

Paper • 2511.06307 • Published 24 days ago • 50

Black-Box On-Policy Distillation of Large Language Models

Paper • 2511.10643 • Published 20 days ago • 46

DoPE: Denoising Rotary Position Embedding

Paper • 2511.09146 • Published 21 days ago • 91

Orion: A Unified Visual Agent for Multimodal Perception, Advanced Visual Reasoning and Execution

Paper • 2511.14210 • Published 15 days ago • 19

SRPO: Self-Referential Policy Optimization for Vision-Language-Action Models

Paper • 2511.15605 • Published 14 days ago • 22

Nemotron Elastic: Towards Efficient Many-in-One Reasoning LLMs

Paper • 2511.16664 • Published 13 days ago • 24

TiDAR: Think in Diffusion, Talk in Autoregression

Paper • 2511.08923 • Published 21 days ago • 108

MathSE: Improving Multimodal Mathematical Reasoning via Self-Evolving Iterative Reflection and Reward-Guided Fine-Tuning

Paper • 2511.06805 • Published 23 days ago • 12

The Path Not Taken: RLVR Provably Learns Off the Principals

Paper • 2511.08567 • Published 22 days ago • 31

Supervised Reinforcement Learning: From Expert Trajectories to Step-wise Reasoning

Paper • 2510.25992 • Published Oct 29 • 44

FARMER: Flow AutoRegressive Transformer over Pixels

Paper • 2510.23588 • Published Oct 27 • 57

Parallel Loop Transformer for Efficient Test-Time Computation Scaling

Paper • 2510.24824 • Published Oct 28 • 15

LLM-guided Hierarchical Retrieval

Paper • 2510.13217 • Published Oct 15 • 19

DLER: Doing Length pEnalty Right - Incentivizing More Intelligence per Token via Reinforcement Learning

Paper • 2510.15110 • Published Oct 16 • 15

Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence

Paper • 2510.20579 • Published Oct 23 • 55

GigaEvo: An Open Source Optimization Framework Powered By LLMs And Evolution Algorithms

Paper • 2511.17592 • Published 16 days ago • 117

Virtual Width Networks

Paper • 2511.11238 • Published 19 days ago • 35

Flow Map Distillation Without Data

Paper • 2511.19428 • Published 9 days ago • 4

Monet: Reasoning in Latent Visual Space Beyond Images and Language

Paper • 2511.21395 • Published 7 days ago • 14

Scaling Agentic Reinforcement Learning for Tool-Integrated Reasoning in VLMs

Paper • 2511.19773 • Published 8 days ago • 9

SSA: Sparse Sparse Attention by Aligning Full and Sparse Attention Outputs in Feature Space

Paper • 2511.20102 • Published 8 days ago • 26

Architecture Decoupling Is Not All You Need For Unified Multimodal Model

Paper • 2511.22663 • Published 6 days ago • 27

SpeContext: Enabling Efficient Long-context Reasoning with Speculative Context Sparsity in LLMs

Paper • 2512.00722 • Published 3 days ago • 13

Stabilizing Reinforcement Learning with LLMs: Formulation and Practices

Paper • 2512.01374 • Published 2 days ago • 61