Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning Paper • 2503.09516 • Published Mar 12 • 36
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time Paper • 2505.24863 • Published May 30 • 97
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning Paper • 2505.17667 • Published May 23 • 88
ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models Paper • 2505.24864 • Published May 30 • 138
AReaL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning Paper • 2505.24298 • Published May 30 • 28
GraLoRA: Granular Low-Rank Adaptation for Parameter-Efficient Fine-Tuning Paper • 2505.20355 • Published May 26 • 35
Interleaved Reasoning for Large Language Models via Reinforcement Learning Paper • 2505.19640 • Published May 26 • 14
FullFront: Benchmarking MLLMs Across the Full Front-End Engineering Workflow Paper • 2505.17399 • Published May 23 • 14
Enigmata: Scaling Logical Reasoning in Large Language Models with Synthetic Verifiable Puzzles Paper • 2505.19914 • Published May 26 • 43
One RL to See Them All: Visual Triple Unified Reinforcement Learning Paper • 2505.18129 • Published May 23 • 59
Scaling Reasoning, Losing Control: Evaluating Instruction Following in Large Reasoning Models Paper • 2505.14810 • Published May 20 • 62
Tool-Star: Empowering LLM-Brained Multi-Tool Reasoner via Reinforcement Learning Paper • 2505.16410 • Published May 22 • 56
Optimizing Anytime Reasoning via Budget Relative Policy Optimization Paper • 2505.13438 • Published May 19 • 36
CPGD: Toward Stable Rule-based Reinforcement Learning for Language Models Paper • 2505.12504 • Published May 18 • 24
Ψ-Sampler: Initial Particle Sampling for SMC-Based Inference-Time Reward Alignment in Score Models Paper • 2506.01320 • Published Jun 2 • 16
Robot-R1: Reinforcement Learning for Enhanced Embodied Reasoning in Robotics Paper • 2506.00070 • Published May 29 • 29
A Controllable Examination for Long-Context Language Models Paper • 2506.02921 • Published Jun 3 • 33
MotionSight: Boosting Fine-Grained Motion Understanding in Multimodal LLMs Paper • 2506.01674 • Published Jun 2 • 28
CodeContests+: High-Quality Test Case Generation for Competitive Programming Paper • 2506.05817 • Published Jun 6 • 9
FusionAudio-1.2M: Towards Fine-grained Audio Captioning with Multimodal Contextual Fusion Paper • 2506.01111 • Published Jun 1 • 30
GUI-Reflection: Empowering Multimodal GUI Models with Self-Reflection Behavior Paper • 2506.08012 • Published Jun 9 • 7
Dreamland: Controllable World Creation with Simulator and Generative Models Paper • 2506.08006 • Published Jun 9 • 7
Saffron-1: Towards an Inference Scaling Paradigm for LLM Safety Assurance Paper • 2506.06444 • Published Jun 6 • 73
BitVLA: 1-bit Vision-Language-Action Models for Robotics Manipulation Paper • 2506.07530 • Published Jun 9 • 20
Autoregressive Semantic Visual Reconstruction Helps VLMs Understand Better Paper • 2506.09040 • Published Jun 10 • 34
Through the Valley: Path to Effective Long CoT Training for Small Language Models Paper • 2506.07712 • Published Jun 9 • 18
Multimodal DeepResearcher: Generating Text-Chart Interleaved Reports From Scratch with Agentic Framework Paper • 2506.02454 • Published Jun 3 • 7
Look Before You Leap: A GUI-Critic-R1 Model for Pre-Operative Error Diagnosis in GUI Automation Paper • 2506.04614 • Published Jun 5 • 18
Astra: Toward General-Purpose Mobile Robots via Hierarchical Multimodal Learning Paper • 2506.06205 • Published Jun 6 • 30
Xolver: Multi-Agent Reasoning with Holistic Experience Learning Just Like an Olympiad Team Paper • 2506.14234 • Published Jun 17 • 41
Treasure Hunt: Real-time Targeting of the Long Tail using Training-Time Markers Paper • 2506.14702 • Published Jun 17 • 3
AR-RAG: Autoregressive Retrieval Augmentation for Image Generation Paper • 2506.06962 • Published Jun 8 • 28
LoRA-Edit: Controllable First-Frame-Guided Video Editing via Mask-Aware LoRA Fine-Tuning Paper • 2506.10082 • Published Jun 11 • 8
General-Reasoner: Advancing LLM Reasoning Across All Domains Paper • 2505.14652 • Published May 20 • 23
UniFork: Exploring Modality Alignment for Unified Multimodal Understanding and Generation Paper • 2506.17202 • Published Jun 20 • 10
ReasonFlux-PRM: Trajectory-Aware PRMs for Long Chain-of-Thought Reasoning in LLMs Paper • 2506.18896 • Published Jun 23 • 29
Vision-Guided Chunking Is All You Need: Enhancing RAG with Multimodal Document Understanding Paper • 2506.16035 • Published Jun 19 • 88
LangScene-X: Reconstruct Generalizable 3D Language-Embedded Scenes with TriMap Video Diffusion Paper • 2507.02813 • Published Jul 3 • 60
FreeMorph: Tuning-Free Generalized Image Morphing with Diffusion Model Paper • 2507.01953 • Published Jul 2 • 19
Evolving Prompts In-Context: An Open-ended, Self-replicating Perspective Paper • 2506.17930 • Published Jun 22 • 19
SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning Paper • 2506.24119 • Published Jun 30 • 50
Disambiguation-Centric Finetuning Makes Enterprise Tool-Calling LLMs More Realistic and Less Risky Paper • 2507.03336 • Published Jul 4 • 5
CriticLean: Critic-Guided Reinforcement Learning for Mathematical Formalization Paper • 2507.06181 • Published Jul 8 • 43
AutoTriton: Automatic Triton Programming with Reinforcement Learning in LLMs Paper • 2507.05687 • Published Jul 8 • 27
Coding Triangle: How Does Large Language Model Understand Code? Paper • 2507.06138 • Published Jul 8 • 21
High-Resolution Visual Reasoning via Multi-Turn Grounding-Based Reinforcement Learning Paper • 2507.05920 • Published Jul 8 • 11
RefineX: Learning to Refine Pre-training Data at Scale from Expert-Guided Programs Paper • 2507.03253 • Published Jul 4 • 18
Skip a Layer or Loop it? Test-Time Depth Adaptation of Pretrained LLMs Paper • 2507.07996 • Published Jul 10 • 34
Lumos-1: On Autoregressive Video Generation from a Unified Model Perspective Paper • 2507.08801 • Published Jul 11 • 30
A Survey of Context Engineering for Large Language Models Paper • 2507.13334 • Published Jul 17 • 256
WebShaper: Agentically Data Synthesizing via Information-Seeking Formalization Paper • 2507.15061 • Published Jul 20 • 58
AnyCap Project: A Unified Framework, Dataset, and Benchmark for Controllable Omni-modal Captioning Paper • 2507.12841 • Published Jul 17 • 41
Beyond Context Limits: Subconscious Threads for Long-Horizon Reasoning Paper • 2507.16784 • Published Jul 22 • 120
MUR: Momentum Uncertainty guided Reasoning for Large Language Models Paper • 2507.14958 • Published Jul 20 • 46
RefCritic: Training Long Chain-of-Thought Critic Models with Refinement Feedback Paper • 2507.15024 • Published Jul 20 • 13
ObjectGS: Object-aware Scene Reconstruction and Scene Understanding via Gaussian Splatting Paper • 2507.15454 • Published Jul 21 • 7
Promptomatix: An Automatic Prompt Optimization Framework for Large Language Models Paper • 2507.14241 • Published Jul 17 • 17
TTS-VAR: A Test-Time Scaling Framework for Visual Auto-Regressive Generation Paper • 2507.18537 • Published Jul 24 • 17
Being-H0: Vision-Language-Action Pretraining from Large-Scale Human Videos Paper • 2507.15597 • Published Jul 21 • 33
SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction Paper • 2507.15852 • Published Jul 21 • 38
FLEXITOKENS: Flexible Tokenization for Evolving Language Models Paper • 2507.12720 • Published Jul 17 • 9
RiemannLoRA: A Unified Riemannian Framework for Ambiguity-Free LoRA Optimization Paper • 2507.12142 • Published Jul 16 • 36
Replacing thinking with tool usage enables reasoning in small language models Paper • 2507.05065 • Published Jul 7 • 15
Lizard: An Efficient Linearization Framework for Large Language Models Paper • 2507.09025 • Published Jul 11 • 18
SmallThinker: A Family of Efficient Large Language Models Natively Trained for Local Deployment Paper • 2507.20984 • Published Jul 28 • 56
MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents Paper • 2507.19478 • Published Jul 25 • 30
UloRL:An Ultra-Long Output Reinforcement Learning Approach for Advancing Large Language Models' Reasoning Abilities Paper • 2507.19766 • Published Jul 26 • 14
VL-Cogito: Progressive Curriculum Reinforcement Learning for Advanced Multimodal Reasoning Paper • 2507.22607 • Published Jul 30 • 46
Beyond Fixed: Variable-Length Denoising for Diffusion Large Language Models Paper • 2508.00819 • Published Aug 1 • 62
Beyond the Trade-off: Self-Supervised Reinforcement Learning for Reasoning Models' Instruction Following Paper • 2508.02150 • Published Aug 4 • 36
Cognitive Kernel-Pro: A Framework for Deep Research Agents and Agent Foundation Models Training Paper • 2508.00414 • Published Aug 1 • 91
On the Expressiveness of Softmax Attention: A Recurrent Neural Network Perspective Paper • 2507.23632 • Published Jul 31 • 6
Seed-Prover: Deep and Broad Reasoning for Automated Theorem Proving Paper • 2507.23726 • Published Jul 31 • 112
SitEmb-v1.5: Improved Context-Aware Dense Retrieval for Semantic Association and Long Story Comprehension Paper • 2508.01959 • Published Aug 3 • 56
Tool-integrated Reinforcement Learning for Repo Deep Search Paper • 2508.03012 • Published Aug 5 • 20
InfiGUI-G1: Advancing GUI Grounding with Adaptive Exploration Policy Optimization Paper • 2508.05731 • Published Aug 7 • 25
MeshLLM: Empowering Large Language Models to Progressively Understand and Generate 3D Mesh Paper • 2508.01242 • Published Aug 2 • 9
Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning Paper • 2508.08221 • Published Aug 11 • 46
Bifrost-1: Bridging Multimodal LLMs and Diffusion Models with Patch-level CLIP Latents Paper • 2508.05954 • Published Aug 8 • 6
Feedback-Driven Tool-Use Improvements in Large Language Models via Automated Build Environments Paper • 2508.08791 • Published Aug 12 • 16
Training Long-Context, Multi-Turn Software Engineering Agents with Reinforcement Learning Paper • 2508.03501 • Published Aug 5 • 56
Mol-R1: Towards Explicit Long-CoT Reasoning in Molecule Discovery Paper • 2508.08401 • Published Aug 11 • 42
Diffusion LLMs Can Do Faster-Than-AR Inference via Discrete Diffusion Forcing Paper • 2508.09192 • Published Aug 8 • 30
Inverse-LLaVA: Eliminating Alignment Pre-training Through Text-to-Vision Mapping Paper • 2508.12466 • Published Aug 17 • 8
Has GPT-5 Achieved Spatial Intelligence? An Empirical Study Paper • 2508.13142 • Published Aug 18 • 33
VertexRegen: Mesh Generation with Continuous Level of Detail Paper • 2508.09062 • Published Aug 12 • 36
XQuant: Breaking the Memory Wall for LLM Inference with KV Cache Rematerialization Paper • 2508.10395 • Published Aug 14 • 42
STream3R: Scalable Sequential 3D Reconstruction with Causal Transformer Paper • 2508.10893 • Published Aug 14 • 31
FutureX: An Advanced Live Benchmark for LLM Agents in Future Prediction Paper • 2508.11987 • Published Aug 16 • 69
MM-BrowseComp: A Comprehensive Benchmark for Multimodal Browsing Agents Paper • 2508.13186 • Published Aug 14 • 18
Pass@k Training for Adaptively Balancing Exploration and Exploitation of Large Reasoning Models Paper • 2508.10751 • Published Aug 14 • 27
UI-Venus Technical Report: Building High-performance UI Agents with RFT Paper • 2508.10833 • Published Aug 14 • 42
Noise Hypernetworks: Amortizing Test-Time Compute in Diffusion Models Paper • 2508.09968 • Published Aug 13 • 15
CRINN: Contrastive Reinforcement Learning for Approximate Nearest Neighbor Search Paper • 2508.02091 • Published Aug 4 • 13
LiveMCP-101: Stress Testing and Diagnosing MCP-enabled Agents on Challenging Queries Paper • 2508.15760 • Published Aug 21 • 46
DuPO: Enabling Reliable LLM Self-Verification via Dual Preference Optimization Paper • 2508.14460 • Published Aug 20 • 82
Quantization Meets dLLMs: A Systematic Study of Post-training Quantization for Diffusion LLMs Paper • 2508.14896 • Published Aug 20 • 21
PosterGen: Aesthetic-Aware Paper-to-Poster Generation via Multi-Agent LLMs Paper • 2508.17188 • Published Aug 24 • 16
Breaking the Exploration Bottleneck: Rubric-Scaffolded Reinforcement Learning for General LLM Reasoning Paper • 2508.16949 • Published Aug 23 • 22
Visual-CoG: Stage-Aware Reinforcement Learning with Chain of Guidance for Text-to-Image Generation Paper • 2508.18032 • Published Aug 25 • 40
Beyond Memorization: Extending Reasoning Depth with Recurrence, Memory and Test-Time Compute Scaling Paper • 2508.16745 • Published Aug 22 • 28
UltraMemV2: Memory Networks Scaling to 120B Parameters with Superior Long-Context Learning Paper • 2508.18756 • Published Aug 26 • 36
Do What? Teaching Vision-Language-Action Models to Reject the Impossible Paper • 2508.16292 • Published Aug 22 • 9
MeshSplat: Generalizable Sparse-View Surface Reconstruction via Gaussian Splatting Paper • 2508.17811 • Published Aug 25 • 6
FastMesh:Efficient Artistic Mesh Generation via Component Decoupling Paper • 2508.19188 • Published Aug 26 • 16
ThinkDial: An Open Recipe for Controlling Reasoning Effort in Large Language Models Paper • 2508.18773 • Published Aug 26 • 15
VoxHammer: Training-Free Precise and Coherent 3D Editing in Native 3D Space Paper • 2508.19247 • Published Aug 26 • 41
TreePO: Bridging the Gap of Policy Optimization and Efficacy and Inference Efficiency with Heuristic Tree-based Modeling Paper • 2508.17445 • Published Aug 24 • 80
Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning Paper • 2508.20751 • Published Aug 28 • 89
Provable Benefits of In-Tool Learning for Large Language Models Paper • 2508.20755 • Published Aug 28 • 11
Think in Games: Learning to Reason in Games via Reinforcement Learning with Large Language Models Paper • 2508.21365 • Published Aug 29 • 27
Implicit Actor Critic Coupling via a Supervised Learning Framework for RLVR Paper • 2509.02522 • Published Sep 2 • 25
SimpleTIR: End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning Paper • 2509.02479 • Published Sep 2 • 83
Universal Deep Research: Bring Your Own Model and Strategy Paper • 2509.00244 • Published Aug 29 • 13
LMEnt: A Suite for Analyzing Knowledge in Language Models from Pretraining Data to Representations Paper • 2509.03405 • Published Sep 3 • 23
Inverse IFEval: Can LLMs Unlearn Stubborn Training Conventions to Follow Real Instructions? Paper • 2509.04292 • Published Sep 4 • 57
Towards a Unified View of Large Language Model Post-Training Paper • 2509.04419 • Published Sep 4 • 73
How Can Input Reformulation Improve Tool Usage Accuracy in a Complex Dynamic Environment? A Study on τ-bench Paper • 2508.20931 • Published Aug 28 • 15
Loong: Synthesize Long Chain-of-Thoughts at Scale through Verifiers Paper • 2509.03059 • Published Sep 3 • 24
NER Retriever: Zero-Shot Named Entity Retrieval with Type-Aware Embeddings Paper • 2509.04011 • Published Sep 4 • 28
Scaling up Multi-Turn Off-Policy RL and Multi-Agent Tree Search for LLM Step-Provers Paper • 2509.06493 • Published Sep 8 • 11
Reinforcement Learning Foundations for Deep Research Systems: A Survey Paper • 2509.06733 • Published Sep 8 • 31
Reconstruction Alignment Improves Unified Multimodal Models Paper • 2509.07295 • Published Sep 8 • 39
Visual Representation Alignment for Multimodal Large Language Models Paper • 2509.07979 • Published Sep 9 • 81
Revolutionizing Reinforcement Learning Framework for Diffusion Large Language Models Paper • 2509.06949 • Published Sep 8 • 56
Parallel-R1: Towards Parallel Thinking via Reinforcement Learning Paper • 2509.07980 • Published Sep 9 • 98
Towards General Agentic Intelligence via Environment Scaling Paper • 2509.13311 • Published 25 days ago • 69
WebSailor-V2: Bridging the Chasm to Proprietary Agents via Synthetic Data and Scalable Reinforcement Learning Paper • 2509.13305 • Published 25 days ago • 86
SearchInstruct: Enhancing Domain Adaptation via Retrieval-Based Instruction Dataset Creation Paper • 2509.10708 • Published 29 days ago • 17
HANRAG: Heuristic Accurate Noise-resistant Retrieval-Augmented Generation for Multi-hop Question Answering Paper • 2509.09713 • Published Sep 8 • 24
FlowRL: Matching Reward Distributions for LLM Reasoning Paper • 2509.15207 • Published 23 days ago • 104
Scrub It Out! Erasing Sensitive Memorization in Code Language Models via Machine Unlearning Paper • 2509.13755 • Published 25 days ago • 19
C2-Evo: Co-Evolving Multimodal Data and Model for Self-Improving Reasoning Paper • 2507.16518 • Published Jul 22 • 2
WorldForge: Unlocking Emergent 3D/4D Generation in Video Diffusion Model via Training-Free Guidance Paper • 2509.15130 • Published 23 days ago • 30
Evolving Language Models without Labels: Majority Drives Selection, Novelty Promotes Variation Paper • 2509.15194 • Published 23 days ago • 33
THOR: Tool-Integrated Hierarchical Optimization via RL for Mathematical Reasoning Paper • 2509.13761 • Published 25 days ago • 16
A Vision-Language-Action-Critic Model for Robotic Real-World Reinforcement Learning Paper • 2509.15937 • Published 22 days ago • 18
BaseReward: A Strong Baseline for Multimodal Reward Model Paper • 2509.16127 • Published 22 days ago • 21
MultiEdit: Advancing Instruction-based Image Editing on Diverse and Challenging Tasks Paper • 2509.14638 • Published 24 days ago • 11
Video2Roleplay: A Multimodal Dataset and Framework for Video-Guided Role-playing Agents Paper • 2509.15233 • Published 25 days ago • 2
MANZANO: A Simple and Scalable Unified Multimodal Model with a Hybrid Vision Tokenizer Paper • 2509.16197 • Published 22 days ago • 51
Latent Zoning Network: A Unified Principle for Generative Modeling, Representation Learning, and Classification Paper • 2509.15591 • Published 23 days ago • 45
BTL-UI: Blink-Think-Link Reasoning Model for GUI Agent Paper • 2509.15566 • Published 23 days ago • 10
Step-KTO: Optimizing Mathematical Reasoning through Stepwise Binary Feedback Paper • 2501.10799 • Published Jan 18 • 15
Table as Thought: Exploring Structured Thoughts in LLM Reasoning Paper • 2501.02152 • Published Jan 4
Forest-of-Thought: Scaling Test-Time Compute for Enhancing LLM Reasoning Paper • 2412.09078 • Published Dec 12, 2024
TinyThinker: Distilling Reasoning through Coarse-to-Fine Knowledge Internalization with Self-Reflection Paper • 2412.08024 • Published Dec 11, 2024 • 1
LLM2: Let Large Language Models Harness System 2 Reasoning Paper • 2412.20372 • Published Dec 29, 2024
Chain-of-Reasoning: Towards Unified Mathematical Reasoning in Large Language Models via a Multi-Paradigm Perspective Paper • 2501.11110 • Published Jan 19 • 4
Ensembling Large Language Models with Process Reward-Guided Tree Search for Better Complex Reasoning Paper • 2412.15797 • Published Dec 20, 2024 • 18
RAG-Star: Enhancing Deliberative Reasoning with Retrieval Augmented Verification and Refinement Paper • 2412.12881 • Published Dec 17, 2024 • 2
OmniInsert: Mask-Free Video Insertion of Any Reference via Diffusion Transformer Models Paper • 2509.17627 • Published 19 days ago • 63
Hyper-Bagel: A Unified Acceleration Framework for Multimodal Understanding and Generation Paper • 2509.18824 • Published 19 days ago • 21
Understanding the Thinking Process of Reasoning Models: A Perspective from Schoenfeld's Episode Theory Paper • 2509.14662 • Published 24 days ago • 13
VCRL: Variance-based Curriculum Reinforcement Learning for Large Language Models Paper • 2509.19803 • Published 18 days ago • 116
SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse-Linear Attention Paper • 2509.24006 • Published 13 days ago • 110
No Prompt Left Behind: Exploiting Zero-Variance Prompts in LLM Reinforcement Learning via Entropy-Guided Advantage Shaping Paper • 2509.21880 • Published 16 days ago • 39
LucidFlux: Caption-Free Universal Image Restoration via a Large-Scale Diffusion Transformer Paper • 2509.22414 • Published 15 days ago • 21
Learn the Ropes, Then Trust the Wins: Self-imitation with Progressive Exploration for Agentic Reinforcement Learning Paper • 2509.22601 • Published 15 days ago • 28
EPO: Entropy-regularized Policy Optimization for LLM Agents Reinforcement Learning Paper • 2509.22576 • Published 15 days ago • 126
TruthRL: Incentivizing Truthful LLMs via Reinforcement Learning Paper • 2509.25760 • Published 12 days ago • 49
Attention as a Compass: Efficient Exploration for Process-Supervised RL in Reasoning Models Paper • 2509.26628 • Published 11 days ago • 12
ReviewScore: Misinformed Peer Review Detection with Large Language Models Paper • 2509.21679 • Published 16 days ago • 62
ReportBench: Evaluating Deep Research Agents via Academic Survey Tasks Paper • 2508.15804 • Published Aug 14 • 15
DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search Paper • 2509.25454 • Published 12 days ago • 124
Knapsack RL: Unlocking Exploration of LLMs via Optimizing Budget Allocation Paper • 2509.25849 • Published 12 days ago • 43
BroRL: Scaling Reinforcement Learning via Broadened Exploration Paper • 2510.01180 • Published 10 days ago • 16
Interactive Training: Feedback-Driven Neural Network Optimization Paper • 2510.02297 • Published 9 days ago • 38
More Thought, Less Accuracy? On the Dual Nature of Reasoning in Vision-Language Models Paper • 2509.25848 • Published 12 days ago • 56
CLUE: Non-parametric Verification from Experience via Hidden-State Clustering Paper • 2510.01591 • Published 10 days ago • 22
LongCodeZip: Compress Long Context for Code Language Models Paper • 2510.00446 • Published 11 days ago • 102
Efficient Multi-modal Large Language Models via Progressive Consistency Distillation Paper • 2510.00515 • Published 11 days ago • 38
Reactive Transformer (RxT) -- Stateful Real-Time Processing for Event-Driven Reactive Language Models Paper • 2510.03561 • Published 8 days ago • 22
Connecting Large Language Models with Evolutionary Algorithms Yields Powerful Prompt Optimizers Paper • 2309.08532 • Published Sep 15, 2023 • 53
PanGu-Coder2: Boosting Large Language Models for Code with Ranking Feedback Paper • 2307.14936 • Published Jul 27, 2023 • 41
Factuality Matters: When Image Generation and Editing Meet Structured Visuals Paper • 2510.05091 • Published 5 days ago • 15
Reinforce-Ada: An Adaptive Sampling Framework for Reinforce-Style LLM Training Paper • 2510.04996 • Published 5 days ago • 14
Paper2Video: Automatic Video Generation from Scientific Papers Paper • 2510.05096 • Published 5 days ago • 86
SwiReasoning: Switch-Thinking in Latent and Explicit for Pareto-Superior Reasoning LLMs Paper • 2510.05069 • Published 5 days ago • 11
MITS: Enhanced Tree Search Reasoning for LLMs via Pointwise Mutual Information Paper • 2510.03632 • Published 8 days ago • 38
Large Reasoning Models Learn Better Alignment from Flawed Thinking Paper • 2510.00938 • Published 10 days ago • 52