-
RobustFT: Robust Supervised Fine-tuning for Large Language Models under Noisy Response
Paper • 2412.14922 • Published • 82 -
B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners
Paper • 2412.17256 • Published • 42 -
Deliberation in Latent Space via Differentiable Cache Augmentation
Paper • 2412.17747 • Published • 28 -
Outcome-Refining Process Supervision for Code Generation
Paper • 2412.15118 • Published • 19
Collections
Discover the best community collections!
Collections including paper arxiv:2412.17256
-
Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search
Paper • 2412.18319 • Published • 33 -
Token-Budget-Aware LLM Reasoning
Paper • 2412.18547 • Published • 41 -
Efficiently Serving LLM Reasoning Programs with Certaindex
Paper • 2412.20993 • Published • 28 -
B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners
Paper • 2412.17256 • Published • 42
-
ProcessBench: Identifying Process Errors in Mathematical Reasoning
Paper • 2412.06559 • Published • 72 -
Maya: An Instruction Finetuned Multilingual Multimodal Model
Paper • 2412.07112 • Published • 25 -
OpenAI o1 System Card
Paper • 2412.16720 • Published • 29 -
Diving into Self-Evolving Training for Multimodal Reasoning
Paper • 2412.17451 • Published • 40
-
RobustFT: Robust Supervised Fine-tuning for Large Language Models under Noisy Response
Paper • 2412.14922 • Published • 82 -
B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners
Paper • 2412.17256 • Published • 42 -
OpenAI o1 System Card
Paper • 2412.16720 • Published • 29 -
Revisiting In-Context Learning with Long Context Language Models
Paper • 2412.16926 • Published • 27
-
B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners
Paper • 2412.17256 • Published • 42 -
RobustFT: Robust Supervised Fine-tuning for Large Language Models under Noisy Response
Paper • 2412.14922 • Published • 82 -
NILE: Internal Consistency Alignment in Large Language Models
Paper • 2412.16686 • Published • 8
-
Compressed Chain of Thought: Efficient Reasoning Through Dense Representations
Paper • 2412.13171 • Published • 31 -
o1-Coder: an o1 Replication for Coding
Paper • 2412.00154 • Published • 41 -
Critical Tokens Matter: Token-Level Contrastive Estimation Enhence LLM's Reasoning Capability
Paper • 2411.19943 • Published • 55 -
MALT: Improving Reasoning with Multi-Agent LLM Training
Paper • 2412.01928 • Published • 39
-
STaR: Bootstrapping Reasoning With Reasoning
Paper • 2203.14465 • Published • 8 -
Let's Verify Step by Step
Paper • 2305.20050 • Published • 10 -
Training Large Language Models to Reason in a Continuous Latent Space
Paper • 2412.06769 • Published • 64 -
Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions
Paper • 2411.14405 • Published • 58
-
Video Creation by Demonstration
Paper • 2412.09551 • Published • 8 -
DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation
Paper • 2412.07589 • Published • 46 -
Unraveling the Complexity of Memory in RL Agents: an Approach for Classification and Evaluation
Paper • 2412.06531 • Published • 71 -
APOLLO: SGD-like Memory, AdamW-level Performance
Paper • 2412.05270 • Published • 38
-
Training Large Language Models to Reason in a Continuous Latent Space
Paper • 2412.06769 • Published • 64 -
Language Models are Hidden Reasoners: Unlocking Latent Reasoning Capabilities via Self-Rewarding
Paper • 2411.04282 • Published • 31 -
Offline Reinforcement Learning for LLM Multi-Step Reasoning
Paper • 2412.16145 • Published • 36 -
MALT: Improving Reasoning with Multi-Agent LLM Training
Paper • 2412.01928 • Published • 39