- Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search
  Paper • 2412.18319 • Published • 34
- Token-Budget-Aware LLM Reasoning
  Paper • 2412.18547 • Published • 44
- Efficiently Serving LLM Reasoning Programs with Certaindex
  Paper • 2412.20993 • Published • 29
- B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners
  Paper • 2412.17256 • Published • 44
Collections including paper arxiv:2412.16720
- ProcessBench: Identifying Process Errors in Mathematical Reasoning
  Paper • 2412.06559 • Published • 72
- Maya: An Instruction Finetuned Multilingual Multimodal Model
  Paper • 2412.07112 • Published • 26
- OpenAI o1 System Card
  Paper • 2412.16720 • Published • 29
- Diving into Self-Evolving Training for Multimodal Reasoning
  Paper • 2412.17451 • Published • 41

- OpenAI o1 System Card
  Paper • 2412.16720 • Published • 29
- LearnLM: Improving Gemini for Learning
  Paper • 2412.16429 • Published • 20
- NILE: Internal Consistency Alignment in Large Language Models
  Paper • 2412.16686 • Published • 8
- Offline Reinforcement Learning for LLM Multi-Step Reasoning
  Paper • 2412.16145 • Published • 37

- RobustFT: Robust Supervised Fine-tuning for Large Language Models under Noisy Response
  Paper • 2412.14922 • Published • 84
- B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners
  Paper • 2412.17256 • Published • 44
- OpenAI o1 System Card
  Paper • 2412.16720 • Published • 29
- Revisiting In-Context Learning with Long Context Language Models
  Paper • 2412.16926 • Published • 27

- Video Creation by Demonstration
  Paper • 2412.09551 • Published • 8
- DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation
  Paper • 2412.07589 • Published • 46
- Unraveling the Complexity of Memory in RL Agents: an Approach for Classification and Evaluation
  Paper • 2412.06531 • Published • 71
- APOLLO: SGD-like Memory, AdamW-level Performance
  Paper • 2412.05270 • Published • 38

- Rho-1: Not All Tokens Are What You Need
  Paper • 2404.07965 • Published • 88
- VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time
  Paper • 2404.10667 • Published • 18
- Instruction-tuned Language Models are Better Knowledge Learners
  Paper • 2402.12847 • Published • 25
- DoRA: Weight-Decomposed Low-Rank Adaptation
  Paper • 2402.09353 • Published • 26

- Self-Rewarding Language Models
  Paper • 2401.10020 • Published • 145
- Self-Discover: Large Language Models Self-Compose Reasoning Structures
  Paper • 2402.03620 • Published • 114
- OS-Copilot: Towards Generalist Computer Agents with Self-Improvement
  Paper • 2402.07456 • Published • 42
- Learning From Mistakes Makes LLM Better Reasoner
  Paper • 2310.20689 • Published • 28

- OLMo: Accelerating the Science of Language Models
  Paper • 2402.00838 • Published • 82
- Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
  Paper • 2403.05530 • Published • 61
- StarCoder: may the source be with you!
  Paper • 2305.06161 • Published • 29
- SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling
  Paper • 2312.15166 • Published • 56

- MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
  Paper • 2403.09611 • Published • 125
- Evolutionary Optimization of Model Merging Recipes
  Paper • 2403.13187 • Published • 50
- MobileVLM V2: Faster and Stronger Baseline for Vision Language Model
  Paper • 2402.03766 • Published • 13
- LLM Agent Operating System
  Paper • 2403.16971 • Published • 65