LLM-Reasoning - a lihaocruiser Collection

LLM-Reasoning

updated Jul 1

Orca 2: Teaching Small Language Models How to Reason

Paper • 2311.11045 • Published Nov 18, 2023 • 70

Note: prompt erasing

Learning From Mistakes Makes LLM Better Reasoner

Paper • 2310.20689 • Published Oct 31, 2023 • 28

Note: On top of the CoT data, add [critiques of the erroneous steps] + [the corrected results] for training

Let's Verify Step by Step

Paper • 2305.20050 • Published May 31, 2023 • 10

Note: Outcome-supervised RM and process-supervised RM

SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning

Paper • 2308.00436 • Published Aug 1, 2023 • 21

Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection

Paper • 2310.11511 • Published Oct 17, 2023 • 74

Tree of Thoughts: Deliberate Problem Solving with Large Language Models

Paper • 2305.10601 • Published May 17, 2023 • 11

Let's Reinforce Step by Step

Paper • 2311.05821 • Published Nov 10, 2023 • 1

TinyGSM: achieving >80% on GSM8k with small language models

Paper • 2312.09241 • Published Dec 14, 2023 • 37

ProTIP: Progressive Tool Retrieval Improves Planning

Paper • 2312.10332 • Published Dec 16, 2023 • 7

The Impact of Reasoning Step Length on Large Language Models

Paper • 2401.04925 • Published Jan 10 • 16

Chain-of-Thought Reasoning Without Prompting

Paper • 2402.10200 • Published Feb 15 • 101

Iterative Reasoning Preference Optimization

Paper • 2404.19733 • Published Apr 30 • 47

Mixture-of-Instructions: Comprehensive Alignment of a Large Language Model through the Mixture of Diverse System Prompting Instructions

Paper • 2404.18410 • Published Apr 29

NumLLM: Numeric-Sensitive Large Language Model for Chinese Finance

Paper • 2405.00566 • Published May 1

Self-Discover: Large Language Models Self-Compose Reasoning Structures

Paper • 2402.03620 • Published Feb 6 • 109

Learn Beyond The Answer: Training Language Models with Reflection for Mathematical Reasoning

Paper • 2406.12050 • Published Jun 17 • 19

Phenomenal Yet Puzzling: Testing Inductive Reasoning Capabilities of Language Models with Hypothesis Refinement

Paper • 2310.08559 • Published Oct 12, 2023 • 1

Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs

Paper • 2406.18629 • Published Jun 26 • 41