- Instruction Pre-Training: Language Models are Supervised Multitask Learners
  Paper • 2406.14491 • Published • 85
- Pre-training Small Base LMs with Fewer Tokens
  Paper • 2404.08634 • Published • 34
- Stacking Your Transformers: A Closer Look at Model Growth for Efficient LLM Pre-Training
  Paper • 2405.15319 • Published • 25
- Can LLMs Learn by Teaching? A Preliminary Study
  Paper • 2406.14629 • Published • 17
Collections including paper arxiv:2406.10023
- Learn Your Reference Model for Real Good Alignment
  Paper • 2404.09656 • Published • 82
- Aligning Teacher with Student Preferences for Tailored Training Data Generation
  Paper • 2406.19227 • Published • 24
- Self-Play Preference Optimization for Language Model Alignment
  Paper • 2405.00675 • Published • 24
- CantTalkAboutThis: Aligning Language Models to Stay on Topic in Dialogues
  Paper • 2404.03820 • Published • 24
- mDPO: Conditional Preference Optimization for Multimodal Large Language Models
  Paper • 2406.11839 • Published • 37
- Pandora: Towards General World Model with Natural Language Actions and Video States
  Paper • 2406.09455 • Published • 14
- WPO: Enhancing RLHF with Weighted Preference Optimization
  Paper • 2406.11827 • Published • 14
- In-Context Editing: Learning Knowledge from Self-Induced Distributions
  Paper • 2406.11194 • Published • 15
- Understanding the performance gap between online and offline alignment algorithms
  Paper • 2405.08448 • Published • 14
- Self-Exploring Language Models: Active Preference Elicitation for Online Alignment
  Paper • 2405.19332 • Published • 15
- Offline Regularised Reinforcement Learning for Large Language Models Alignment
  Paper • 2405.19107 • Published • 13
- Show, Don't Tell: Aligning Language Models with Demonstrated Feedback
  Paper • 2406.00888 • Published • 30
- AnchorAL: Computationally Efficient Active Learning for Large and Imbalanced Datasets
  Paper • 2404.05623 • Published • 3
- Self-Exploring Language Models: Active Preference Elicitation for Online Alignment
  Paper • 2405.19332 • Published • 15
- BPO: Supercharging Online Preference Learning by Adhering to the Proximity of Behavior LLM
  Paper • 2406.12168 • Published • 7
- Deep Bayesian Active Learning for Preference Modeling in Large Language Models
  Paper • 2406.10023 • Published • 2
- How to Train Data-Efficient LLMs
  Paper • 2402.09668 • Published • 38
- LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement
  Paper • 2403.15042 • Published • 25
- MAGID: An Automated Pipeline for Generating Synthetic Multi-modal Datasets
  Paper • 2403.03194 • Published • 12
- Orca-Math: Unlocking the potential of SLMs in Grade School Math
  Paper • 2402.14830 • Published • 24