Collections
Collections including paper arxiv:2311.00871

- Pretraining Data Mixtures Enable Narrow Model Selection Capabilities in Transformer Models
  Paper • 2311.00871 • Published • 2
- Can large language models explore in-context?
  Paper • 2403.15371 • Published • 32
- Data Distributional Properties Drive Emergent In-Context Learning in Transformers
  Paper • 2205.05055 • Published • 2
- Long-context LLMs Struggle with Long In-context Learning
  Paper • 2404.02060 • Published • 35

- Cognitive Architectures for Language Agents
  Paper • 2309.02427 • Published • 8
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model
  Paper • 2305.18290 • Published • 48
- Orca 2: Teaching Small Language Models How to Reason
  Paper • 2311.11045 • Published • 70
- Pretraining Data Mixtures Enable Narrow Model Selection Capabilities in Transformer Models
  Paper • 2311.00871 • Published • 2

- Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling
  Paper • 2311.00430 • Published • 56
- SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis
  Paper • 2307.01952 • Published • 82
- Language Modeling Is Compression
  Paper • 2309.10668 • Published • 82
- Pretraining Data Mixtures Enable Narrow Model Selection Capabilities in Transformer Models
  Paper • 2311.00871 • Published • 2

- SIMPL: A Simple and Efficient Multi-agent Motion Prediction Baseline for Autonomous Driving
  Paper • 2402.02519 • Published
- Mixtral of Experts
  Paper • 2401.04088 • Published • 157
- Optimal Transport Aggregation for Visual Place Recognition
  Paper • 2311.15937 • Published
- GOAT: GO to Any Thing
  Paper • 2311.06430 • Published • 14