Uncovering mesa-optimization algorithms in Transformers Paper • 2309.05858 • Published Sep 11, 2023 • 12
YaRN: Efficient Context Window Extension of Large Language Models Paper • 2309.00071 • Published Aug 31, 2023 • 65
FACET: Fairness in Computer Vision Evaluation Benchmark Paper • 2309.00035 • Published Aug 31, 2023 • 16
ConceptLab: Creative Generation using Diffusion Prior Constraints Paper • 2308.02669 • Published Aug 3, 2023 • 23
Retentive Network: A Successor to Transformer for Large Language Models Paper • 2307.08621 • Published Jul 17, 2023 • 170
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding Paper • 2307.02499 • Published Jul 4, 2023 • 15
Extending Context Window of Large Language Models via Positional Interpolation Paper • 2306.15595 • Published Jun 27, 2023 • 53
Towards Language Models That Can See: Computer Vision Through the LENS of Natural Language Paper • 2306.16410 • Published Jun 28, 2023 • 27
CLIPA-v2: Scaling CLIP Training with 81.1% Zero-shot ImageNet Accuracy within a $10,000 Budget; An Extra $4,000 Unlocks 81.8% Accuracy Paper • 2306.15658 • Published Jun 27, 2023 • 12
AudioPaLM: A Large Language Model That Can Speak and Listen Paper • 2306.12925 • Published Jun 22, 2023 • 53