Show-o: One Single Transformer to Unify Multimodal Understanding and Generation • Paper • arXiv:2408.12528 • Published Aug 22, 2024 • 50 upvotes
LLM-jp: A Cross-organizational Project for the Research and Development of Fully Open Japanese LLMs • Paper • arXiv:2407.03963 • Published Jul 4, 2024 • 15 upvotes
MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention • Paper • arXiv:2407.02490 • Published Jul 2, 2024 • 23 upvotes
SSMs • Collection • A collection of Mamba-2-based research models with 8B parameters, trained on 3.5T tokens, for comparison with Transformers. • 5 items • Updated Oct 1 • 26 upvotes
The Hallucinations Leaderboard -- An Open Effort to Measure Hallucinations in Large Language Models • Paper • arXiv:2404.05904 • Published Apr 8, 2024 • 8 upvotes
SpaceByte: Towards Deleting Tokenization from Large Language Modeling • Paper • arXiv:2404.14408 • Published Apr 22, 2024 • 6 upvotes
The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions • Paper • arXiv:2404.13208 • Published Apr 19, 2024 • 38 upvotes
Meta Llama 3 • Collection • This collection hosts the Transformers-format and original repos of the Meta Llama 3 and Llama Guard 2 releases. • 5 items • Updated Sep 25 • 683 upvotes
RecurrentGemma: Moving Past Transformers for Efficient Open Language Models • Paper • arXiv:2404.07839 • Published Apr 11, 2024 • 42 upvotes
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models • Paper • arXiv:2402.19427 • Published Feb 29, 2024 • 52 upvotes
Gemma release • Collection • Groups the Gemma models released by the Google team. • 40 items • Updated Jul 31 • 325 upvotes
Mixture-of-Depths: Dynamically allocating compute in transformer-based language models • Paper • arXiv:2404.02258 • Published Apr 2, 2024 • 104 upvotes