- MaskLLM: Learnable Semi-Structured Sparsity for Large Language Models
  Paper • 2409.17481 • Published • 46
- Reducing the Footprint of Multi-Vector Retrieval with Minimal Performance Impact via Token Pooling
  Paper • 2409.14683 • Published • 8
- Discovering the Gems in Early Layers: Accelerating Long-Context LLMs with 1000x Input Token Reduction
  Paper • 2409.17422 • Published • 23
- Emu3: Next-Token Prediction is All You Need
  Paper • 2409.18869 • Published • 89

Collections including paper arxiv:2409.17422

- SciLitLLM: How to Adapt LLMs for Scientific Literature Understanding
  Paper • 2408.15545 • Published • 34
- Controllable Text Generation for Large Language Models: A Survey
  Paper • 2408.12599 • Published • 62
- To Code, or Not To Code? Exploring Impact of Code in Pre-training
  Paper • 2408.10914 • Published • 40
- Automated Design of Agentic Systems
  Paper • 2408.08435 • Published • 38

- PAS: Data-Efficient Plug-and-Play Prompt Augmentation System
  Paper • 2407.06027 • Published • 8
- SpreadsheetLLM: Encoding Spreadsheets for Large Language Models
  Paper • 2407.09025 • Published • 128
- Toto: Time Series Optimized Transformer for Observability
  Paper • 2407.07874 • Published • 29
- SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers
  Paper • 2407.09413 • Published • 9

- MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention
  Paper • 2407.02490 • Published • 23
- Can Few-shot Work in Long-Context? Recycling the Context to Generate Demonstrations
  Paper • 2406.13632 • Published • 5
- LongRAG: Enhancing Retrieval-Augmented Generation with Long-context LLMs
  Paper • 2406.15319 • Published • 60
- Make Your LLM Fully Utilize the Context
  Paper • 2404.16811 • Published • 52

- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
  Paper • 2402.17764 • Published • 602
- BitNet: Scaling 1-bit Transformers for Large Language Models
  Paper • 2310.11453 • Published • 96
- Mixture-of-Depths: Dynamically allocating compute in transformer-based language models
  Paper • 2404.02258 • Published • 104
- TransformerFAM: Feedback attention is working memory
  Paper • 2404.09173 • Published • 43