mmBERT: A Modern Multilingual Encoder with Annealed Language Learning Paper • 2509.06888 • Published Sep 8 • 12
On the Theoretical Limitations of Embedding-Based Retrieval Paper • 2508.21038 • Published Aug 28 • 19
Certified Mitigation of Worst-Case LLM Copyright Infringement Paper • 2504.16046 • Published Apr 22 • 13
Beyond RAG: Task-Aware KV Cache Compression for Comprehensive Knowledge Reasoning Paper • 2503.04973 • Published Mar 6 • 25
Verifiable by Design: Aligning Language Models to Quote from Pre-Training Data Paper • 2404.03862 • Published Apr 5, 2024
AdapterSwap: Continuous Training of LLMs with Data Removal and Access-Control Guarantees Paper • 2404.08417 • Published Apr 12, 2024 • 1
Dated Data: Tracing Knowledge Cutoffs in Large Language Models Paper • 2403.12958 • Published Mar 19, 2024
MegaWika: Millions of reports and their sources across 50 diverse languages Paper • 2307.07049 • Published Jul 13, 2023
Defending Against Poisoning Attacks in Open-Domain Question Answering Paper • 2212.10002 • Published Dec 20, 2022
Learning to Reason via Program Generation, Emulation, and Search Paper • 2405.16337 • Published May 25, 2024
CLERC: A Dataset for Legal Case Retrieval and Retrieval-Augmented Analysis Generation Paper • 2406.17186 • Published Jun 24, 2024 • 2
Promptriever: Instruction-Trained Retrievers Can Be Prompted Like Language Models Paper • 2409.11136 • Published Sep 17, 2024 • 24