The German Commons - 154 Billion Tokens of Openly Licensed Text for German Language Models Paper • 2510.13996 • Published 1 day ago • 2
Sparse Subnetwork Enhancement for Underrepresented Languages in Large Language Models Paper • 2510.13580 • Published 1 day ago • 1
Poisoning Attacks on LLMs Require a Near-constant Number of Poison Samples Paper • 2510.07192 • Published 9 days ago • 2
Article Model statistics of the 50 most downloaded entities on Hugging Face By lbourdois • 4 days ago • 21
🥨 Bavarian NLP Papers Collection Awesome papers about Bavarian NLP • 11 items • Updated 7 days ago • 2
Standard-to-Dialect Transfer Trends Differ across Text and Speech: A Case Study on Intent and Topic Classification in German Dialects Paper • 2510.07890 • Published 8 days ago • 1
MultiBLiMP 1.0: A Massively Multilingual Benchmark of Linguistic Minimal Pairs Paper • 2504.02768 • Published Apr 3 • 2
Article There is no such thing as a tokenizer-free lunch By catherinearnett • 22 days ago • 78
EmbeddingGemma: Powerful and Lightweight Text Representations Paper • 2509.20354 • Published 23 days ago • 38
Cetvel: A Unified Benchmark for Evaluating Language Understanding, Generation and Cultural Capacity of LLMs for Turkish Paper • 2508.16431 • Published Aug 22 • 1