Say Anything but This: When Tokenizer Betrays Reasoning in LLMs Paper • 2601.14658 • Published 1 day ago • 1
GutenOCR: A Grounded Vision-Language Front-End for Documents Paper • 2601.14490 • Published 2 days ago • 13
Article How We Built a Semantic Highlight Model To Save Token Cost for RAG 8 days ago • 57
It's All About the Confidence: An Unsupervised Approach for Multilingual Historical Entity Linking using Large Language Models Paper • 2601.08500 • Published 9 days ago • 1
Introducing TrGLUE and SentiTurca: A Comprehensive Benchmark for Turkish General Language Understanding and Sentiment Analysis Paper • 2512.22100 • Published 27 days ago • 3
Bolmo: Byteifying the Next Generation of Language Models Paper • 2512.15586 • Published Dec 17, 2025 • 17 • 3
FiNERweb: Datasets and Artifacts for Scalable Multilingual Named Entity Recognition Paper • 2512.13884 • Published Dec 15, 2025 • 15