Large Language Models Do NOT Really Know What They Don't Know Paper • 2510.09033 • Published 18 days ago • 16
When Models Lie, We Learn: Multilingual Span-Level Hallucination Detection with PsiloQA Paper • 2510.04849 • Published 21 days ago • 107
UNIDOC-BENCH: A Unified Benchmark for Document-Centric Multimodal RAG Paper • 2510.03663 • Published 24 days ago • 15
Fine-Grained Detection of Context-Grounded Hallucinations Using LLMs Paper • 2509.22582 • Published Sep 26 • 10
F2LLM Technical Report: Matching SOTA Embedding Performance with 6 Million Open-Source Data Paper • 2510.02294 • Published 25 days ago • 42
Tree-based Dialogue Reinforced Policy Optimization for Red-Teaming Attacks Paper • 2510.02286 • Published 25 days ago • 28
The Rogue Scalpel: Activation Steering Compromises LLM Safety Paper • 2509.22067 • Published Sep 26 • 27
CLUE: Non-parametric Verification from Experience via Hidden-State Clustering Paper • 2510.01591 • Published 26 days ago • 26
LongCodeZip: Compress Long Context for Code Language Models Paper • 2510.00446 • Published 27 days ago • 107
jina-reranker-v3: Last but Not Late Interaction for Document Reranking Paper • 2509.25085 • Published 28 days ago • 6
The Dragon Hatchling: The Missing Link between the Transformer and Models of the Brain Paper • 2509.26507 • Published 27 days ago • 510
FlashAdventure: A Benchmark for GUI Agents Solving Full Story Arcs in Diverse Adventure Games Paper • 2509.01052 • Published Sep 1 • 20
ELV-Halluc: Benchmarking Semantic Aggregation Hallucinations in Long Video Understanding Paper • 2508.21496 • Published Aug 29 • 54
VibeVoice Collection • Frontier Text-to-Speech Models • https://microsoft.github.io/VibeVoice/ • 5 items • Updated Sep 1 • 129
Story2Board: A Training-Free Approach for Expressive Storyboard Generation Paper • 2508.09983 • Published Aug 13 • 68
SPAR: Scholar Paper Retrieval with LLM-based Agents for Enhanced Academic Search Paper • 2507.15245 • Published Jul 21 • 11
Self-Correction Bench: Revealing and Addressing the Self-Correction Blind Spot in LLMs Paper • 2507.02778 • Published Jul 3 • 9