Why Attention Patterns Exist: A Unifying Temporal Perspective Analysis • Paper • 2601.21709 • Published 4 days ago • 2
FourierSampler: Unlocking Non-Autoregressive Potential in Diffusion Language Models via Frequency-Guided Generation • Paper • 2601.23182 • Published 3 days ago • 18
Learnable Multipliers: Freeing the Scale of Language Model Matrix Layers • Paper • 2601.04890 • Published 25 days ago • 41
Dynamic Large Concept Models: Latent Reasoning in an Adaptive Semantic Space • Paper • 2512.24617 • Published Dec 31, 2025 • 63
Can LLMs Predict Their Own Failures? Self-Awareness via Internal Circuits • Paper • 2512.20578 • Published Dec 23, 2025 • 85
LLaDA2.0: Scaling Up Diffusion Language Models to 100B • Paper • 2512.15745 • Published Dec 10, 2025 • 83
Depth Anything 3: Recovering the Visual Space from Any Views • Paper • 2511.10647 • Published Nov 13, 2025 • 99
GAS: Improving Discretization of Diffusion ODEs via Generalized Adversarial Solver • Paper • 2510.17699 • Published Oct 20, 2025 • 25
LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models • Paper • 2403.13372 • Published Mar 20, 2024 • 176
Quantum Variational Activation Functions Empower Kolmogorov-Arnold Networks • Paper • 2509.14026 • Published Sep 17, 2025 • 5
UltraMemV2: Memory Networks Scaling to 120B Parameters with Superior Long-Context Learning • Paper • 2508.18756 • Published Aug 26, 2025 • 36
DINOv3 • Collection • DINOv3: foundation models producing excellent dense features, outperforming SotA w/o fine-tuning - https://arxiv.org/abs/2508.10104 • 13 items • Updated Aug 21, 2025 • 474
Dens3R: A Foundation Model for 3D Geometry Prediction • Paper • 2507.16290 • Published Jul 22, 2025 • 9