Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures Paper • 2505.09343 • Published May 14 • 71
Constitutional AI: Harmlessness from AI Feedback Paper • 2212.08073 • Published Dec 15, 2022 • 3
Toward a Better Understanding of Fourier Neural Operators: Analysis and Improvement from a Spectral Perspective Paper • 2404.07200 • Published Apr 10, 2024 • 2
Simpler Diffusion (SiD2): 1.5 FID on ImageNet512 with pixel-space diffusion Paper • 2410.19324 • Published Oct 25, 2024 • 1
Article Introducing Trackio: A Lightweight Experiment Tracking Library from Hugging Face Jul 29 • 187
Seed Diffusion: A Large-Scale Diffusion Language Model with High-Speed Inference Paper • 2508.02193 • Published Aug 4 • 130
Cognitive Kernel-Pro: A Framework for Deep Research Agents and Agent Foundation Models Training Paper • 2508.00414 • Published Aug 1 • 91
The Well Collection A 15TB collection of physics simulation datasets. • 18 items • Updated Mar 24 • 38
Article Transformers Are Getting Old: Variants and Alternatives Exist! By ProCreations • Jul 5 • 42
100 Days After DeepSeek-R1: A Survey on Replication Studies and More Directions for Reasoning Language Models Paper • 2505.00551 • Published May 1 • 36
REPA-E: Unlocking VAE for End-to-End Tuning with Latent Diffusion Transformers Paper • 2504.10483 • Published Apr 14 • 20
Seaweed-7B: Cost-Effective Training of Video Generation Foundation Model Paper • 2504.08685 • Published Apr 11 • 130
Scaling Rectified Flow Transformers for High-Resolution Image Synthesis Paper • 2403.03206 • Published Mar 5, 2024 • 70
How far can we go with ImageNet for Text-to-Image generation? Paper • 2502.21318 • Published Feb 28 • 26
MLGym: A New Framework and Benchmark for Advancing AI Research Agents Paper • 2502.14499 • Published Feb 20 • 192