In a Training Loop 🔄

samusenps

AI & ML interests

New & Foundational Architectures, World Models, Multi-Modality, Interpretability, High Quality Training Data, Reproducible Open Source

upvoted a paper 5 months ago

X-Part: high fidelity and structure coherent shape decomposition

Paper • 2509.08643 • Published Sep 10, 2025 • 28

upvoted 2 papers 6 months ago

FLUX-Reason-6M & PRISM-Bench: A Million-Scale Text-to-Image Reasoning Dataset and Comprehensive Benchmark

Paper • 2509.09680 • Published Sep 11, 2025 • 43

Causal Attention with Lookahead Keys

Paper • 2509.07301 • Published Sep 9, 2025 • 21

upvoted 17 papers 7 months ago

Snap-Snap: Taking Two Images to Reconstruct 3D Human Gaussians in Milliseconds

Paper • 2508.14892 • Published Aug 20, 2025 • 9

"Does the cafe entrance look accessible? Where is the door?" Towards Geospatial AI Agents for Visual Inquiries

Paper • 2508.15752 • Published Aug 21, 2025 • 8

Visual Autoregressive Modeling for Instruction-Guided Image Editing

Paper • 2508.15772 • Published Aug 21, 2025 • 9

ATLAS: Decoupling Skeletal and Shape Parameters for Expressive Parametric Human Modeling

Paper • 2508.15767 • Published Aug 21, 2025 • 17

SceneGen: Single-Image 3D Scene Generation in One Feedforward Pass

Paper • 2508.15769 • Published Aug 21, 2025 • 19

A Survey on Large Language Model Benchmarks

Paper • 2508.15361 • Published Aug 21, 2025 • 19

Waver: Wave Your Way to Lifelike Video Generation

Paper • 2508.15761 • Published Aug 21, 2025 • 36

LiveMCP-101: Stress Testing and Diagnosing MCP-enabled Agents on Challenging Queries

Paper • 2508.15760 • Published Aug 21, 2025 • 47

Mobile-Agent-v3: Foundamental Agents for GUI Automation

Paper • 2508.15144 • Published Aug 21, 2025 • 65

Intern-S1: A Scientific Multimodal Foundation Model

Paper • 2508.15763 • Published Aug 21, 2025 • 268

Representing Speech Through Autoregressive Prediction of Cochlear Tokens

Paper • 2508.11598 • Published Aug 15, 2025 • 17

S^2-Guidance: Stochastic Self Guidance for Training-Free Enhancement of Diffusion Models

Paper • 2508.12880 • Published Aug 18, 2025 • 48

Inverse-LLaVA: Eliminating Alignment Pre-training Through Text-to-Vision Mapping

Paper • 2508.12466 • Published Aug 17, 2025 • 8

Matrix-Game 2.0: An Open-Source, Real-Time, and Streaming Interactive World Model

Paper • 2508.13009 • Published Aug 18, 2025 • 25

4DNeX: Feed-Forward 4D Generative Modeling Made Easy

Paper • 2508.13154 • Published Aug 18, 2025 • 62

ComoRAG: A Cognitive-Inspired Memory-Organized RAG for Stateful Long Narrative Reasoning

Paper • 2508.10419 • Published Aug 14, 2025 • 75

Next Visual Granularity Generation

Paper • 2508.12811 • Published Aug 18, 2025 • 49