OpenVision 3: A Family of Unified Visual Encoders for Both Understanding and Generation Paper • 2601.15369 • Published Jan 21 • 21
Stable-DiffCoder: Pushing the Frontier of Code Diffusion Large Language Model Paper • 2601.15892 • Published Jan 22 • 53
Scaling Text-to-Image Diffusion Transformers with Representation Autoencoders Paper • 2601.16208 • Published Jan 22 • 54
NAACL: Noise-AwAre Verbal Confidence Calibration for LLMs in RAG Systems Paper • 2601.11004 • Published Jan 16 • 30
iFSQ: Improving FSQ for Image Generation with 1 Line of Code Paper • 2601.17124 • Published Jan 23 • 33
Can LLMs Clean Up Your Mess? A Survey of Application-Ready Data Preparation with LLMs Paper • 2601.17058 • Published Jan 22 • 190
Less is More: Optimizing Function Calling for LLM Execution on Edge Devices Paper • 2411.15399 • Published Nov 23, 2024 • 1
DynamicVLA: A Vision-Language-Action Model for Dynamic Object Manipulation Paper • 2601.22153 • Published Jan 29 • 73
Everything in Its Place: Benchmarking Spatial Intelligence of Text-to-Image Models Paper • 2601.20354 • Published Jan 28 • 111
Generation Enhances Understanding in Unified Multimodal Models via Multi-Representation Generation Paper • 2601.21406 • Published Jan 29 • 5
ConceptMoE: Adaptive Token-to-Concept Compression for Implicit Compute Allocation Paper • 2601.21420 • Published Jan 29 • 42
DINO-SAE: DINO Spherical Autoencoder for High-Fidelity Image Reconstruction and Generation Paper • 2601.22904 • Published Jan 30 • 15
ReGuLaR: Variational Latent Reasoning Guided by Rendered Chain-of-Thought Paper • 2601.23184 • Published Jan 30 • 36
FSVideo: Fast Speed Video Diffusion Model in a Highly-Compressed Latent Space Paper • 2602.02092 • Published Feb 2 • 18
PixelGen: Pixel Diffusion Beats Latent Diffusion with Perceptual Loss Paper • 2602.02493 • Published Feb 2 • 44
RLAnything: Forge Environment, Policy, and Reward Model in Completely Dynamic RL System Paper • 2602.02488 • Published Feb 2 • 33
Vision-DeepResearch Benchmark: Rethinking Visual and Textual Search for Multimodal Large Language Models Paper • 2602.02185 • Published Feb 2 • 115
Latent Chain-of-Thought as Planning: Decoupling Reasoning from Verbalization Paper • 2601.21358 • Published Jan 29 • 7
Balancing Understanding and Generation in Discrete Diffusion Models Paper • 2602.01362 • Published Feb 1 • 17
3D-Aware Implicit Motion Control for View-Adaptive Human Video Generation Paper • 2602.03796 • Published Feb 3 • 62
CodeOCR: On the Effectiveness of Vision Language Models in Code Understanding Paper • 2602.01785 • Published Feb 2 • 95
Semantic Routing: Exploring Multi-Layer LLM Feature Weighting for Diffusion Transformers Paper • 2602.03510 • Published Feb 3 • 27
RISE-Video: Can Video Generators Decode Implicit World Rules? Paper • 2602.05986 • Published Feb 5 • 26
GEBench: Benchmarking Image Generation Models as GUI Environments Paper • 2602.09007 • Published about 1 month ago • 39
When and How Much to Imagine: Adaptive Test-Time Scaling with World Models for Visual Spatial Reasoning Paper • 2602.08236 • Published Feb 9 • 9
AgentCPM-Report: Interleaving Drafting and Deepening for Open-Ended Deep Research Paper • 2602.06540 • Published Feb 6 • 21
Outcome Accuracy is Not Enough: Aligning the Reasoning Process of Reward Models Paper • 2602.04649 • Published Feb 4 • 12
OPUS: Towards Efficient and Principled Data Selection in Large Language Model Pre-training in Every Iteration Paper • 2602.05400 • Published Feb 5 • 347
AudioSAE: Towards Understanding of Audio-Processing Models with Sparse AutoEncoders Paper • 2602.05027 • Published Feb 4 • 60
Judging What We Cannot Solve: A Consequence-Based Approach for Oracle-Free Evaluation of Research-Level Math Paper • 2602.06291 • Published Feb 6 • 23
Rolling Sink: Bridging Limited-Horizon Training and Open-Ended Testing in Autoregressive Video Diffusion Paper • 2602.07775 • Published Feb 8 • 8
TimeChat-Captioner: Scripting Multi-Scene Videos with Time-Aware and Structural Audio-Visual Captions Paper • 2602.08711 • Published about 1 month ago • 28
DeepVision-103K: A Visually Diverse, Broad-Coverage, and Verifiable Mathematical Dataset for Multimodal Reasoning Paper • 2602.16742 • Published 22 days ago • 12
tttLRM: Test-Time Training for Long Context and Autoregressive 3D Reconstruction Paper • 2602.20160 • Published 17 days ago • 10
From Perception to Action: An Interactive Benchmark for Vision Reasoning Paper • 2602.21015 • Published 16 days ago • 23
SkyReels-V4: Multi-modal Video-Audio Generation, Inpainting and Editing Model Paper • 2602.21818 • Published 15 days ago • 52
CUDA Agent: Large-Scale Agentic RL for High-Performance CUDA Kernel Generation Paper • 2602.24286 • Published 13 days ago • 85
From Scale to Speed: Adaptive Test-Time Scaling for Image Editing Paper • 2603.00141 • Published 16 days ago • 134
RubricBench: Aligning Model-Generated Rubrics with Human Standards Paper • 2603.01562 • Published 10 days ago • 57
SWE-rebench V2: Language-Agnostic SWE Task Collection at Scale Paper • 2602.23866 • Published 13 days ago • 82
CubeComposer: Spatio-Temporal Autoregressive 4K 360° Video Generation from Perspective Video Paper • 2603.04291 • Published 8 days ago • 13
Timer-S1: A Billion-Scale Time Series Foundation Model with Serial Scaling Paper • 2603.04791 • Published 7 days ago • 16
InfinityStory: Unlimited Video Generation with World Consistency and Character-Aware Shot Transitions Paper • 2603.03646 • Published 8 days ago • 8
DreamWorld: Unified World Modeling in Video Generation Paper • 2603.00466 • Published 12 days ago • 16
Penguin-VL: Exploring the Efficiency Limits of VLM with LLM-based Vision Encoders Paper • 2603.06569 • Published 6 days ago • 97
PixARMesh: Autoregressive Mesh-Native Single-View Scene Reconstruction Paper • 2603.05888 • Published 6 days ago • 2