Video - a Carlosvirella100 Collection

Models
Datasets
Spaces
Docs
Enterprise
Pricing
Log In
Sign Up

Carlosvirella100 's Collections

CAMV

Video

Video

updated about 24 hours ago

GPT4Motion: Scripting Physical Motions in Text-to-Video Generation via Blender-Oriented GPT Planning

Paper • 2311.12631 • Published Nov 21, 2023 • 15
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models

Paper • 2401.06066 • Published Jan 11, 2024 • 56
VideoScene: Distilling Video Diffusion Model to Generate 3D Scenes in One Step

Paper • 2504.01956 • Published Apr 2 • 41
UrbanLLaVA: A Multi-modal Large Language Model for Urban Intelligence with Spatial Reasoning and Understanding

Paper • 2506.23219 • Published Jun 29 • 7
CriticLean: Critic-Guided Reinforcement Learning for Mathematical Formalization

Paper • 2507.06181 • Published Jul 8 • 42
Disambiguation-Centric Finetuning Makes Enterprise Tool-Calling LLMs More Realistic and Less Risky

Paper • 2507.03336 • Published Jul 4 • 5
GTA1: GUI Test-time Scaling Agent

Paper • 2507.05791 • Published Jul 8 • 25
LangSplatV2: High-dimensional 3D Language Gaussian Splatting with 450+ FPS

Paper • 2507.07136 • Published Jul 9 • 36
Lumos-1: On Autoregressive Video Generation from a Unified Model Perspective

Paper • 2507.08801 • Published Jul 11 • 30
Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation

Paper • 2507.10524 • Published Jul 14 • 69
SWE-Perf: Can Language Models Optimize Code Performance on Real-World Repositories?

Paper • 2507.12415 • Published Jul 16 • 41
OpenCodeReasoning-II: A Simple Test Time Scaling Approach via Self-Critique

Paper • 2507.09075 • Published Jul 11 • 14
REST: Stress Testing Large Reasoning Models by Asking Multiple Problems at Once

Paper • 2507.10541 • Published Jul 14 • 29
Lizard: An Efficient Linearization Framework for Large Language Models

Paper • 2507.09025 • Published Jul 11 • 17
AgentsNet: Coordination and Collaborative Reasoning in Multi-Agent LLMs

Paper • 2507.08616 • Published Jul 11 • 13
EXAONE 4.0: Unified Large Language Models Integrating Non-reasoning and Reasoning Modes

Paper • 2507.11407 • Published Jul 15 • 57
The Imitation Game: Turing Machine Imitator is Length Generalizable Reasoner

Paper • 2507.13332 • Published Jul 17 • 48
RiemannLoRA: A Unified Riemannian Framework for Ambiguity-Free LoRA Optimization

Paper • 2507.12142 • Published Jul 16 • 36
FLEXITOKENS: Flexible Tokenization for Evolving Language Models

Paper • 2507.12720 • Published Jul 17 • 9
Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities

Paper • 2507.13158 • Published Jul 17 • 24
Upsample What Matters: Region-Adaptive Latent Sampling for Accelerated Diffusion Transformers

Paper • 2507.08422 • Published Jul 11 • 35
WebShaper: Agentically Data Synthesizing via Information-Seeking Formalization

Paper • 2507.15061 • Published Jul 20 • 53
Robust 3D-Masked Part-level Editing in 3D Gaussian Splatting with Regularized Score Distillation Sampling

Paper • 2507.11061 • Published Jul 15 • 37
Gaussian Splatting with Discretized SDF for Relightable Assets

Paper • 2507.15629 • Published Jul 21 • 22
Beyond Context Limits: Subconscious Threads for Long-Horizon Reasoning

Paper • 2507.16784 • Published Jul 22 • 119
Ultra3D: Efficient and High-Fidelity 3D Generation with Part Attention

Paper • 2507.17745 • Published Jul 23 • 33
Deep Researcher with Test-Time Diffusion

Paper • 2507.16075 • Published Jul 21 • 61
ScreenCoder: Advancing Visual-to-Code Generation for Front-End Automation via Modular Multimodal Agents

Paper • 2507.22827 • Published Jul 30 • 97
On the Expressiveness of Softmax Attention: A Recurrent Neural Network Perspective

Paper • 2507.23632 • Published Jul 31 • 6
Beyond Fixed: Variable-Length Denoising for Diffusion Large Language Models

Paper • 2508.00819 • Published Aug 1 • 62
Cognitive Kernel-Pro: A Framework for Deep Research Agents and Agent Foundation Models Training

Paper • 2508.00414 • Published Aug 1 • 89
Qwen-Image Technical Report

Paper • 2508.02324 • Published Aug 4 • 239
CompassVerifier: A Unified and Robust Verifier for LLMs Evaluation and Outcome Reward

Paper • 2508.03686 • Published about 1 month ago • 35
Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens

Paper • 2508.01191 • Published Aug 2 • 234
Efficient Agents: Building Effective Agents While Reducing Cost

Paper • 2508.02694 • Published Jul 24 • 85
Agent Lightning: Train ANY AI Agents with Reinforcement Learning

Paper • 2508.03680 • Published about 1 month ago • 65
CRINN: Contrastive Reinforcement Learning for Approximate Nearest Neighbor Search

Paper • 2508.02091 • Published Aug 4 • 13
Reinforcement Pre-Training

Paper • 2506.08007 • Published Jun 9 • 260
InfiGUI-G1: Advancing GUI Grounding with Adaptive Exploration Policy Optimization

Paper • 2508.05731 • Published 28 days ago • 25
Klear-Reasoner: Advancing Reasoning Capability via Gradient-Preserving Clipping Policy Optimization

Paper • 2508.07629 • Published 25 days ago • 39
Adapting Vision-Language Models Without Labels: A Comprehensive Survey

Paper • 2508.05547 • Published 28 days ago • 11
GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models

Paper • 2508.06471 • Published 27 days ago • 171
BrowseComp-Plus: A More Fair and Transparent Evaluation Benchmark of Deep-Research Agent

Paper • 2508.06600 • Published 27 days ago • 36
Reinforcement Learning in Vision: A Survey

Paper • 2508.08189 • Published 24 days ago • 27
Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning

Paper • 2508.08221 • Published 24 days ago • 44
Seeing, Listening, Remembering, and Reasoning: A Multimodal Agent with Long-Term Memory

Paper • 2508.09736 • Published 22 days ago • 54
SONAR-LLM: Autoregressive Transformer that Thinks in Sentence Embeddings and Speaks in Tokens

Paper • 2508.05305 • Published 28 days ago • 45
Noise Hypernetworks: Amortizing Test-Time Compute in Diffusion Models

Paper • 2508.09968 • Published 22 days ago • 15
A Survey on Diffusion Language Models

Paper • 2508.10875 • Published 21 days ago • 33
Quantization Meets dLLMs: A Systematic Study of Post-training Quantization for Diffusion LLMs

Paper • 2508.14896 • Published 15 days ago • 21
XQuant: Breaking the Memory Wall for LLM Inference with KV Cache Rematerialization

Paper • 2508.10395 • Published 21 days ago • 40
NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model

Paper • 2508.14444 • Published 16 days ago • 35
Deep Think with Confidence

Paper • 2508.15260 • Published 15 days ago • 81
Tinker: Diffusion's Gift to 3D--Multi-View Consistent Editing From Sparse Inputs without Per-Scene Optimization

Paper • 2508.14811 • Published 15 days ago • 39
UQ: Assessing Language Models on Unsolved Questions

Paper • 2508.17580 • Published 11 days ago • 14
T2I-ReasonBench: Benchmarking Reasoning-Informed Text-to-Image Generation

Paper • 2508.17472 • Published 11 days ago • 26
ThinkDial: An Open Recipe for Controlling Reasoning Effort in Large Language Models

Paper • 2508.18773 • Published 9 days ago • 14
Autoregressive Universal Video Segmentation Model

Paper • 2508.19242 • Published 9 days ago • 26
Discrete Diffusion VLA: Bringing Discrete Diffusion to Action Decoding in Vision-Language-Action Policies

Paper • 2508.20072 • Published 8 days ago • 28
Self-Rewarding Vision-Language Model via Reasoning Decomposition

Paper • 2508.19652 • Published 8 days ago • 77
SpotEdit: Evaluating Visually-Guided Image Editing Methods

Paper • 2508.18159 • Published 10 days ago • 2
ReportBench: Evaluating Deep Research Agents via Academic Survey Tasks

Paper • 2508.15804 • Published 22 days ago • 14
Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning

Paper • 2508.20751 • Published 7 days ago • 85
TreePO: Bridging the Gap of Policy Optimization and Efficacy and Inference Efficiency with Heuristic Tree-based Modeling

Paper • 2508.17445 • Published 11 days ago • 76
Mixture of Contexts for Long Video Generation

Paper • 2508.21058 • Published 7 days ago • 27
VibeVoice Technical Report

Paper • 2508.19205 • Published 9 days ago • 120
CODA: Coordinating the Cerebrum and Cerebellum for a Dual-Brain Computer Use Agent with Decoupled Reinforcement Learning

Paper • 2508.20096 • Published 8 days ago • 35
InMind: Evaluating LLMs in Capturing and Applying Individual Human Reasoning Styles

Paper • 2508.16072 • Published 14 days ago • 3
EmbodiedOneVision: Interleaved Vision-Text-Action Pretraining for General Robot Control

Paper • 2508.21112 • Published 7 days ago • 71
UItron: Foundational GUI Agent with Advanced Perception and Planning

Paper • 2508.21767 • Published 6 days ago • 11
Efficient Code Embeddings from Code Generation Models

Paper • 2508.21290 • Published 7 days ago • 11
TiKMiX: Take Data Influence into Dynamic Mixture for Language Model Pre-training

Paper • 2508.17677 • Published 11 days ago • 14
CLIPSym: Delving into Symmetry Detection with CLIP

Paper • 2508.14197 • Published 16 days ago • 6
UI-TARS-2 Technical Report: Advancing GUI Agent with Multi-Turn Reinforcement Learning

Paper • 2509.02544 • Published 2 days ago • 91

Collection guide
Browse collections

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs