video - a zzfive Collection


video

updated about 19 hours ago

WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens

Paper • 2401.09985 • Published Jan 18 • 15
CustomVideo: Customizing Text-to-Video Generation with Multiple Subjects

Paper • 2401.09962 • Published Jan 18 • 8
Inflation with Diffusion: Efficient Temporal Adaptation for Text-to-Video Super-Resolution

Paper • 2401.10404 • Published Jan 18 • 10
ActAnywhere: Subject-Aware Video Background Generation

Paper • 2401.10822 • Published Jan 19 • 13
Lumiere: A Space-Time Diffusion Model for Video Generation

Paper • 2401.12945 • Published Jan 23 • 86
AnimateLCM: Accelerating the Animation of Personalized Diffusion Models and Adapters with Decoupled Consistency Learning

Paper • 2402.00769 • Published Feb 1 • 20
VideoPrism: A Foundational Visual Encoder for Video Understanding

Paper • 2402.13217 • Published Feb 20 • 22
Video ReCap: Recursive Captioning of Hour-Long Videos

Paper • 2402.13250 • Published Feb 20 • 25
Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis

Paper • 2402.14797 • Published Feb 22 • 19
Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models

Paper • 2402.17177 • Published Feb 27 • 88
Sora Generates Videos with Stunning Geometrical Consistency

Paper • 2402.17403 • Published Feb 27 • 16
Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners

Paper • 2402.17723 • Published Feb 27 • 16
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers

Paper • 2402.19479 • Published Feb 29 • 32
NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models

Paper • 2403.03100 • Published Mar 5 • 34
Tuning-Free Noise Rectification for High Fidelity Image-to-Video Generation

Paper • 2403.02827 • Published Mar 5 • 6
Video Editing via Factorized Diffusion Distillation

Paper • 2403.09334 • Published Mar 14 • 21
Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding

Paper • 2403.09626 • Published Mar 14 • 13
SDEdit: Guided Image Synthesis and Editing with Stochastic Differential Equations

Paper • 2108.01073 • Published Aug 2, 2021 • 7
AnimateDiff-Lightning: Cross-Model Diffusion Distillation

Paper • 2403.12706 • Published Mar 19 • 17
Streaming Dense Video Captioning

Paper • 2404.01297 • Published Apr 1 • 11
Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization

Paper • 2404.09956 • Published Apr 15 • 11
MotionMaster: Training-free Camera Motion Transfer For Video Generation

Paper • 2404.15789 • Published Apr 24 • 10
LLM-AD: Large Language Model based Audio Description System

Paper • 2405.00983 • Published May 2 • 16
FIFO-Diffusion: Generating Infinite Videos from Text without Training

Paper • 2405.11473 • Published May 19 • 53
ReVideo: Remake a Video with Motion and Content Control

Paper • 2405.13865 • Published May 22 • 23
Visual Echoes: A Simple Unified Transformer for Audio-Visual Generation

Paper • 2405.14598 • Published May 23 • 11
Denoising LM: Pushing the Limits of Error Correction Models for Speech Recognition

Paper • 2405.15216 • Published May 24 • 12
I2VEdit: First-Frame-Guided Video Editing via Image-to-Video Diffusion Models

Paper • 2405.16537 • Published May 26 • 16
Looking Backward: Streaming Video-to-Video Translation with Feature Banks

Paper • 2405.15757 • Published May 24 • 14
Human4DiT: Free-view Human Video Generation with 4D Diffusion Transformer

Paper • 2405.17405 • Published May 27 • 14
Collaborative Video Diffusion: Consistent Multi-video Generation with Camera Control

Paper • 2405.17414 • Published May 27 • 10
Instruct-MusicGen: Unlocking Text-to-Music Editing for Music Language Models via Instruction Tuning

Paper • 2405.18386 • Published May 28 • 20
T2V-Turbo: Breaking the Quality Bottleneck of Video Consistency Model with Mixed Reward Feedback

Paper • 2405.18750 • Published May 29 • 21
EasyAnimate: A High-Performance Long Video Generation Method based on Transformer Architecture

Paper • 2405.18991 • Published May 29 • 12
MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model

Paper • 2405.20222 • Published May 30 • 10
DeMamba: AI-Generated Video Detection on Million-Scale GenVideo Benchmark

Paper • 2405.19707 • Published May 30 • 5
Learning Temporally Consistent Video Depth from Video Diffusion Priors

Paper • 2406.01493 • Published Jun 3 • 18
ZeroSmooth: Training-free Diffuser Adaptation for High Frame Rate Video Generation

Paper • 2406.00908 • Published Jun 3 • 12
Searching Priors Makes Text-to-Video Synthesis Better

Paper • 2406.03215 • Published Jun 5 • 11
ShareGPT4Video: Improving Video Understanding and Generation with Better Captions

Paper • 2406.04325 • Published Jun 6 • 72
SF-V: Single Forward Video Generation Model

Paper • 2406.04324 • Published Jun 6 • 23
VideoTetris: Towards Compositional Text-to-Video Generation

Paper • 2406.04277 • Published Jun 6 • 23
MotionClone: Training-Free Motion Cloning for Controllable Video Generation

Paper • 2406.05338 • Published Jun 8 • 39
NaRCan: Natural Refined Canonical Image with Integration of Diffusion Prior for Video Editing

Paper • 2406.06523 • Published Jun 10 • 50
Hierarchical Patch Diffusion Models for High-Resolution Video Generation

Paper • 2406.07792 • Published Jun 12 • 13
AV-DiT: Efficient Audio-Visual Diffusion Transformer for Joint Audio and Video Generation

Paper • 2406.07686 • Published Jun 11 • 14
TC-Bench: Benchmarking Temporal Compositionality in Text-to-Video and Image-to-Video Generation

Paper • 2406.08656 • Published Jun 12 • 7
Rethinking Human Evaluation Protocol for Text-to-Video Models: Enhancing Reliability, Reproducibility, and Practicality

Paper • 2406.08845 • Published Jun 13 • 8
ExVideo: Extending Video Diffusion Models via Parameter-Efficient Post-Tuning

Paper • 2406.14130 • Published Jun 20 • 10
MantisScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation

Paper • 2406.15252 • Published Jun 21 • 14
Video-Infinity: Distributed Long Video Generation

Paper • 2406.16260 • Published Jun 24 • 28
DiffIR2VR-Zero: Zero-Shot Video Restoration with Diffusion-based Image Restoration Models

Paper • 2407.01519 • Published Jul 1 • 22
SVG: 3D Stereoscopic Video Generation via Denoising Frame Matrix

Paper • 2407.00367 • Published Jun 29 • 9
VIMI: Grounding Video Generation through Multi-modal Instruction

Paper • 2407.06304 • Published Jul 8 • 9
VEnhancer: Generative Space-Time Enhancement for Video Generation

Paper • 2407.07667 • Published Jul 10 • 14
Still-Moving: Customized Video Generation without Customized Video Data

Paper • 2407.08674 • Published Jul 11 • 12
CrowdMoGen: Zero-Shot Text-Driven Collective Motion Generation

Paper • 2407.06188 • Published Jul 8 • 1
TCAN: Animating Human Images with Temporally Consistent Pose Guidance using Diffusion Models

Paper • 2407.09012 • Published Jul 12 • 8
Video Occupancy Models

Paper • 2407.09533 • Published Jun 25 • 6
Noise Calibration: Plug-and-play Content-Preserving Video Enhancement using Pre-trained Video Diffusion Models

Paper • 2407.10285 • Published Jul 14 • 4
VD3D: Taming Large Video Diffusion Transformers for 3D Camera Control

Paper • 2407.12781 • Published Jul 17 • 12
Towards Understanding Unsafe Video Generation

Paper • 2407.12581 • Published Jul 17
Streetscapes: Large-scale Consistent Street View Generation Using Autoregressive Video Diffusion

Paper • 2407.13759 • Published Jul 18 • 17
Cinemo: Consistent and Controllable Image Animation with Motion Diffusion Models

Paper • 2407.15642 • Published Jul 22 • 10
MovieDreamer: Hierarchical Generation for Coherent Long Visual Sequence

Paper • 2407.16655 • Published Jul 23 • 28
T2V-CompBench: A Comprehensive Benchmark for Compositional Text-to-video Generation

Paper • 2407.14505 • Published Jul 19 • 25
FreeLong: Training-Free Long Video Generation with SpectralBlend Temporal Attention

Paper • 2407.19918 • Published Jul 29 • 48
Tora: Trajectory-oriented Diffusion Transformer for Video Generation

Paper • 2407.21705 • Published Jul 31 • 25
Fine-gained Zero-shot Video Sampling

Paper • 2407.21475 • Published Jul 31 • 5
Reenact Anything: Semantic Video Motion Transfer Using Motion-Textual Inversion

Paper • 2408.00458 • Published Aug 1 • 10
UniTalker: Scaling up Audio-Driven 3D Facial Animation through A Unified Model

Paper • 2408.00762 • Published Aug 1 • 9
VidGen-1M: A Large-Scale Dataset for Text-to-video Generation

Paper • 2408.02629 • Published Aug 5 • 13
ReSyncer: Rewiring Style-based Generator for Unified Audio-Visually Synced Facial Performer

Paper • 2408.03284 • Published Aug 6 • 10
Puppet-Master: Scaling Interactive Video Generation as a Motion Prior for Part-Level Dynamics

Paper • 2408.04631 • Published Aug 8 • 8
Kalman-Inspired Feature Propagation for Video Face Super-Resolution

Paper • 2408.05205 • Published Aug 9 • 8
CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer

Paper • 2408.06072 • Published Aug 12 • 35
FancyVideo: Towards Dynamic and Consistent Video Generation via Cross-frame Textual Guidance

Paper • 2408.08189 • Published Aug 15 • 15
Factorized-Dreamer: Training A High-Quality Video Generator with Limited and Low-Quality Data

Paper • 2408.10119 • Published Aug 19 • 16
TWLV-I: Analysis and Insights from Holistic Evaluation on Video Foundation Models

Paper • 2408.11318 • Published Aug 21 • 54
TrackGo: A Flexible and Efficient Method for Controllable Video Generation

Paper • 2408.11475 • Published Aug 21 • 17
Real-Time Video Generation with Pyramid Attention Broadcast

Paper • 2408.12588 • Published Aug 22 • 15
CustomCrafter: Customized Video Generation with Preserving Motion and Concept Composition Abilities

Paper • 2408.13239 • Published Aug 23 • 11
Training-free Long Video Generation with Chain of Diffusion Model Experts

Paper • 2408.13423 • Published Aug 24 • 21
TVG: A Training-free Transition Video Generation Method with Diffusion Models

Paper • 2408.13413 • Published Aug 24 • 13
Generative Inbetweening: Adapting Image-to-Video Models for Keyframe Interpolation

Paper • 2408.15239 • Published Aug 27 • 28
OD-VAE: An Omni-dimensional Video Compressor for Improving Latent Video Diffusion Model

Paper • 2409.01199 • Published Sep 2 • 12
Follow-Your-Canvas: Higher-Resolution Video Outpainting with Extensive Content Generation

Paper • 2409.01055 • Published Sep 2 • 6
Loopy: Taming Audio-Driven Portrait Avatar with Long-Term Motion Dependency

Paper • 2409.02634 • Published Sep 4 • 89
OSV: One Step is Enough for High-Quality Image to Video Generation

Paper • 2409.11367 • Published Sep 17 • 13
Towards Diverse and Efficient Audio Captioning via Diffusion Models

Paper • 2409.09401 • Published Sep 14 • 6
LVCD: Reference-based Lineart Video Colorization with Diffusion Models

Paper • 2409.12960 • Published Sep 19 • 22
Denoising Reuse: Exploiting Inter-frame Motion Consistency for Efficient Video Latent Generation

Paper • 2409.12532 • Published Sep 19 • 5
MIMO: Controllable Character Video Synthesis with Spatial Decomposed Modeling

Paper • 2409.16160 • Published Sep 24 • 32
PhysGen: Rigid-Body Physics-Grounded Image-to-Video Generation

Paper • 2409.18964 • Published Sep 27 • 25
VideoGuide: Improving Video Diffusion Models without Training Through a Teacher's Guide

Paper • 2410.04364 • Published Oct 6 • 27
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark

Paper • 2410.03051 • Published Oct 4 • 4
Pyramidal Flow Matching for Efficient Video Generative Modeling

Paper • 2410.05954 • Published Oct 8 • 37
T2V-Turbo-v2: Enhancing Video Generation Model Post-Training through Data, Reward, and Conditional Guidance Design

Paper • 2410.05677 • Published Oct 8 • 14
Loong: Generating Minute-level Long Videos with Autoregressive Language Models

Paper • 2410.02757 • Published Oct 3 • 36
Animate-X: Universal Character Image Animation with Enhanced Motion Representation

Paper • 2410.10306 • Published Oct 14 • 52
Cavia: Camera-controllable Multi-view Video Diffusion with View-Integrated Attention

Paper • 2410.10774 • Published Oct 14 • 24
LVD-2M: A Long-take Video Dataset with Temporally Dense Captions

Paper • 2410.10816 • Published Oct 14 • 19
Movie Gen: A Cast of Media Foundation Models

Paper • 2410.13720 • Published Oct 17 • 88
DreamVideo-2: Zero-Shot Subject-Driven Video Customization with Precise Motion Control

Paper • 2410.13830 • Published Oct 17 • 23
LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding

Paper • 2410.17434 • Published 30 days ago • 24
FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality

Paper • 2410.19355 • Published 28 days ago • 23
MarDini: Masked Autoregressive Diffusion for Video Generation at Scale

Paper • 2410.20280 • Published 26 days ago • 21
SlowFast-VGen: Slow-Fast Learning for Action-Driven Long Video Generation

Paper • 2410.23277 • Published 22 days ago • 7
Fashion-VDM: Video Diffusion Model for Virtual Try-On

Paper • 2411.00225 • Published 21 days ago • 7
Adaptive Caching for Faster Video Generation with Diffusion Transformers

Paper • 2411.02397 • Published 17 days ago • 20
Motion Control for Enhanced Complex Action Video Generation

Paper • 2411.08328 • Published 9 days ago • 2
AnimateAnything: Consistent and Controllable Animation for Video Generation

Paper • 2411.10836 • Published 5 days ago • 18
StableV2V: Stablizing Shape Consistency in Video-to-Video Editing

Paper • 2411.11045 • Published 4 days ago • 8
FlipSketch: Flipping Static Drawings to Text-Guided Sketch Animations

Paper • 2411.10818 • Published 5 days ago • 15
