Models
Datasets
Spaces
Posts
Docs
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2312.07537

Image / Video Gen

Image Generation Using Diffusion-Based Methods: Tips and Techniques for Stable Diffusion

Understanding Diffusion Models: A Unified Perspective

Paper • 2208.11970 • Published Aug 25, 2022
Tutorial on Diffusion Models for Imaging and Vision

Paper • 2403.18103 • Published Mar 26 • 2
Denoising Diffusion Probabilistic Models

Paper • 2006.11239 • Published Jun 19, 2020 • 3
Denoising Diffusion Implicit Models

Paper • 2010.02502 • Published Oct 6, 2020 • 3

Interesting things.

AtP*: An efficient and scalable method for localizing LLM behaviour to components

Paper • 2403.00745 • Published Mar 1 • 11
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

Paper • 2402.17764 • Published Feb 27 • 602
MobiLlama: Towards Accurate and Lightweight Fully Transparent GPT

Paper • 2402.16840 • Published Feb 26 • 23
LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens

Paper • 2402.13753 • Published Feb 21 • 111

Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models

Paper • 2402.17177 • Published Feb 27 • 88
Mora: Enabling Generalist Video Generation via A Multi-Agent Framework

Paper • 2403.13248 • Published Mar 20 • 77
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents

Paper • 2311.05437 • Published Nov 9, 2023 • 45
UniAff: A Unified Representation of Affordances for Tool Usage and Articulation with Vision-Language Models

Paper • 2409.20551 • Published Sep 30 • 13

FreGrad: Lightweight and Fast Frequency-aware Diffusion Vocoder

Paper • 2401.10032 • Published Jan 18 • 12
Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models

Paper • 2401.04658 • Published Jan 9 • 24
FreeInit: Bridging Initialization Gap in Video Diffusion Models

Paper • 2312.07537 • Published Dec 12, 2023 • 26
TCNCA: Temporal Convolution Network with Chunked Attention for Scalable Sequence Processing

Paper • 2312.05605 • Published Dec 9, 2023 • 1

aMUSEd: An Open MUSE Reproduction

Paper • 2401.01808 • Published Jan 3 • 28
From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations

Paper • 2401.01885 • Published Jan 3 • 27
SteinDreamer: Variance Reduction for Text-to-3D Score Distillation via Stein Identity

Paper • 2401.00604 • Published Dec 31, 2023 • 4
LARP: Language-Agent Role Play for Open-World Games

Paper • 2312.17653 • Published Dec 24, 2023 • 30

One-for-All: Generalized LoRA for Parameter-Efficient Fine-tuning

Paper • 2306.07967 • Published Jun 13, 2023 • 24
Rerender A Video: Zero-Shot Text-Guided Video-to-Video Translation

Paper • 2306.07954 • Published Jun 13, 2023 • 113
TryOnDiffusion: A Tale of Two UNets

Paper • 2306.08276 • Published Jun 14, 2023 • 73
Seeing the World through Your Eyes

Paper • 2306.09348 • Published Jun 15, 2023 • 33

Diffusion models

FusionFrames: Efficient Architectural Aspects for Text-to-Video Generation Pipeline

Paper • 2311.13073 • Published Nov 22, 2023 • 56
MetaDreamer: Efficient Text-to-3D Creation With Disentangling Geometry and Texture

Paper • 2311.10123 • Published Nov 16, 2023 • 15
GPT4Motion: Scripting Physical Motions in Text-to-Video Generation via Blender-Oriented GPT Planning

Paper • 2311.12631 • Published Nov 21, 2023 • 13
VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models

Paper • 2312.00845 • Published Dec 1, 2023 • 36

OmnimatteRF: Robust Omnimatte with 3D Background Modeling

Paper • 2309.07749 • Published Sep 14, 2023 • 7
AudioSR: Versatile Audio Super-resolution at Scale

Paper • 2309.07314 • Published Sep 13, 2023 • 25
Generative Image Dynamics

Paper • 2309.07906 • Published Sep 14, 2023 • 52
MagiCapture: High-Resolution Multi-Concept Portrait Customization

Paper • 2309.06895 • Published Sep 13, 2023 • 27

Company

© Hugging Face

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs