How to use MCG-NJU/UniAVGen with Diffusers:
pip install -U diffusers transformers accelerate
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import load_image, export_to_video

# switch to "mps" for apple devices
pipe = DiffusionPipeline.from_pretrained("MCG-NJU/UniAVGen", dtype=torch.bfloat16, device_map="cuda")
pipe.to("cuda")

prompt = "A man with short gray hair plays a red electric guitar."
image = load_image(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/guitar-man.png"
)

output = pipe(image=image, prompt=prompt).frames[0]
export_to_video(output, "output.mp4")
UniAVGen: Unified Audio and Video Generation with Asymmetric Cross-Modal Interactions
Guozhen Zhang · Zixiang Zhou · Teng Hu · Ziqiao Peng · Youliang Zhang · Yi Chen · Yuan Zhou · Qinglin Lu · Limin Wang
MCG-NJU | Tencent Hunyuan
This repository hosts the checkpoint for the paper "UniAVGen: Unified Audio and Video Generation with Asymmetric Cross-Modal Interactions". UniAVGen is a unified framework for high-fidelity joint audio-video generation. It addresses key limitations of existing methods, such as poor lip synchronization, insufficient semantic consistency, and limited task generalization.
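Since this repository holds a checkpoint, one plausible first step is to fetch its files locally and see which weight files it contains. The sketch below is an illustration, not the authors' documented workflow: it uses `snapshot_download` from `huggingface_hub`, while the helper names (`download_checkpoint`, `list_weight_files`), the local directory name, and the list of weight-file suffixes are assumptions, because the card does not describe the repository's file layout.

```python
from pathlib import Path

REPO_ID = "MCG-NJU/UniAVGen"  # repository named in this model card

# Assumed suffixes for weight files; the card does not state the actual layout.
WEIGHT_SUFFIXES = (".safetensors", ".pt", ".pth", ".bin", ".ckpt")


def download_checkpoint(local_dir: str = "UniAVGen") -> Path:
    """Download every file in the repository into local_dir (needs network access)."""
    # Imported lazily so list_weight_files works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return Path(snapshot_download(repo_id=REPO_ID, local_dir=local_dir))


def list_weight_files(root, suffixes=WEIGHT_SUFFIXES):
    """Return (relative path, size in MiB) for weight-like files under root, sorted by path."""
    root = Path(root)
    found = []
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            size_mib = path.stat().st_size / 2**20
            found.append((path.relative_to(root).as_posix(), size_mib))
    return found
```

A typical use would be `list_weight_files(download_checkpoint())`, which downloads the repository once and then reports what was fetched. How those files are loaded for inference depends on the project's own code and is not covered here.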
Citation
If you find this project helpful in your research or applications, please consider leaving a star ⭐️ and citing our paper:
@misc{zhang2025uniavgenunifiedaudiovideo,
title={UniAVGen: Unified Audio and Video Generation with Asymmetric Cross-Modal Interactions},
author={Guozhen Zhang and Zixiang Zhou and Teng Hu and Ziqiao Peng and Youliang Zhang and Yi Chen and Yuan Zhou and Qinglin Lu and Limin Wang},
year={2025},
eprint={2511.03334},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2511.03334},
}
Model tree for MCG-NJU/UniAVGen
Base model: Wan-AI/Wan2.2-TI2V-5B