---
pipeline_tag: text-to-video
license: other
license_name: tencent-hunyuan-community
license_link: LICENSE
---

<p align="center">
  <img src="assets/logo.jpg"  height=30>
</p>

# FastHunyuan Model Card

## Model Details

FastHunyuan is an accelerated [HunyuanVideo](https://huggingface.co/tencent/HunyuanVideo) model. It can sample high-quality videos with 6 diffusion steps, which gives roughly an 8X speedup over the original HunyuanVideo with 50 steps.
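The "around 8X" figure is consistent with simple step-count arithmetic. The sketch below assumes every diffusion step costs about the same wall-clock time and ignores fixed costs such as text encoding and VAE decoding, which is why the end-to-end speedup lands a little below the ideal ratio. The helper name `step_speedup` is hypothetical.

```python
# Back-of-the-envelope speedup estimate.
# Assumption: each sampling step has roughly equal cost, and fixed costs
# (text encoding, VAE decoding) are ignored.
def step_speedup(original_steps: int, distilled_steps: int) -> float:
    """Return the ideal speedup from reducing the number of sampling steps."""
    if original_steps <= 0 or distilled_steps <= 0:
        raise ValueError("step counts must be positive")
    return original_steps / distilled_steps


speedup = step_speedup(50, 6)
print(f"ideal speedup: {speedup:.2f}x")  # ideal speedup: 8.33x
```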

- **Developed by**: [Hao AI Lab](https://hao-ai-lab.github.io/)
- **License**: tencent-hunyuan-community
- **Distilled from**: [HunyuanVideo](https://huggingface.co/tencent/HunyuanVideo)
- **GitHub Repository**: https://github.com/hao-ai-lab/FastVideo

## Usage

- Clone the [FastVideo](https://github.com/hao-ai-lab/FastVideo) repository and follow the inference instructions in its README.
- Alternatively, you can run FastHunyuan with the official [HunyuanVideo repository](https://github.com/Tencent/HunyuanVideo) by **setting the shift to 17, the number of steps to 6, the resolution to 720X1280X125, and the CFG scale to a value greater than 6**. We find that a larger CFG scale generally leads to videos with faster motion.
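To build intuition for the shift setting, the sketch below computes the noise levels visited by a 6-step sampler. It assumes a flow-matching schedule whose sigmas are spaced linearly from 1 (pure noise) to 0 (clean video) and then warped by an SD3-style time shift, `sigma' = shift * s / (1 + (shift - 1) * s)`; check the scheduler in the repository you actually use before relying on this. The function name `shifted_sigmas` is hypothetical.

```python
def shifted_sigmas(num_steps: int, shift: float) -> list:
    """Noise levels for a flow-matching sampler with an SD3-style time shift.

    Assumption: base sigmas are spaced linearly from 1.0 (pure noise) to
    0.0 (clean sample), then warped by shift * s / (1 + (shift - 1) * s).
    """
    if num_steps <= 0:
        raise ValueError("num_steps must be positive")
    base = [1.0 - i / num_steps for i in range(num_steps + 1)]
    return [shift * s / (1.0 + (shift - 1.0) * s) for s in base]


# shift=1.0 leaves the linear schedule unchanged; a large shift such as 17
# keeps most of the 6 steps at high noise levels.
for shift in (1.0, 17.0):
    schedule = shifted_sigmas(6, shift)
    print(shift, [round(s, 3) for s in schedule])
```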

## Training details

FastHunyuan is consistency-distilled on the [MixKit](https://huggingface.co/datasets/LanguageBind/Open-Sora-Plan-v1.1.0/tree/main) dataset with the following hyperparameters:
- Batch size: 16
- Resolution: 720x1280
- Number of frames: 125
- Train steps: 320
- GPUs: 32
- LR: 1e-6
- Loss: Huber
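For reference, the Huber loss is quadratic for small residuals and linear for large ones, which makes it less sensitive to outliers than a plain squared error. The card does not state the `delta` threshold, the reduction, or whether a pseudo-Huber variant is used, so the values below (`delta=1.0`, mean reduction) are assumptions. In a distillation setting this loss would typically compare the student's prediction with the distillation target.

```python
def huber_loss(pred: list, target: list, delta: float = 1.0) -> float:
    """Mean Huber loss over paired values.

    Assumption: delta=1.0 and mean reduction; the actual training
    configuration may differ.
    """
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and equal length")
    total = 0.0
    for p, t in zip(pred, target):
        r = abs(p - t)
        if r <= delta:
            total += 0.5 * r * r  # quadratic region for small residuals
        else:
            total += delta * (r - 0.5 * delta)  # linear region for large residuals
    return total / len(pred)


print(huber_loss([0.0, 0.0], [0.5, 3.0]))  # 1.3125
```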

## Evaluation
Below is a qualitative comparison between FastHunyuan with 6-step inference and the original HunyuanVideo with 6-step inference:

| FastHunyuan 6 step | Hunyuan 6 step |
| --- | --- |
| ![FastHunyuan 6 step](assets/distilled/1.gif) | ![Hunyuan 6 step](assets/undistilled/1.gif) |
| ![FastHunyuan 6 step](assets/distilled/2.gif) | ![Hunyuan 6 step](assets/undistilled/2.gif) |
| ![FastHunyuan 6 step](assets/distilled/3.gif) | ![Hunyuan 6 step](assets/undistilled/3.gif) |
| ![FastHunyuan 6 step](assets/distilled/4.gif) | ![Hunyuan 6 step](assets/undistilled/4.gif) |

## Memory requirements

Please see our [GitHub repository](https://github.com/hao-ai-lab/FastVideo) for details.

FastHunyuan inference can run on a single RTX 4090. We now support NF4 and LLM-INT8 quantized inference for FastHunyuan using BitsAndBytes. With NF4 quantization, inference on a single RTX 4090 GPU requires just 20 GB of VRAM.
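A rough estimate shows why 4-bit weights matter on a 24 GB card. The sketch assumes the diffusion transformer has about 13 billion parameters (HunyuanVideo's reported size) and counts weight storage only, ignoring activations, the text encoder, the VAE, and quantization metadata, so it is a lower bound rather than a prediction of the 20 GB figure above. The names `PARAMS` and `weight_memory_gib` are hypothetical.

```python
# Assumption: ~13B parameters in the diffusion transformer.
PARAMS = 13e9
GIB = 1024 ** 3


def weight_memory_gib(num_params: float, bits_per_param: float) -> float:
    """Memory for the weights alone, in GiB (no activations or metadata)."""
    return num_params * bits_per_param / 8 / GIB


for name, bits in [("bf16", 16), ("int8", 8), ("nf4", 4)]:
    print(f"{name}: {weight_memory_gib(PARAMS, bits):.1f} GiB")
```

Under these assumptions, bf16 weights alone already exceed the 24 GB of an RTX 4090, while NF4 weights take roughly a quarter of that.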

For LoRA finetuning, the minimum hardware requirements are:
- 2 GPUs with 40 GB of memory each, using LoRA
- 2 GPUs with 30 GB of memory each, using LoRA with CPU offload