Update README.md
# Hunyuan-DiT : A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding

This repo contains PyTorch model definitions, pre-trained weights, and inference/sampling code for our paper exploring Hunyuan-DiT. You can find more visualizations on our [project page](https://dit.hunyuan.tencent.com/).

> [**Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding**](https://arxiv.org/abs/2405.08748) <br>

> [**DialogGen: Multi-modal Interactive Dialogue System for Multi-turn Text-to-Image Generation**](https://arxiv.org/abs/2403.08857) <br>

## 🔥🔥🔥 News!!

* Jun 13, 2024: :zap: HYDiT-v1.1 is released. It mitigates image oversaturation and alleviates the watermark issue. Please check [HunyuanDiT-v1.1](https://huggingface.co/Tencent-Hunyuan/HunyuanDiT-v1.1) and [Distillation-v1.1](https://huggingface.co/Tencent-Hunyuan/Distillation-v1.1) for more details.
* Jun 13, 2024: :truck: The training code is released, offering [full-parameter training](#full-parameter-training) and [LoRA training](#lora).
* Jun 06, 2024: :tada: Hunyuan-DiT is now available in ComfyUI. Please check [ComfyUI](#using-comfyui) for more details.
* Jun 06, 2024: 🚀 We introduce a distilled version of Hunyuan-DiT for acceleration, which achieves **50%** acceleration on NVIDIA GPUs. Please check [Distillation](https://huggingface.co/Tencent-Hunyuan/Distillation) for more details.
* Jun 05, 2024: 🤗 Hunyuan-DiT is now available in 🤗 Diffusers! Please check the [example](#using--diffusers) below.
* Jun 04, 2024: :globe_with_meridians: The pretrained models can now also be downloaded from Tencent Cloud! Please check the [links](#-download-pretrained-models) below.
* May 22, 2024: 🚀 We introduce a TensorRT version of Hunyuan-DiT for acceleration, which achieves **47%** acceleration on NVIDIA GPUs. Please check [TensorRT-libs](https://huggingface.co/Tencent-Hunyuan/TensorRT-libs) for instructions.
* May 22, 2024: 💬 We now provide a demo for multi-turn text-to-image generation. Please check the [script](#using-gradio) below.
33 |
|
34 |
+
## 🤖 Try it on the web
|
35 |
|
36 |
+
Welcome to our web-based [**Tencent Hunyuan Bot**](https://hunyuan.tencent.com/bot/chat), where you can explore our innovative products! Just input the suggested prompts below or any other **imaginative prompts containing drawing-related keywords** to activate the Hunyuan text-to-image generation feature. Unleash your creativity and create any picture you desire, **all for free!**
|
37 |
|
38 |
+
You can use simple prompts similar to natural language text
|
39 |
|
|
|
40 |
> 画一只穿着西装的猪
|
41 |
>
|
42 |
> draw a pig in a suit
|
|
|
45 |
>
|
46 |
> generate a painting, cyberpunk style, sports car
|
47 |
|
48 |
+
or multi-turn language interactions to create the picture.
|
49 |
+
|
50 |
+
> 画一个木制的鸟
|
51 |
+
>
|
52 |
+
> draw a wooden bird
|
53 |
+
>
|
54 |
+
> 变成玻璃的
|
55 |
+
>
|
56 |
+
> turn into glass
|
57 |
+
|

## 📑 Open-source Plan

- Hunyuan-DiT (Text-to-Image Model)
  - [x] Inference
  - [x] Checkpoints
  - [x] Distillation Version
  - [x] TensorRT Version
  - [x] Training
  - [x] LoRA
  - [ ] ControlNet (Pose, Canny, Depth, Tile)
  - [ ] IP-adapter
  - [ ] Hunyuan-DiT-XL checkpoints (0.7B model)
  - [ ] Caption model (Re-caption the raw image-text pairs)
- [DialogGen](https://github.com/Centaurusalpha/DialogGen) (Prompt Enhancement Model)
  - [x] Inference
- [x] Web Demo (Gradio)
- [x] Multi-turn T2I Demo (Gradio)
- [x] CLI Demo
- [x] ComfyUI
- [x] Diffusers
- [ ] WebUI

## Contents
- [Hunyuan-DiT](#hunyuan-dit--a-powerful-multi-resolution-diffusion-transformer-with-fine-grained-chinese-understanding)
  - [📜 Requirements](#-requirements)
  - [🛠 Dependencies and Installation](#%EF%B8%8F-dependencies-and-installation)
  - [🧱 Download Pretrained Models](#-download-pretrained-models)
  - [:truck: Training](#truck-training)
    - [Data Preparation](#data-preparation)
    - [Full-parameter Training](#full-parameter-training)
    - [LoRA](#lora)
  - [🔑 Inference](#-inference)
    - [Using Gradio](#using-gradio)
    - [Using Diffusers](#using--diffusers)
    - [Using Command Line](#using-command-line)
    - [More Configurations](#more-configurations)
    - [Using ComfyUI](#using-comfyui)
  - [🚀 Acceleration (for Linux)](#-acceleration-for-linux)
  - [🔗 BibTeX](#-bibtex)

## **Abstract**

* **Multi-turn Text2Image Generation**

https://github.com/Tencent/tencent.github.io/assets/27557933/94b4dcc3-104d-44e1-8bb2-dc55108763d1

This repo consists of DialogGen (a prompt enhancement model) and Hunyuan-DiT (a text-to-image model).

The following table shows the requirements for running the models (batch size = 1):

| Model                   | `--load-4bit` (DialogGen) | GPU Peak Memory | GPU             |
|:-----------------------:|:-------------------------:|:---------------:|:---------------:|
| DialogGen + Hunyuan-DiT | ✘                         | 32 GB           | A100            |
| DialogGen + Hunyuan-DiT | ✔                         | 22 GB           | A100            |
| Hunyuan-DiT             | -                         | 11 GB           | A100            |
| Hunyuan-DiT             | -                         | 14 GB           | RTX3090/RTX4090 |
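
As a rough sanity check on the `--load-4bit` rows, the sketch below estimates DialogGen's weight memory from its 7.0B parameter count, which comes from the model table in the download section. It assumes 2 bytes per parameter for unquantized 16-bit weights and 0.5 bytes per parameter for 4-bit weights. It ignores activations, the CUDA context, and quantization metadata, so treat the result as an approximation, not a measurement.

```python
# Back-of-envelope estimate of weight memory only. Activations, the CUDA
# context, and quantization overhead are ignored, so this will not match
# measured peak memory exactly.
BYTES_PER_PARAM = {"fp16": 2.0, "int4": 0.5}  # assumed storage per parameter


def weight_gb(num_params: float, dtype: str) -> float:
    """Return the approximate size of the weights in decimal gigabytes."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


DIALOGGEN_PARAMS = 7.0e9  # "#Params" column of the model table

fp16 = weight_gb(DIALOGGEN_PARAMS, "fp16")
int4 = weight_gb(DIALOGGEN_PARAMS, "int4")
print(f"fp16: {fp16:.1f} GB, 4-bit: {int4:.1f} GB, saved: {fp16 - int4:.1f} GB")
```

The estimated saving of about 10.5 GB is close to the 10 GB gap between the 32 GB and 22 GB rows. This suggests, but does not prove, that most of that gap comes from DialogGen's weights.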

* An NVIDIA GPU with CUDA support is required.
* We have tested V100 and A100 GPUs.

## 🛠️ Dependencies and Installation

Begin by cloning the repository:
```shell
git clone https://github.com/tencent/HunyuanDiT
cd HunyuanDiT
```

### Installation Guide for Linux

We provide an `environment.yml` file for setting up a Conda environment.
Conda's installation instructions are available [here](https://docs.anaconda.com/free/miniconda/index.html).

```shell
# 1. Prepare conda environment
conda env create -f environment.yml
```

## 🧱 Download Pretrained Models
To download the model, first install the huggingface-cli. (Detailed instructions are available [here](https://huggingface.co/docs/huggingface_hub/guides/cli).)

```shell
python -m pip install "huggingface_hub[cli]"
```

Then download the model using the following commands:

```shell
# Create a directory named 'ckpts' where the model will be saved, fulfilling the prerequisites for running the demo.
mkdir ckpts
# Use the huggingface-cli tool to download the model.
# The download time may vary from 10 minutes to 1 hour depending on network conditions.
huggingface-cli download Tencent-Hunyuan/HunyuanDiT --local-dir ./ckpts
```

<details>
<summary>💡Tips for using huggingface-cli (network problems)</summary>

##### 1. Using HF-Mirror

If you encounter slow download speeds in China, you can try a mirror to speed up the download. For example:

```shell
HF_ENDPOINT=https://hf-mirror.com huggingface-cli download Tencent-Hunyuan/HunyuanDiT --local-dir ./ckpts
```

##### 2. Resume Download

`huggingface-cli` supports resuming downloads. If the download is interrupted, just rerun the download command to resume it.

Note: If an error like `No such file or directory: 'ckpts/.huggingface/.gitignore.lock'` occurs during the download, you can ignore it and rerun the download command.

</details>

---

All models will be automatically downloaded. For more information about the model, visit the Hugging Face repository [here](https://huggingface.co/Tencent-Hunyuan/HunyuanDiT).

| Model | #Params | Hugging Face Download URL | Tencent Cloud Download URL |
|:------------------:|:-------:|:-------------------------:|:--------------------------:|
| mT5 | 1.6B | [mT5](https://huggingface.co/Tencent-Hunyuan/HunyuanDiT/tree/main/t2i/mt5) | [mT5](https://dit.hunyuan.tencent.com/download/HunyuanDiT/mt5.zip) |
| CLIP | 350M | [CLIP](https://huggingface.co/Tencent-Hunyuan/HunyuanDiT/tree/main/t2i/clip_text_encoder) | [CLIP](https://dit.hunyuan.tencent.com/download/HunyuanDiT/clip_text_encoder.zip) |
| Tokenizer | - | [Tokenizer](https://huggingface.co/Tencent-Hunyuan/HunyuanDiT/tree/main/t2i/tokenizer) | [Tokenizer](https://dit.hunyuan.tencent.com/download/HunyuanDiT/tokenizer.zip) |
| DialogGen | 7.0B | [DialogGen](https://huggingface.co/Tencent-Hunyuan/HunyuanDiT/tree/main/dialoggen) | [DialogGen](https://dit.hunyuan.tencent.com/download/HunyuanDiT/dialoggen.zip) |
| sdxl-vae-fp16-fix | 83M | [sdxl-vae-fp16-fix](https://huggingface.co/Tencent-Hunyuan/HunyuanDiT/tree/main/t2i/sdxl-vae-fp16-fix) | [sdxl-vae-fp16-fix](https://dit.hunyuan.tencent.com/download/HunyuanDiT/sdxl-vae-fp16-fix.zip) |
| Hunyuan-DiT | 1.5B | [Hunyuan-DiT](https://huggingface.co/Tencent-Hunyuan/HunyuanDiT/tree/main/t2i/model) | [Hunyuan-DiT](https://dit.hunyuan.tencent.com/download/HunyuanDiT/model.zip) |
| Data demo | - | - | [Data demo](https://dit.hunyuan.tencent.com/download/HunyuanDiT/data_demo.zip) |

## :truck: Training

### Data Preparation

Refer to the commands below to prepare the training data.

1. Install dependencies

We offer an efficient data management library named IndexKits, which supports reading hundreds of millions of data samples during training. See the [docs](./IndexKits/README.md) for more.
```shell
# 1 Install dependencies
cd HunyuanDiT
pip install -e ./IndexKits
```
2. Data download

Feel free to download the [data demo](https://dit.hunyuan.tencent.com/download/HunyuanDiT/data_demo.zip).
```shell
# 2 Data download
wget -O ./dataset/data_demo.zip https://dit.hunyuan.tencent.com/download/HunyuanDiT/data_demo.zip
unzip ./dataset/data_demo.zip -d ./dataset
mkdir ./dataset/porcelain/arrows ./dataset/porcelain/jsons
```
3. Data conversion

Create a CSV file for the training data with the fields listed in the table below.

| Fields | Required | Description | Example |
|:------------:|:--------:|:----------------:|:-----------:|
| `image_path` | Required | image path | `./dataset/porcelain/images/0.png` |
| `text_zh` | Required | Chinese text caption (the example reads "Blue-and-white porcelain style, a blue bird standing on a blue vase, surrounded by white flowers, with a white background") | 青花瓷风格,一只蓝色的鸟儿站在蓝色的花瓶上,周围点缀着白色花朵,背景是白色 |
| `md5` | Optional | image MD5 (Message Digest Algorithm 5) | `d41d8cd98f00b204e9800998ecf8427e` |
| `width` | Optional | image width | `1024` |
| `height` | Optional | image height | `1024` |

> ⚠️ Optional fields like MD5, width, and height can be omitted. If omitted, the script below will automatically calculate them. This process can be time-consuming when dealing with large-scale training data.
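
If the optional columns are left empty, the conversion script computes them, and that can be slow on large datasets. Below is a minimal sketch, not part of the repo, that precomputes them with the Python standard library while writing `image_text.csv`. It makes three assumptions. First, the images are PNGs: width and height are read from the PNG `IHDR` header without decoding the image. Second, the MD5 is taken over the raw file bytes; the repo's script may hash differently. Third, the column order follows the table above, and whether `csv2arrow.py` cares about that order has not been checked.

```python
from __future__ import annotations

import csv
import hashlib
import struct
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
FIELDS = ["image_path", "text_zh", "md5", "width", "height"]  # column names from the table


def png_size(path: Path) -> tuple[int, int]:
    """Read (width, height) from a PNG's IHDR chunk without decoding the pixels."""
    with path.open("rb") as f:
        header = f.read(24)  # 8-byte signature + 4-byte length + b"IHDR" + width + height
    if header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError(f"{path} is not a PNG file")
    width, height = struct.unpack(">II", header[16:24])  # big-endian uint32 pair
    return width, height


def md5_of(path: Path) -> str:
    """MD5 hex digest of the raw file bytes, read in 1 MiB chunks."""
    h = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def write_image_text_csv(pairs, csv_path: Path) -> None:
    """Write (image_path, text_zh) pairs to a CSV, filling md5/width/height for each image."""
    csv_path.parent.mkdir(parents=True, exist_ok=True)
    with csv_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        for image_path, text in pairs:
            p = Path(image_path)
            width, height = png_size(p)
            writer.writerow({"image_path": image_path, "text_zh": text,
                             "md5": md5_of(p), "width": width, "height": height})
```

For the demo data, you would call it with a list such as `[("./dataset/porcelain/images/0.png", caption)]` and the path `./dataset/porcelain/csvfile/image_text.csv`. For JPEG or other formats, replace `png_size` with a real image library such as Pillow.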

We use [Arrow](https://github.com/apache/arrow) as the training data format because it offers a standard and efficient in-memory data representation. A conversion script is provided to transform CSV files into Arrow format.
```shell
# 3 Data conversion
python ./hydit/data_loader/csv2arrow.py ./dataset/porcelain/csvfile/image_text.csv ./dataset/porcelain/arrows
```

4. Data Selection and Configuration File Creation

We configure the training data through YAML files. In these files, you can set up standard data processing strategies for filtering, copying, deduplicating, and more. For more details, see [docs](IndexKits/docs/MakeDataset.md).

For a sample file, please refer to [file](./dataset/yamls/porcelain.yaml). For the full list of configuration parameters, see [file](./IndexKits/docs/MakeDataset.md).

5. Create the training data index file from the YAML file.

```shell
# Single Resolution Data Preparation
idk base -c dataset/yamls/porcelain.yaml -t dataset/porcelain/jsons/porcelain.json

# Multi Resolution Data Preparation
idk multireso -c dataset/yamls/porcelain_mt.yaml -t dataset/porcelain/jsons/porcelain_mt.json
```

The directory structure for the `porcelain` dataset is:

```shell
cd ./dataset

porcelain
├──images/  (image files)
│  ├──0.png
│  ├──1.png
│  ├──......
├──csvfile/  (csv files containing text-image pairs)
│  ├──image_text.csv
├──arrows/  (arrow files containing all necessary training data)
│  ├──00000.arrow
│  ├──00001.arrow
│  ├──......
├──jsons/  (final training data index files which read data from arrow files during training)
│  ├──porcelain.json
│  ├──porcelain_mt.json
```
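
Before training, you may want to confirm that a dataset matches the layout above. The hypothetical helper below is not part of the repo. It checks for the four sub-directories, then for at least one `.arrow` file and one index `.json` file, which are the outputs of steps 3 and 5.

```python
from __future__ import annotations

from pathlib import Path

# Sub-directories shown in the porcelain layout above.
EXPECTED_SUBDIRS = ("images", "csvfile", "arrows", "jsons")


def check_dataset_layout(root: Path) -> list[str]:
    """Return human-readable problems under `root`; an empty list means the layout looks complete."""
    problems = []
    for name in EXPECTED_SUBDIRS:
        if not (root / name).is_dir():
            problems.append(f"missing directory: {name}/")
    arrows, jsons = root / "arrows", root / "jsons"
    # Only look for generated files once their directories exist.
    if arrows.is_dir() and not any(arrows.glob("*.arrow")):
        problems.append("arrows/ has no .arrow files (run csv2arrow.py)")
    if jsons.is_dir() and not any(jsons.glob("*.json")):
        problems.append("jsons/ has no .json index files (run idk base / idk multireso)")
    return problems
```

For example, `check_dataset_layout(Path("./dataset/porcelain"))` should return an empty list once steps 1 to 5 have completed.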

### Full-parameter Training

Training uses DeepSpeed. You can control **single-node** / **multi-node** training by adjusting parameters such as `--hostfile` and `--master_addr`. For more details, see [link](https://www.deepspeed.ai/getting-started/#resource-configuration-multi-node).

```shell
# Single-resolution training
PYTHONPATH=./ sh hydit/train.sh --index-file dataset/porcelain/jsons/porcelain.json

# Multi-resolution training (uses the multi-resolution index file)
PYTHONPATH=./ sh hydit/train.sh --index-file dataset/porcelain/jsons/porcelain_mt.json --multireso --reso-step 64
```
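
For multi-node runs, DeepSpeed reads the list of nodes from a hostfile in which each line has the form `<hostname> slots=<number of GPUs>`. A minimal sketch for generating one is shown below. The host names and GPU counts are placeholders. This README does not show how `hydit/train.sh` forwards `--hostfile` to the DeepSpeed launcher, so check the script before relying on it.

```python
from __future__ import annotations

from pathlib import Path


def write_hostfile(hosts: dict[str, int], path: Path) -> str:
    """Write a DeepSpeed hostfile with one '<hostname> slots=<gpus>' line per node and return its text."""
    for host, gpus in hosts.items():
        if gpus <= 0:
            raise ValueError(f"{host}: slots must be positive, got {gpus}")
    text = "".join(f"{host} slots={gpus}\n" for host, gpus in hosts.items())
    path.write_text(text)
    return text
```

For example, `write_hostfile({"worker-1": 8, "worker-2": 8}, Path("hostfile"))` describes two placeholder nodes with eight GPUs each. The host names must resolve, and the nodes must be reachable from the machine that launches training.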

### LoRA

We provide training and inference scripts for LoRA, detailed in the [guide](./lora/README.md).

## 🔑 Inference

### Using Gradio

Make sure the conda environment is activated before running the following command.

```shell
# By default, we start a Chinese UI.

# Start with English UI
python app/hydit_app.py --lang en

# Start a multi-turn T2I generation UI.
# If your GPU memory is less than 32GB, use '--load-4bit' to enable 4-bit quantization, which requires at least 22GB of memory.
python app/multiTurnT2I_app.py
```
The demo can then be accessed at http://0.0.0.0:443. Replace 0.0.0.0 with your server's IP address.

### Using 🤗 Diffusers

Please install PyTorch 2.0 or higher first, as required by the specified version of the diffusers library.

Install 🤗 diffusers, ensuring that the version is at least 0.28.1:

```shell
pip install git+https://github.com/huggingface/diffusers.git
```
or
```shell
pip install "diffusers>=0.28.1"
```

You can generate images with both Chinese and English prompts using the following Python script:
```py
import torch
from diffusers import HunyuanDiTPipeline

pipe = HunyuanDiTPipeline.from_pretrained("Tencent-Hunyuan/HunyuanDiT-Diffusers", torch_dtype=torch.float16)
pipe.to("cuda")

# You may also use an English prompt, as HunyuanDiT supports both English and Chinese
# prompt = "An astronaut riding a horse"
prompt = "一个宇航员在骑马"
image = pipe(prompt).images[0]
```
You can use our distilled model to generate images even faster:

```py
import torch
from diffusers import HunyuanDiTPipeline

pipe = HunyuanDiTPipeline.from_pretrained("Tencent-Hunyuan/HunyuanDiT-Diffusers-Distilled", torch_dtype=torch.float16)
pipe.to("cuda")

# You may also use an English prompt, as HunyuanDiT supports both English and Chinese
# prompt = "An astronaut riding a horse"
prompt = "一个宇航员在骑马"
image = pipe(prompt, num_inference_steps=25).images[0]
```
More details can be found in [HunyuanDiT-Diffusers-Distilled](https://huggingface.co/Tencent-Hunyuan/HunyuanDiT-Diffusers-Distilled).

### Using Command Line

We provide several commands to help you get started quickly:

```shell
# Prompt Enhancement + Text-to-Image. Torch mode
# ("渔舟唱晚" is a classical phrase, roughly "fishing boats singing at dusk")
python sample_t2i.py --prompt "渔舟唱晚"

# Generate an image with other image sizes.
python sample_t2i.py --prompt "渔舟唱晚" --image-size 1280 768

# Prompt Enhancement + Text-to-Image. DialogGen is loaded with 4-bit quantization, which may reduce its performance.
python sample_t2i.py --prompt "渔舟唱晚" --load-4bit
```

More example prompts can be found in [example_prompts.txt](example_prompts.txt).

| `--seed` | 42 | The random seed for generating images |
| `--infer-steps` | 100 | The number of steps for sampling |
| `--negative` | - | The negative prompt for image generation |
| `--infer-mode` | torch | The inference mode (torch, fa, or trt) |
| `--sampler` | ddpm | The diffusion sampler (ddpm, ddim, or dpmms) |
| `--no-enhance` | False | Disable the prompt enhancement model |
| `--model-root` | ckpts | The root directory of the model checkpoints |
| `--load-key` | ema | Load the student model or EMA model (ema or module) |
| `--load-4bit` | False | Load the DialogGen model with 4-bit quantization |
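
These flags can be combined freely. The hypothetical helper below assembles a `sample_t2i.py` invocation as an argument list. It uses only flags from this table and the examples above, and it takes the allowed values for `--sampler` and `--infer-mode` from the descriptions. `--no-enhance` and `--load-4bit` are treated as value-less switches, matching how they are used in the example commands.

```python
from __future__ import annotations

from typing import Optional

SAMPLERS = {"ddpm", "ddim", "dpmms"}   # from the --sampler description
INFER_MODES = {"torch", "fa", "trt"}   # from the --infer-mode description


def sample_t2i_command(
    prompt: str,
    *,
    seed: int = 42,
    infer_steps: int = 100,
    sampler: str = "ddpm",
    infer_mode: str = "torch",
    negative: Optional[str] = None,
    no_enhance: bool = False,
    load_4bit: bool = False,
) -> list[str]:
    """Build an argv list for sample_t2i.py; defaults mirror the configuration table."""
    if sampler not in SAMPLERS:
        raise ValueError(f"unknown sampler {sampler!r}; expected one of {sorted(SAMPLERS)}")
    if infer_mode not in INFER_MODES:
        raise ValueError(f"unknown infer mode {infer_mode!r}; expected one of {sorted(INFER_MODES)}")
    cmd = [
        "python", "sample_t2i.py",
        "--prompt", prompt,
        "--seed", str(seed),
        "--infer-steps", str(infer_steps),
        "--sampler", sampler,
        "--infer-mode", infer_mode,
    ]
    if negative:
        cmd += ["--negative", negative]
    if no_enhance:  # switch: disable the DialogGen prompt enhancement
        cmd.append("--no-enhance")
    if load_4bit:  # switch: load DialogGen with 4-bit quantization
        cmd.append("--load-4bit")
    return cmd
```

Run the result from the repository root with `subprocess.run(cmd, check=True)`, or print `shlex.join(cmd)` to get a command you can paste into a shell. Building a list rather than one string avoids quoting problems with Chinese prompts or prompts that contain spaces.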

### Using ComfyUI

We provide several commands to help you get started quickly:

```shell
# Download the ComfyUI code
git clone https://github.com/comfyanonymous/ComfyUI.git

# Install torch, torchvision, torchaudio
pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 --index-url https://download.pytorch.org/whl/cu117

# Install ComfyUI's essential Python packages
cd ComfyUI
pip install -r requirements.txt

# ComfyUI has been successfully installed!

# Download the model weights as before, or link the existing model folder into ComfyUI.
python -m pip install "huggingface_hub[cli]"
mkdir models/hunyuan
huggingface-cli download Tencent-Hunyuan/HunyuanDiT --local-dir ./models/hunyuan/ckpts

# Move to the ComfyUI custom_nodes folder and copy the comfyui-hydit folder from the HunyuanDiT repo.
# ${HunyuanDiT} stands for the path to your local clone of the HunyuanDiT repo.
cd custom_nodes
cp -r ${HunyuanDiT}/comfyui-hydit ./
cd comfyui-hydit

# Install the essential Python packages.
pip install -r requirements.txt

# Our tool has been successfully installed!

# Go to the ComfyUI main folder
cd ../..
# Run the ComfyUI launch command
python main.py --listen --port 80

# ComfyUI is now running!
```
More details can be found in the [ComfyUI README](comfyui-hydit/README.md).

## 🚀 Acceleration (for Linux)

- We provide a TensorRT version of HunyuanDiT for inference acceleration (faster than flash attention).
  See [Tencent-Hunyuan/TensorRT-libs](https://huggingface.co/Tencent-Hunyuan/TensorRT-libs) for more details.

- We provide a distilled version of HunyuanDiT for inference acceleration.
  See [Tencent-Hunyuan/Distillation](https://huggingface.co/Tencent-Hunyuan/Distillation) for more details.

## 🔗 BibTeX
If you find [Hunyuan-DiT](https://arxiv.org/abs/2405.08748) or [DialogGen](https://arxiv.org/abs/2403.08857) useful for your research and applications, please cite them using this BibTeX:

```BibTeX
journal={arXiv preprint arXiv:2403.08857},
year={2024}
}
```

## Star History

<a href="https://star-history.com/#Tencent/HunyuanDiT&Date">
 <picture>
   <source media="(prefers-color-scheme: dark)" srcset="https://api.star-history.com/svg?repos=Tencent/HunyuanDiT&type=Date&theme=dark" />
   <source media="(prefers-color-scheme: light)" srcset="https://api.star-history.com/svg?repos=Tencent/HunyuanDiT&type=Date" />
   <img alt="Star History Chart" src="https://api.star-history.com/svg?repos=Tencent/HunyuanDiT&type=Date" />
 </picture>
</a>
|