BoyuanJiang committed
Commit 10321dd · 1 Parent(s): b5a9c6d

upload model

.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ *.jpg filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,3 +1,119 @@
  ---
  license: cc-by-nc-sa-4.0
+
+ extra_gated_prompt: "You agree that this model will only be used for non-commercial purposes."
+ extra_gated_fields:
+   Name: text
+   Email: text
+   Country: country
+   Organization or Affiliation: text
+   I agree to use this model for non-commercial use ONLY: checkbox
  ---
+
+ # FitDiT: Advancing the Authentic Garment Details for High-fidelity Virtual Try-on
+
+ <div style="display: flex; justify-content: center; align-items: center;">
+ <a href="https://arxiv.org/abs/2411.10499" style="margin: 0 2px;">
+ <img src='https://img.shields.io/badge/arXiv-2411.10499-red?style=flat&logo=arXiv&logoColor=red' alt='arxiv'>
+ </a>
+ <a href="https://github.com/BoyuanJiang/FitDiT" style="margin: 0 2px;">
+ <img src='https://img.shields.io/badge/GitHub-Repo-blue?style=flat&logo=GitHub' alt='GitHub'>
+ </a>
+ <a href="http://demo.fitdit.byjiang.com/" style="margin: 0 2px;">
+ <img src='https://img.shields.io/badge/Demo-Gradio-gold?style=flat&logo=Gradio&logoColor=red' alt='Demo'>
+ </a>
+ <a href='https://huggingface.co/BoyuanJiang/FitDiT' style="margin: 0 2px;">
+ <img src='https://img.shields.io/badge/Hugging Face-ckpts-orange?style=flat&logo=HuggingFace&logoColor=orange' alt='huggingface'>
+ </a>
+ <a href='https://byjiang.com/FitDiT/' style="margin: 0 2px;">
+ <img src='https://img.shields.io/badge/Webpage-Project-silver?style=flat&logo=&logoColor=orange' alt='webpage'>
+ </a>
+ <a href="https://raw.githubusercontent.com/BoyuanJiang/FitDiT/refs/heads/main/LICENSE" style="margin: 0 2px;">
+ <img src='https://img.shields.io/badge/License-CC BY--NC--SA--4.0-lightgreen?style=flat&logo=Lisence' alt='License'>
+ </a>
+ </div>
+
+ **FitDiT** is designed for high-fidelity virtual try-on using Diffusion Transformers (DiT).
+ <div align="center">
+ <img src="resource/img/teaser.jpg" width="100%" height="100%"/>
+ </div>
+
+ ## Updates
+ - **`2024/12/20`**: The FitDiT [**model weights**](https://huggingface.co/BoyuanJiang/FitDiT) are available.
+ - **`2024/12/17`**: Inference code is released.
+ - **`2024/12/4`**: Our [**Online Demo**](http://demo.fitdit.byjiang.com/) is released.
+ - **`2024/11/25`**: Our [**Complex Virtual Dressing Dataset (CVDD)**](https://huggingface.co/datasets/BoyuanJiang/CVDD) is released.
+ - **`2024/11/15`**: Our [**FitDiT paper**](https://arxiv.org/abs/2411.10499) is available.
+
+ ## Gradio Demo
+ Our algorithm runs in two steps: the first step generates a mask of the try-on area, and the second step performs the try-on within the masked area.
+
+ ### Step1: Run Mask
+ You can simply generate the try-on mask by clicking **Step1: Run Mask** on the right side of the Gradio demo. If the automatically generated mask does not fully cover the area you want to try on, you can adjust it in either of two ways:
+
+ 1. Drag the *mask offset top*, *mask offset bottom*, *mask offset left*, or *mask offset right* slider, then click the **Step1: Run Mask** button to regenerate the mask (see the sketch after this list).
+
+ ![mask_offset](resource/img/mask_offset.jpg)
+
+ 2. Use the brush or eraser tool to edit the automatically generated mask.
+
+ ![manually_adjust](resource/img/manually_adjust.jpg)
+
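+ As a rough illustration of what the offset sliders do, here is a minimal NumPy sketch of per-side mask expansion. The demo's actual offset logic is not published, so the `offset_mask` helper below is hypothetical:
+
+ ```python
+ import numpy as np
+
+ def offset_mask(mask: np.ndarray, top: int = 0, bottom: int = 0,
+                 left: int = 0, right: int = 0) -> np.ndarray:
+     """Grow a binary try-on mask by a pixel offset on each side.
+     Hypothetical helper; only illustrates the idea of per-side expansion."""
+     src = mask.astype(bool)
+     out = src.copy()
+     for k in range(1, top + 1):      # extend the mask upward
+         out[:-k, :] |= src[k:, :]
+     for k in range(1, bottom + 1):   # extend downward
+         out[k:, :] |= src[:-k, :]
+     for k in range(1, left + 1):     # extend to the left
+         out[:, :-k] |= src[:, k:]
+     for k in range(1, right + 1):    # extend to the right
+         out[:, k:] |= src[:, :-k]
+     return out.astype(mask.dtype)
+ ```
+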
+ ### Step2: Run Try-on
+ After generating a suitable mask, you can get the try-on result by clicking **Step2: Run Try-on**. In the Try-on resolution drop-down box, you can select a suitable processing resolution. In our online demo, the default resolution is 1152x1536, which means the input model image and garment image are padded and resized to this resolution before being fed into the model.
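+
+ The pad-and-resize step might look like the following minimal Pillow sketch; the exact padding color and strategy used by the demo are assumptions:
+
+ ```python
+ from PIL import Image
+
+ def pad_and_resize(img: Image.Image, width: int = 1152, height: int = 1536) -> Image.Image:
+     """Letterbox an image to the target aspect ratio, then resize (illustrative)."""
+     target_ratio = width / height
+     w, h = img.size
+     if w / h > target_ratio:   # too wide: pad top and bottom
+         canvas = Image.new("RGB", (w, round(w / target_ratio)), (255, 255, 255))
+         canvas.paste(img, (0, (canvas.height - h) // 2))
+     else:                      # too tall (or exact): pad left and right
+         canvas = Image.new("RGB", (round(h * target_ratio), h), (255, 255, 255))
+         canvas.paste(img, ((canvas.width - w) // 2, 0))
+     return canvas.resize((width, height), Image.LANCZOS)
+ ```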
+
+ ## Local Demo
+ First apply for access to the FitDiT [model weights](https://huggingface.co/BoyuanJiang/FitDiT), then clone the model to *local_model_dir*.
+
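+ One way to fetch the weights once access has been granted is via the `huggingface_hub` client (assuming you are logged in with `huggingface-cli login`):
+
+ ```python
+ from huggingface_hub import snapshot_download
+
+ # Downloads the gated repo into local_model_dir (requires granted access).
+ snapshot_download(repo_id="BoyuanJiang/FitDiT", local_dir="local_model_dir")
+ ```
+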
+ ### Environment
+ We tested our model with the following environment:
+ ```
+ torch==2.3.0
+ torchvision==0.18.0
+ diffusers==0.31.0
+ transformers==4.39.3
+ gradio==5.8.0
+ onnxruntime-gpu==1.20.1
+ ```
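+
+ To reproduce this environment, one option is to pin the same versions with pip:
+ ```
+ pip install torch==2.3.0 torchvision==0.18.0 diffusers==0.31.0 transformers==4.39.3 gradio==5.8.0 onnxruntime-gpu==1.20.1
+ ```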
+
+ ### Run Gradio locally
+ ```
+ # Run the model in bf16 without any offload: fastest inference, highest memory usage
+ python gradio_sd3.py --model_path local_model_dir
+
+ # Run the model in fp16
+ python gradio_sd3.py --model_path local_model_dir --fp16
+
+ # Run the model in fp16 with CPU offload: moderate inference speed and memory usage
+ python gradio_sd3.py --model_path local_model_dir --fp16 --offload
+
+ # Run the model in fp16 with aggressive CPU offload: slowest inference, lowest memory usage
+ python gradio_sd3.py --model_path local_model_dir --fp16 --aggressive_offload
+ ```
+
+ ## Star History
+
+ [![Star History Chart](https://api.star-history.com/svg?repos=BoyuanJiang/FitDiT&type=Date)](https://star-history.com/#BoyuanJiang/FitDiT&Date)
+
+ ## Contact
+ This model may be used **for non-commercial purposes only**. If you want to use it commercially or expect better results, please contact me at byronjiang@tencent.com.
+
+ ## Citation
+ If you find our work helpful for your research, please consider citing it:
+ ```
+ @misc{jiang2024fitditadvancingauthenticgarment,
+       title={FitDiT: Advancing the Authentic Garment Details for High-fidelity Virtual Try-on},
+       author={Boyuan Jiang and Xiaobin Hu and Donghao Luo and Qingdong He and Chengming Xu and Jinlong Peng and Jiangning Zhang and Chengjie Wang and Yunsheng Wu and Yanwei Fu},
+       year={2024},
+       eprint={2411.10499},
+       archivePrefix={arXiv},
+       primaryClass={cs.CV},
+       url={https://arxiv.org/abs/2411.10499},
+ }
+ ```
dwpose/dw-ll_ucoco_384.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:724f4ff2439ed61afb86fb8a1951ec39c6220682803b4a8bd4f598cd913b1843
+ size 134399116
dwpose/yolox_l.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7860ae79de6c89a3c1eb72ae9a2756c0ccfbe04b7791bb5880afabd97855a411
+ size 216746733
humanparsing/parsing_atr.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:04c7d1d070d0e0ae943d86b18cb5aaaea9e278d97462e9cfb270cbbe4cd977f4
+ size 266859305
humanparsing/parsing_lip.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8436e1dae96e2601c373d1ace29c8f0978b16357d9038c17a8ba756cca376dbc
+ size 266863411
model_index.json ADDED
@@ -0,0 +1,28 @@
+ {
+   "_class_name": "StableDiffusion3TryOnPipeline",
+   "_diffusers_version": "0.29.0.dev0",
+   "scheduler": [
+     "diffusers",
+     "FlowMatchEulerDiscreteScheduler"
+   ],
+   "image_encoder_large": [
+     "transformers",
+     "CLIPVisionModelWithProjection"
+   ],
+   "image_encoder_bigG": [
+     "transformers",
+     "CLIPVisionModelWithProjection"
+   ],
+   "transformer_garm": [
+     "diffusers",
+     "SD3Transformer2DModel"
+   ],
+   "transformer_vton": [
+     "diffusers",
+     "SD3Transformer2DModel"
+   ],
+   "vae": [
+     "diffusers",
+     "AutoencoderKL"
+   ]
+ }
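As a sketch of how the components listed in model_index.json map onto libraries: the standard parts can be loaded individually with diffusers (the top-level StableDiffusion3TryOnPipeline class is custom and ships with the FitDiT code, and the two CLIP image encoders load analogously via transformers' CLIPVisionModelWithProjection). The snippet below assumes the repository has been cloned to local_model_dir:

```python
from diffusers import AutoencoderKL, FlowMatchEulerDiscreteScheduler, SD3Transformer2DModel

# Each subfolder name matches a key in model_index.json.
transformer_garm = SD3Transformer2DModel.from_pretrained("local_model_dir", subfolder="transformer_garm")
transformer_vton = SD3Transformer2DModel.from_pretrained("local_model_dir", subfolder="transformer_vton")
vae = AutoencoderKL.from_pretrained("local_model_dir", subfolder="vae")
scheduler = FlowMatchEulerDiscreteScheduler.from_pretrained("local_model_dir", subfolder="scheduler")
```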
pose_guider/diffusion_pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c1b5ae40cdc9ccf32a157cc5f150e43812c5bba79cd10b7614ba199407e6d6f6
+ size 10267174
resource/img/manually_adjust.jpg ADDED

Git LFS Details

  • SHA256: 3a3c0a42a9c5fdc423d2a6c04d45e34e71e49a785f0f0cb1786ae0bba73ef1fb
  • Pointer size: 131 Bytes
  • Size of remote file: 559 kB
resource/img/mask_offset.jpg ADDED

Git LFS Details

  • SHA256: 41c112c0093150c3eafcd825129a33c84888f2cf1fb59a94af05f8ad7b422852
  • Pointer size: 131 Bytes
  • Size of remote file: 321 kB
resource/img/teaser.jpg ADDED

Git LFS Details

  • SHA256: 8c625d1ea090054b5144fe14caa0bd9cbbdee5d52f0e9c8e9d8b001f70a501bf
  • Pointer size: 132 Bytes
  • Size of remote file: 2.03 MB
scheduler/scheduler_config.json ADDED
@@ -0,0 +1,6 @@
+ {
+   "_class_name": "FlowMatchEulerDiscreteScheduler",
+   "_diffusers_version": "0.29.0.dev0",
+   "num_train_timesteps": 1000,
+   "shift": 3.0
+ }
transformer_garm/config.json ADDED
@@ -0,0 +1,16 @@
+ {
+   "_class_name": "SD3Transformer2DModel",
+   "_diffusers_version": "0.31.0",
+   "_name_or_path": "",
+   "attention_head_dim": 64,
+   "caption_projection_dim": 1536,
+   "in_channels": 16,
+   "joint_attention_dim": 4096,
+   "num_attention_heads": 24,
+   "num_layers": 24,
+   "out_channels": 16,
+   "patch_size": 2,
+   "pooled_projection_dim": 2048,
+   "pos_embed_max_size": 192,
+   "sample_size": 128
+ }
transformer_garm/diffusion_pytorch_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:69c252316f4d8e4717cca392df6b0fbbb534276b2e1f4863163e9a25a5b85d49
+ size 3830005160
transformer_vton/config.json ADDED
@@ -0,0 +1,16 @@
+ {
+   "_class_name": "SD3Transformer2DModel",
+   "_diffusers_version": "0.31.0",
+   "_name_or_path": "",
+   "attention_head_dim": 64,
+   "caption_projection_dim": 1536,
+   "in_channels": 33,
+   "joint_attention_dim": 4096,
+   "num_attention_heads": 24,
+   "num_layers": 24,
+   "out_channels": 16,
+   "patch_size": 2,
+   "pooled_projection_dim": 2048,
+   "pos_embed_max_size": 192,
+   "sample_size": 128
+ }
transformer_vton/diffusion_pytorch_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:45c1a95de40e1d9039308a4fd479fa73f1a0b6092728169a81e4c2a8c32ed6b4
+ size 3830214056
vae/config.json ADDED
@@ -0,0 +1,36 @@
+ {
+   "_class_name": "AutoencoderKL",
+   "_diffusers_version": "0.29.0.dev0",
+   "act_fn": "silu",
+   "block_out_channels": [
+     128,
+     256,
+     512,
+     512
+   ],
+   "down_block_types": [
+     "DownEncoderBlock2D",
+     "DownEncoderBlock2D",
+     "DownEncoderBlock2D",
+     "DownEncoderBlock2D"
+   ],
+   "force_upcast": true,
+   "in_channels": 3,
+   "latent_channels": 16,
+   "latents_mean": null,
+   "latents_std": null,
+   "layers_per_block": 2,
+   "norm_num_groups": 32,
+   "out_channels": 3,
+   "sample_size": 1024,
+   "scaling_factor": 1.5305,
+   "shift_factor": 0.0609,
+   "up_block_types": [
+     "UpDecoderBlock2D",
+     "UpDecoderBlock2D",
+     "UpDecoderBlock2D",
+     "UpDecoderBlock2D"
+   ],
+   "use_post_quant_conv": false,
+   "use_quant_conv": false
+ }
vae/diffusion_pytorch_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f9b67a279283625caee39d61eacb5324243848477b4eb535355eaaa8423d4e09
+ size 167666654