# img2img-turbo

[**Paper**](https://arxiv.org/abs/2403.12036) | [**Sketch2Image Demo**](https://huggingface.co/spaces/gparmar/img2img-turbo-sketch) 
#### **Quick start:** [**Running Locally**](#getting-started) | [**Gradio (locally hosted)**](#gradio-demo) | [**Training**](#training-with-your-own-data)

### Cat Sketching
<p align="left" >
<img src="https://raw.githubusercontent.com/GaParmar/img2img-turbo/main/assets/cat_2x.gif" width="800" />
</p>

### Fish Sketching
<p align="left">
<img src="https://raw.githubusercontent.com/GaParmar/img2img-turbo/main/assets/fish_2x.gif"  width="800" />
</p>


We propose a general method for adapting a single-step diffusion model, such as SD-Turbo, to new tasks and domains through adversarial learning. This enables us to leverage the internal knowledge of pre-trained diffusion models while achieving efficient inference (e.g., for 512x512 images, 0.29 seconds on an A6000 and 0.11 seconds on an A100).

Our one-step conditional models **CycleGAN-Turbo** and **pix2pix-turbo** can perform various image-to-image translation tasks for both unpaired and paired settings. CycleGAN-Turbo outperforms existing GAN-based and diffusion-based methods, while pix2pix-turbo is on par with recent works such as ControlNet for Sketch2Photo and Edge2Image, but with one-step inference. 

[One-Step Image Translation with Text-to-Image Models](https://arxiv.org/abs/2403.12036)<br>
[Gaurav Parmar](https://gauravparmar.com/), [Taesung Park](https://taesung.me/), [Srinivasa Narasimhan](https://www.cs.cmu.edu/~srinivas/), [Jun-Yan Zhu](https://github.com/junyanz/)<br>
CMU and Adobe, arXiv 2403.12036

<br>
<div>
<p align="center">
<img src='assets/teaser_results.jpg' align="center" width=1000px>
</p>
</div>




## Results

### Paired Translation with pix2pix-turbo
**Edge to Image**
<div>
<p align="center">
<img src='assets/edge_to_image_results.jpg' align="center" width=800px>
</p>
</div>

<!-- **Sketch to Image**
TODO -->
### Generating Diverse Outputs
By varying the input noise map, our method can generate diverse outputs from the same input conditioning.
The output style can be controlled by changing the text prompt.
<div> <p align="center">
<img src='assets/gen_variations.jpg' align="center" width=800px>
</p> </div>

### Unpaired Translation with CycleGAN-Turbo

**Day to Night**
<div> <p align="center">
<img src='assets/day2night_results.jpg' align="center" width=800px>
</p> </div>

**Night to Day**
<div><p align="center">
<img src='assets/night2day_results.jpg' align="center" width=800px>
</p> </div>

**Clear to Rainy**
<div>
<p align="center">
<img src='assets/clear2rainy_results.jpg' align="center" width=800px>
</p>
</div>

**Rainy to Clear**
<div>
<p align="center">
<img src='assets/rainy2clear.jpg' align="center" width=800px>
</p>
</div>
<hr>


## Method
**Our Generator Architecture:**
We tightly integrate three separate modules in the original latent diffusion models into a single end-to-end network with small trainable weights. This architecture allows us to translate the input image x to the output y, while retaining the input scene structure. We use LoRA adapters in each module, introduce skip connections and Zero-Convs between input and output, and retrain the first layer of the U-Net. Blue boxes indicate trainable layers. Semi-transparent layers are frozen. The same generator can be used for various GAN objectives.
<div>
<p align="center">
<img src='assets/method.jpg' align="center" width=900px>
</p>
</div>
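
The repo's actual generator lives in the source code; purely to illustrate two of the ingredients named above, here is a minimal, self-contained PyTorch sketch of a LoRA-style adapter on a frozen convolution and a zero-initialized ("Zero-Conv") skip connection. The class names, ranks, and shapes are made up for this example and are not the repo's API.

```python
# Illustrative only: a LoRA-style adapter on a frozen layer and a zero-initialized
# skip connection, two ideas described in the method section above.
import torch
import torch.nn as nn

class LoRAConv2d(nn.Module):
    """Frozen 2D conv plus a trainable low-rank update (LoRA-style)."""
    def __init__(self, base: nn.Conv2d, rank: int = 4):
        super().__init__()
        self.base = base
        for p in self.base.parameters():   # keep the pretrained weights frozen
            p.requires_grad_(False)
        self.down = nn.Conv2d(base.in_channels, rank, kernel_size=1, bias=False)
        self.up = nn.Conv2d(rank, base.out_channels, kernel_size=1, bias=False)
        nn.init.zeros_(self.up.weight)     # adapter starts as a no-op

    def forward(self, x):
        return self.base(x) + self.up(self.down(x))

class ZeroConvSkip(nn.Module):
    """Skip connection whose contribution starts at zero and is learned."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=1)
        nn.init.zeros_(self.conv.weight)
        nn.init.zeros_(self.conv.bias)

    def forward(self, skip_feat, decoder_feat):
        return decoder_feat + self.conv(skip_feat)

# Toy usage: at initialization both modules reproduce the frozen network's behavior.
conv = LoRAConv2d(nn.Conv2d(64, 64, 3, padding=1))
skip = ZeroConvSkip(64)
x = torch.randn(1, 64, 32, 32)
y = skip(x, conv(x))
```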


## Getting Started
**Environment Setup**
- We provide a [conda env file](environment.yaml) that contains all the required dependencies.
    ```
    conda env create -f environment.yaml
    ```
- Following this, you can activate the conda environment with the command below. 
  ```
  conda activate img2img-turbo
  ```
- Alternatively, create and activate a Python virtual environment and install the dependencies with pip (see the optional sanity check below):
  ```
  python3 -m venv venv
  source venv/bin/activate
  pip install -r requirements.txt
  ```
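- (Optional) Verify the install. This is only a sanity check and assumes PyTorch ends up among the installed dependencies, which the models in this repo require.
  ```python
  # quick sanity check: PyTorch imports and CUDA is visible
  import torch
  print("torch", torch.__version__, "| CUDA available:", torch.cuda.is_available())
  ```
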
**Paired Image Translation (pix2pix-turbo)**
- The following command takes an image file and a prompt as input, extracts the Canny edges, and saves the results in the specified output directory (a standalone Canny extraction sketch follows these examples).
    ```bash
    python src/inference_paired.py --model_name "edge_to_image" \
        --input_image "assets/examples/bird.png" \
        --prompt "a blue bird" \
        --output_dir "outputs"
    ```
    <table>
    <tr>
    <th>Input Image</th>
    <th>Canny Edges</th>
    <th>Model Output</th>
    </tr>
    <tr>
    <td><img src='assets/examples/bird.png' width="200px"></td>
    <td><img src='assets/examples/bird_canny.png' width="200px"></td>
    <td><img src='assets/examples/bird_canny_blue.png' width="200px"></td>
    </tr>
    </table>
    <br>

- The following command takes a sketch and a prompt as input and saves the results in the specified output directory.
    ```bash
    python src/inference_paired.py --model_name "sketch_to_image_stochastic" \
    --input_image "assets/examples/sketch_input.png" --gamma 0.4 \
    --prompt "ethereal fantasy concept art of an asteroid. magnificent, celestial, ethereal, painterly, epic, majestic, magical, fantasy art, cover art, dreamy" \
    --output_dir "outputs"
    ```
    <table>
    <tr>
    <th>Input</th>
    <th>Model Output</th>
    </tr>
    <tr>
    <td><img src='assets/examples/sketch_input.png' width="400px"></td>
    <td><img src='assets/examples/sketch_output.png' width="400px"></td>
    </tr>
    </table>
    <br>
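
For reference, the Canny-edge extraction step used in the first example above can also be reproduced on its own. The snippet below is a minimal sketch using OpenCV; the thresholds and output filename are illustrative and are not guaranteed to match what `src/inference_paired.py` does internally.

```python
# Minimal, illustrative Canny extraction with OpenCV; thresholds are arbitrary
# and not necessarily the ones used inside src/inference_paired.py.
import os
import cv2

os.makedirs("outputs", exist_ok=True)
image = cv2.imread("assets/examples/bird.png", cv2.IMREAD_GRAYSCALE)
edges = cv2.Canny(image, threshold1=100, threshold2=200)
cv2.imwrite("outputs/bird_canny.png", edges)
```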

**Unpaired Image Translation (CycleGAN-Turbo)**
- The following command takes a **day** image as input and saves the translated **night** output in the specified directory (a short batching sketch follows these four examples).
    ```
    python src/inference_unpaired.py --model_name "day_to_night" \
        --input_image "assets/examples/day2night_input.png" --output_dir "outputs"
    ```
    <table>
    <tr>
    <th>Input (day)</th>
    <th>Model Output (night)</th>
    </tr>
    <tr>
    <td><img src='assets/examples/day2night_input.png' width="400px"></td>
    <td><img src='assets/examples/day2night_output.png' width="400px"></td>
    </tr>
    </table>

- The following command takes a **night** image as input and saves the translated **day** output in the specified directory.
    ```
    python src/inference_unpaired.py --model_name "night_to_day" \
        --input_image "assets/examples/night2day_input.png" --output_dir "outputs"
    ```
    <table>
    <tr>
    <th>Input (night)</th>
    <th>Model Output (day)</th>
    </tr>
    <tr>
    <td><img src='assets/examples/night2day_input.png' width="400px"></td>
    <td><img src='assets/examples/night2day_output.png' width="400px"></td>
    </tr>
    </table>

- The following command takes a **clear** image as input and saves the translated **rainy** output in the specified directory.
    ```
    python src/inference_unpaired.py --model_name "clear_to_rainy" \
        --input_image "assets/examples/clear2rainy_input.png" --output_dir "outputs"
    ```
    <table>
    <tr>
    <th>Input (clear)</th>
    <th>Model Output (rainy)</th>
    </tr>
    <tr>
    <td><img src='assets/examples/clear2rainy_input.png' width="400px"></td>
    <td><img src='assets/examples/clear2rainy_output.png' width="400px"></td>
    </tr>
    </table>

- The following command takes a **rainy** image as input and saves the translated **clear** output in the specified directory.
    ```
    python src/inference_unpaired.py --model_name "rainy_to_clear" \
        --input_image "assets/examples/rainy2clear_input.png" --output_dir "outputs"
    ```
    <table>
    <tr>
    <th>Input (rainy)</th>
    <th>Model Output (clear)</th>
    </tr>
    <tr>
    <td><img src='assets/examples/rainy2clear_input.png' width="400px"></td>
    <td><img src='assets/examples/rainy2clear_output.png' width="400px"></td>
    </tr>
    </table>
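
Since all four unpaired checkpoints share the same command-line interface, they can be driven from a short script. The sketch below simply shells out to `src/inference_unpaired.py` with the example inputs shown above; the model names and file paths are copied verbatim from those commands, and nothing beyond that interface is assumed.

```python
# Run all four unpaired example translations in sequence via the CLI shown above.
import subprocess

EXAMPLES = [
    ("day_to_night",   "assets/examples/day2night_input.png"),
    ("night_to_day",   "assets/examples/night2day_input.png"),
    ("clear_to_rainy", "assets/examples/clear2rainy_input.png"),
    ("rainy_to_clear", "assets/examples/rainy2clear_input.png"),
]

for model_name, input_image in EXAMPLES:
    subprocess.run(
        [
            "python", "src/inference_unpaired.py",
            "--model_name", model_name,
            "--input_image", input_image,
            "--output_dir", "outputs",
        ],
        check=True,  # stop if any translation fails
    )
```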



## Gradio Demo
- We provide a Gradio demo for the paired image translation tasks.
- The following command launches the sketch-to-image Gradio demo locally.
    ```
    gradio gradio_sketch2image.py
    ```
- The following command launches the Canny edge-to-image Gradio demo locally.
    ```
    gradio gradio_canny2image.py
    ```


## Training with your own data
- See the steps [here](docs/training_pix2pix_turbo.md) for training a pix2pix-turbo model on your paired data.
- See the steps [here](docs/training_cyclegan_turbo.md) for training a CycleGAN-Turbo model on your unpaired data.


## Acknowledgment
Our work uses Stable Diffusion Turbo (SD-Turbo) as the base model, which is released under the following [LICENSE](https://huggingface.co/stabilityai/sd-turbo/blob/main/LICENSE).