|
# Color Diffusion (Evaluating Model Perception of Color Illusions in Photorealistic Scenes) |
|
|
|
Authors: Lingjun Mao, Zineng Tang, Alane Suhr |
|
|
|
--- |
|
|
|
![examples](https://github.com/mao1207/RCID/blob/main/images/color-diffusion.gif?raw=true) |
|
|
|
|
|
## Model Overview |
|
|
|
The **Color Diffusion** model from the paper "Evaluating Model Perception of Color Illusions in Photorealistic Scenes" generates the images for the RCID dataset from a color sketch. Given a colored draft image and a text prompt, it produces a realistic image that matches both the shape and the color pattern of the sketch. The model is built on ControlNet and was trained for 20 epochs on the MS COCO 2017 dataset.
|
|
|
## RCID Dataset |
|
|
|
![RCID](https://github.com/mao1207/RCID/blob/main/images/main_figure.png?raw=true) |
|
|
|
The construction of our dataset involves three steps: |
|
|
|
1. **Image Generation.** For contrast and stripe illusions, we use procedural code to generate simple illusion images, which our **Color Diffusion** model then converts into realistic illusion images (see the first sketch after this list). For filter illusions, we directly apply contrasting color filters to the original images. Each type of illusion also includes a corresponding illusion-free control group for comparison.
|
|
|
2. **Question Generation.** We use GPT-4o to generate image-specific questions designed to evaluate a model's understanding of each illusion (see the second sketch after this list).
|
|
|
3. **Human Feedback.** We collect feedback from human participants on these images and adjust the original "illusion" and "non-illusion" labels based on whether participants are actually deceived.
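
As a concrete illustration of the first step, here is a minimal sketch of the kind of procedural draft that could serve as a starting point: a simultaneous-contrast image in which two identical gray patches sit on dark and light halves of the canvas. The repository's actual generation code may differ; this is only an assumed example using NumPy and Pillow, and the output file name is hypothetical.

```python
import numpy as np
from PIL import Image

# Simultaneous-contrast draft: two patches with the SAME gray value (130)
# on a dark and a light background; the patch on the dark half tends to
# look lighter to human observers.
canvas = np.zeros((256, 512, 3), dtype=np.uint8)
canvas[:, :256] = 60    # dark left half
canvas[:, 256:] = 200   # light right half
canvas[96:160, 96:160] = 130    # patch on the dark half
canvas[96:160, 352:416] = 130   # identical patch on the light half
Image.fromarray(canvas).save("contrast_draft.png")  # hypothetical output path
```

A draft like this would then be passed to Color Diffusion as the conditioning image (see "How to Use the Model" below).

For the second step, question generation with GPT-4o could look roughly like the following sketch using the OpenAI Python client. The prompt wording and the choice of sending the rendered image as base64 are assumptions for illustration, not the paper's exact pipeline.

```python
import base64
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Encode the generated illusion image for the vision-capable model
with open("generated_image.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

# Ask GPT-4o for an image-specific question probing the (potential) illusion
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Write one question about the colors in this image that tests whether a viewer is deceived by a color illusion."},
            {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)
```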
|
|
|
Our data is available at the following link: [RCID Dataset](https://huggingface.co/datasets/mao1207/RCID)
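
If you use the Hugging Face `datasets` library, loading the data should look like this minimal sketch; the available splits and field names are assumptions here, so check the dataset card for the actual schema.

```python
from datasets import load_dataset

# Load the RCID dataset from the Hugging Face Hub
rcid = load_dataset("mao1207/RCID")
print(rcid)  # inspect the available splits and features
```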
|
|
|
The code is released at [Color Illusion](https://github.com/mao1207/RCID).
|
|
|
## How to Use the Model |
|
|
|
To generate a realistic image from a simplified image and a text prompt using the Color Diffusion model, you can use the following code: |
|
|
|
```python
import random

import torch
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
from diffusers.utils import load_image

# Use the GPU when available
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the ControlNet weights and the Stable Diffusion base pipeline
controlnet = ControlNetModel.from_pretrained("controlnet_model_path", torch_dtype=torch.float32).to(device)
pipe = StableDiffusionControlNetPipeline.from_pretrained("base_model_path", controlnet=controlnet, torch_dtype=torch.float32).to(device)

# Load the simplified (color sketch) conditioning image
simplified_image = load_image("path_to_simplified_image.png")

# Text prompt describing the desired realistic image
prompt = "A photorealistic image of a sunset over the ocean."

# Generate the realistic image from the sketch and prompt
generator = torch.manual_seed(random.randint(0, 100000))
generated_image = pipe(prompt, num_inference_steps=50, generator=generator, image=simplified_image).images[0]

# Save the result
generated_image.save("generated_image.png")
```
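
Note that the snippet loads both models in `torch.float32`. On a CUDA GPU you can usually halve memory usage by passing `torch_dtype=torch.float16` to both `from_pretrained` calls; whether the released checkpoints were validated in half precision is not stated here, so treat that as an optimization to verify.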
|
|
|
## License |
|
|
|
The source code of this repository is released under the Apache License 2.0. The model license and dataset license are listed on their corresponding webpages. |
|
|
|
For more information, to access the dataset, or to contribute, please visit our [website](https://color-illusion.github.io/Color-Illusion/).