---
license: apache-2.0
datasets:
- lmms-lab/COCO-Caption2017
language:
- en
base_model:
- stable-diffusion-v1-5/stable-diffusion-v1-5
pipeline_tag: text-to-image
---

# Color Diffusion (Evaluating Model Perception of Color Illusions in Photorealistic Scenes)

Authors: Lingjun Mao, Zineng Tang, Alane Suhr

---

![examples](https://github.com/mao1207/RCID/blob/main/images/color-diffusion.gif?raw=true)

## Model Overview

The **Color Diffusion** model, used in the paper "Evaluating Model Perception of Color Illusions in Photorealistic Scenes", generates images for the RCID dataset from a color sketch. Given a colored draft image and a text prompt, it produces a realistic image that follows the prompt while matching both the shapes and the color patterns of the sketch. The model is built on ControlNet and was trained for 20 epochs on the MS COCO 2017 dataset.

## RCID Dataset

![RCID](https://github.com/mao1207/RCID/blob/main/images/main_figure.png?raw=true)

We construct the dataset in three steps:

1. **Image Generation.** For contrast and stripe illusions, we use procedural code to generate simple illusion images, which our **Color Diffusion** model then turns into realistic illusion images. For filter illusions, we apply contrasting color filters directly to the original images. Each illusion type also includes a corresponding control group without any illusion for comparison.

2. **Question Generation.** We use GPT-4o to generate image-specific questions designed to evaluate the model's understanding of the illusion.

3. **Human Feedback.** We collect human participants' feedback on these images and adjust the original classification of “illusion” and “non-illusion” based on whether participants are deceived.

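To make the image generation step more concrete, here is a minimal sketch of what a procedurally generated contrast sketch and a color filter could look like. It is not the authors' code (that lives in the linked repository); the function names, sizes, colors, and the linear blending rule are all assumptions chosen for illustration, and it uses only the Python standard library.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]
Image = List[List[RGB]]  # rows of RGB pixels


def make_contrast_sketch(width: int = 512, height: int = 512, patch: int = 96,
                         left_bg: RGB = (200, 60, 60),
                         right_bg: RGB = (60, 60, 200),
                         target: RGB = (128, 128, 128)) -> Image:
    """Hypothetical contrast sketch: two identical patches on two backgrounds."""
    # Left half uses one background color, right half another.
    img = [[left_bg if x < width // 2 else right_bg for x in range(width)]
           for _ in range(height)]
    top = (height - patch) // 2
    # Draw the same target color at the center of each half.
    for cx in (width // 4, 3 * width // 4):
        left = cx - patch // 2
        for y in range(top, top + patch):
            for x in range(left, left + patch):
                img[y][x] = target
    return img


def apply_color_filter(img: Image, tint: RGB, strength: float = 0.5) -> Image:
    """Assumed filter: blend each pixel linearly toward `tint`.

    strength=0.0 leaves the image unchanged; strength=1.0 yields solid tint.
    """
    def blend(pixel: RGB) -> RGB:
        return tuple(round((1 - strength) * c + strength * t)
                     for c, t in zip(pixel, tint))
    return [[blend(p) for p in row] for row in img]


def save_ppm(img: Image, path: str) -> None:
    """Write the image as a binary PPM (P6) file."""
    height, width = len(img), len(img[0])
    with open(path, "wb") as f:
        f.write(f"P6\n{width} {height}\n255\n".encode("ascii"))
        f.write(bytes(c for row in img for p in row for c in p))
```

Calling `save_ppm(make_contrast_sketch(), "contrast_sketch.ppm")` would write a sketch that could then serve as the colored draft image for the model; a control image without the illusion could, under the same assumptions, use a single background color for both halves.
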
Our data is available at the following link: [RCID Dataset](https://huggingface.co/datasets/mao1207/RCID)

The code is released at [Color Illusion](https://github.com/mao1207/RCID)

## How to Use the Model

To generate a realistic image from a simplified image and a text prompt using the Color Diffusion model, you can use the following code:

```python
import random

import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# Set device
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the models (replace both placeholder paths with your own)
controlnet = ControlNetModel.from_pretrained(
    "controlnet_model_path", torch_dtype=torch.float32
).to(device)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "base_model_path", controlnet=controlnet, torch_dtype=torch.float32
).to(device)

# Load your simplified image (the color sketch used as the control input)
simplified_image = load_image("path_to_simplified_image.png")

# Define the text prompt
prompt = "A photorealistic image of a sunset over the ocean."

# Pick a random seed; fix this value instead to reproduce a result
seed = random.randint(0, 100000)
generator = torch.manual_seed(seed)

# Generate realistic image
generated_image = pipe(
    prompt,
    image=simplified_image,
    num_inference_steps=50,
    generator=generator,
).images[0]

# Save the generated image
generated_image.save("generated_image.png")
```

## License

The source code of this repository is released under the Apache License 2.0. The model license and dataset license are listed on their corresponding webpages.

For more information, to access the dataset, or to contribute, please visit our [website](https://color-illusion.github.io/Color-Illusion/).