|
# Color Diffusion (Evaluating Model Perception of Color Illusions in Photorealistic Scenes) |
|
|
|
Authors: Lingjun Mao, Zineng Tang, Alane Suhr |
|
|
|
--- |
|
|
|
![examples](https://github.com/mao1207/RCID/blob/main/images/color-diffusion.gif?raw=true) |
|
|
|
|
|
## Model Overview |
|
|
|
The **Color Diffusion** model from the paper "Evaluating Model Perception of Color Illusions in Photorealistic Scenes" generates the images for the RCID dataset from a color sketch. Given a colored draft image and a text prompt, it produces a realistic image that matches both the shape and the color pattern of the sketch. The model is built on ControlNet and was trained for 20 epochs on the MS COCO 2017 dataset.
|
|
|
## RCID Dataset |
|
|
|
![RCID](https://github.com/mao1207/RCID/blob/main/images/main_figure.png?raw=true) |
|
|
|
The construction of our dataset involves three steps: |
|
|
|
1. **Image Generation.** For contrast and stripe illusions, we use procedural code to generate simple illusion images, which our **Color Diffusion** model then converts into realistic illusion images (see the first sketch after this list). For filter illusions, we directly apply contrasting color filters to the original images. Each type of illusion also includes a corresponding illusion-free control group for comparison.
|
|
|
2. **Question Generation.** We use GPT-4o to generate image-specific questions designed to evaluate a model's understanding of each illusion (see the second sketch after this list).
|
|
|
3. **Human Feedback.** We collect feedback from human participants on these images and adjust the original "illusion" and "non-illusion" labels based on whether participants are actually deceived.
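
As a concrete illustration of the first step, here is a minimal sketch of the kind of procedural draft that could serve as a starting point: a simultaneous-contrast image in which two identical gray patches sit on dark and light halves of the canvas. The repository's actual generation code may differ; this is only an assumed example using NumPy and Pillow, and the output file name is hypothetical.

```python
import numpy as np
from PIL import Image

# Simultaneous-contrast draft: two patches with the SAME gray value (130)
# on a dark and a light background; the patch on the dark half tends to
# look lighter to human observers.
canvas = np.zeros((256, 512, 3), dtype=np.uint8)
canvas[:, :256] = 60    # dark left half
canvas[:, 256:] = 200   # light right half
canvas[96:160, 96:160] = 130    # patch on the dark half
canvas[96:160, 352:416] = 130   # identical patch on the light half
Image.fromarray(canvas).save("contrast_draft.png")  # hypothetical output path
```

A draft like this would then be passed to Color Diffusion as the conditioning image (see "How to Use the Model" below).

For the second step, question generation with GPT-4o could look roughly like the following sketch using the OpenAI Python client. The prompt wording and the choice of sending the rendered image as base64 are assumptions for illustration, not the paper's exact pipeline.

```python
import base64
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Encode the generated illusion image for the vision-capable model
with open("generated_image.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

# Ask GPT-4o for an image-specific question probing the (potential) illusion
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Write one question about the colors in this image that tests whether a viewer is deceived by a color illusion."},
            {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)
```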
|
|
|
Our data is available at the following link: [RCID Dataset](https://huggingface.co/datasets/mao1207/RCID)
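
If you use the Hugging Face `datasets` library, loading the data should look like this minimal sketch; the available splits and field names are assumptions here, so check the dataset card for the actual schema.

```python
from datasets import load_dataset

# Load the RCID dataset from the Hugging Face Hub
rcid = load_dataset("mao1207/RCID")
print(rcid)  # inspect the available splits and features
```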
|
|
|
The code is released at [Color Illusion](https://github.com/mao1207/RCID).
|
|
|
## How to Use the Model |
|
|
|
To generate a realistic image from a simplified image and a text prompt using the Color Diffusion model, you can use the following code: |
|
|
|
```python
import random

import torch
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
from diffusers.utils import load_image

# Use the GPU when available
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the ControlNet weights and the Stable Diffusion base pipeline
controlnet = ControlNetModel.from_pretrained("controlnet_model_path", torch_dtype=torch.float32).to(device)
pipe = StableDiffusionControlNetPipeline.from_pretrained("base_model_path", controlnet=controlnet, torch_dtype=torch.float32).to(device)

# Load the simplified (color sketch) conditioning image
simplified_image = load_image("path_to_simplified_image.png")

# Text prompt describing the desired realistic image
prompt = "A photorealistic image of a sunset over the ocean."

# Generate the realistic image from the sketch and prompt
generator = torch.manual_seed(random.randint(0, 100000))
generated_image = pipe(prompt, num_inference_steps=50, generator=generator, image=simplified_image).images[0]

# Save the result
generated_image.save("generated_image.png")
```
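
Note that the snippet loads both models in `torch.float32`. On a CUDA GPU you can usually halve memory usage by passing `torch_dtype=torch.float16` to both `from_pretrained` calls; whether the released checkpoints were validated in half precision is not stated here, so treat that as an optimization to verify.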
|
|
|
## License |
|
|
|
The source code of this repository is released under the Apache License 2.0. The model license and dataset license are listed on their corresponding webpages. |
|
|
|
For more information, to access the dataset, or to contribute, please visit our [website](https://color-illusion.github.io/Color-Illusion/).