|
--- |
|
library_name: diffusers |
|
license: apache-2.0 |
|
datasets: |
|
- common-canvas/commoncatalog-cc-by |
|
- alfredplpl/commoncatalog-cc-by-recap |
|
language: |
|
- en |
|
--- |
|
|
|
# CommonArt-PoC |
|
|
|
![beach](beach.png) |
|
|
|
CommonArt is a text-to-image generation model trained only on images with authorized (CC BY) licenses.

The architecture is based on the Diffusion Transformer (DiT), the same family of models used by Stable Diffusion 3 and Sora.
|
|
|
## How to Get Started with the Model |
|
|
|
You can use this model with the diffusers library.
|
|
|
```python |
|
import torch |
|
from diffusers import Transformer2DModel, PixArtSigmaPipeline |
|
|
|
device = "cpu" |
|
weight_dtype = torch.float32 |
|
|
|
transformer = Transformer2DModel.from_pretrained( |
|
"alfredplpl/CommonArt-PoC", |
|
torch_dtype=weight_dtype, |
|
use_safetensors=True, |
|
) |
|
|
|
pipe = PixArtSigmaPipeline.from_pretrained( |
|
"PixArt-alpha/pixart_sigma_sdxlvae_T5_diffusers", |
|
transformer=transformer, |
|
torch_dtype=weight_dtype, |
|
use_safetensors=True, |
|
) |
|
|
|
pipe.to(device) |
|
|
|
prompt = " A picturesque photograph of a serene coastline, capturing the tranquility of a sunrise over the ocean. The image shows a wide expanse of gently rolling sandy beach, with clear, turquoise water stretching into the horizon. Seashells and pebbles are scattered along the shore, and the sun's rays create a golden hue on the water's surface. The distant outline of a lighthouse can be seen, adding to the quaint charm of the scene. The sky is painted with soft pastel colors of dawn, gradually transitioning from pink to blue, creating a sense of peacefulness and beauty." |
|
image = pipe(prompt,guidance_scale=4.5,max_squence_length=512).images[0] |
|
image.save("beach.png") |
|
``` |
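
If you have a CUDA GPU, the same pipeline can run in half precision for faster generation. The following is a minimal sketch and not part of the original example: the device, dtype, seed, step count, and prompt are illustrative assumptions, so adjust them to your hardware.

```python
import torch
from diffusers import Transformer2DModel, PixArtSigmaPipeline

# Hedged variant of the example above: GPU inference in fp16 with a fixed seed.
# Assumes a CUDA-capable GPU is available; otherwise keep the CPU/fp32 settings.
device = "cuda"
weight_dtype = torch.float16

transformer = Transformer2DModel.from_pretrained(
    "alfredplpl/CommonArt-PoC",
    torch_dtype=weight_dtype,
    use_safetensors=True,
)

pipe = PixArtSigmaPipeline.from_pretrained(
    "PixArt-alpha/pixart_sigma_sdxlvae_T5_diffusers",
    transformer=transformer,
    torch_dtype=weight_dtype,
    use_safetensors=True,
)
pipe.to(device)

# Fixed seed for reproducible outputs.
generator = torch.Generator(device=device).manual_seed(0)

prompt = "A photograph of a lighthouse on a rocky coastline at sunrise."
image = pipe(
    prompt,
    guidance_scale=4.5,
    max_sequence_length=512,
    num_inference_steps=20,
    generator=generator,
).images[0]
image.save("lighthouse.png")
```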
|
|
|
|
|
## Model Details |
|
|
|
### Model Description |
|
|
|
- **Developed by:** alfredplpl |
|
- **Funded by:** alfredplpl

- **Shared by:** alfredplpl
|
- **Model type:** Diffusion transformer |
|
- **Language(s) (NLP):** English |
|
- **License:** Apache-2.0 |
|
|
|
### Model Sources |
|
|
|
- **Repository:** [Pixart-Sigma](https://github.com/PixArt-alpha/PixArt-sigma) |
|
- **Paper:** [PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation](https://arxiv.org/abs/2403.04692) |
|
|
|
## Uses |
|
|
|
- Any purpose |
|
|
|
### Direct Use |
|
|
|
- To develop commercial text-to-image generation systems.

- To research non-commercial text-to-image generation.
|
|
|
### Out-of-Scope Use |
|
|
|
- To generate misinformation. |
|
|
|
## Bias, Risks, and Limitations |
|
|
|
- Limited representation: the training data (CC BY-licensed images only) covers a relatively narrow range of concepts and styles.
|
|
|
## Training Details |
|
|
|
### Training Data |
|
|
|
I used the following datasets to train the transformer (a loading sketch follows the list).
|
|
|
- CommonCatalog CC BY (`common-canvas/commoncatalog-cc-by`)

- CommonCatalog CC BY Extension (`alfredplpl/commoncatalog-cc-by-recap`)
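
Both datasets are hosted on the Hugging Face Hub and can be inspected with the `datasets` library. This is a minimal sketch and not part of the original card; the `train` split name and streaming support are assumptions, so drop `streaming=True` to download the data instead.

```python
from datasets import load_dataset

# Stream the two CC BY datasets referenced above instead of downloading them fully.
cc_by = load_dataset("common-canvas/commoncatalog-cc-by", split="train", streaming=True)
cc_by_recap = load_dataset("alfredplpl/commoncatalog-cc-by-recap", split="train", streaming=True)

print(next(iter(cc_by)))        # one sample from the base dataset
print(next(iter(cc_by_recap)))  # one sample from the recaptioned extension
```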
|
|
|
#### Training Hyperparameters |
|
|
|
- **Training regime:** |
|
```python
|
_base_ = ['../PixArt_xl2_internal.py'] |
|
data_root = "/mnt/my_raid/pixart" |
|
image_list_json = ['data_info.json'] |
|
|
|
data = dict( |
|
type='InternalDataSigma', root='InternData', image_list_json=image_list_json, transform='default_train', |
|
load_vae_feat=False, load_t5_feat=False, |
|
) |
|
image_size = 256 |
|
|
|
# model setting |
|
model = 'PixArt_XL_2' |
|
mixed_precision = 'fp16' # ['fp16', 'fp32', 'bf16'] |
|
fp32_attention = True |
|
#load_from = "/mnt/my_raid/pixart/working/checkpoints/epoch_1_step_17500.pth" # https://huggingface.co/PixArt-alpha/PixArt-Sigma |
|
#resume_from = dict(checkpoint="/mnt/my_raid/pixart/working/checkpoints/epoch_37_step_62039.pth", load_ema=False, resume_optimizer=True, resume_lr_scheduler=True) |
|
vae_pretrained = "output/pretrained_models/pixart_sigma_sdxlvae_T5_diffusers/vae" # sdxl vae |
|
multi_scale = False # if use multiscale dataset model training |
|
pe_interpolation = 0.5 |
|
|
|
# training setting |
|
num_workers = 10 |
|
train_batch_size = 64 # 64 as default |
|
num_epochs = 200 # 3 |
|
gradient_accumulation_steps = 1 |
|
grad_checkpointing = True |
|
gradient_clip = 0.2 |
|
optimizer = dict(type='CAMEWrapper', lr=2e-5, weight_decay=0.0, betas=(0.9, 0.999, 0.9999), eps=(1e-30, 1e-16)) |
|
lr_schedule_args = dict(num_warmup_steps=1000) |
|
|
|
#visualize=True |
|
#train_sampling_steps = 3 |
|
#eval_sampling_steps = 3 |
|
log_interval = 20 |
|
save_model_epochs = 1 |
|
#save_model_steps = 2500 |
|
work_dir = 'output/debug' |
|
|
|
# pixart-sigma |
|
scale_factor = 0.13025 |
|
real_prompt_ratio = 0.5 |
|
model_max_length = 512 |
|
class_dropout_prob = 0.1 |
|
|
|
``` |
|
|
|
## How to Resume Training
|
|
|
1. Download the [checkpoint](checkpoint/epoch_50_step_116738.pth).

2. Point the `resume_from` entry of the training config at the downloaded file, as shown in the sketch below.
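
Concretely, this means editing the training config from the section above so that the commented-out `resume_from` line points at the downloaded checkpoint. A minimal sketch, assuming the checkpoint was saved to a `checkpoint/` directory relative to the working directory:

```python
# Sketch only: set this in the PixArt-Sigma training config shown above.
resume_from = dict(
    checkpoint="checkpoint/epoch_50_step_116738.pth",  # downloaded checkpoint
    load_ema=False,
    resume_optimizer=True,
    resume_lr_scheduler=True,
)
```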
|
|
|
## Environmental Impact |
|
|
|
- **Hardware Type:** 2× NVIDIA RTX A6000

- **Hours used:** 700

- **Compute Region:** Japan

- **Carbon Emitted:** Not formally estimated; expected to be low.
|
|
|
## Technical Specifications
|
|
|
### Model Architecture and Objective |
|
|
|
Diffusion Transformer |
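
To inspect the concrete architecture hyperparameters (depth, attention heads, patch size, and so on), you can load the transformer on its own and print its diffusers configuration. A minimal sketch, not part of the original card:

```python
from diffusers import Transformer2DModel

# Load only the transformer and report its registered configuration and size.
transformer = Transformer2DModel.from_pretrained(
    "alfredplpl/CommonArt-PoC",
    use_safetensors=True,
)
print(transformer.config)  # architecture hyperparameters
print(f"{sum(p.numel() for p in transformer.parameters()) / 1e6:.1f}M parameters")
```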
|
|
|
### Compute Infrastructure |
|
|
|
Desktop PC |
|
|
|
#### Hardware |
|
|
|
2× NVIDIA RTX A6000
|
|
|
#### Software |
|
|
|
[Pixart-Sigma repository](https://github.com/PixArt-alpha/PixArt-sigma) |
|
|
|
|
|
## Model Card Contact |
|
|
|
alfredplpl |