|
--- |
|
library_name: diffusers |
|
license: apache-2.0 |
|
datasets: |
|
- common-canvas/commoncatalog-cc-by |
|
- alfredplpl/commoncatalog-cc-by-recap |
|
language: |
|
- en |
|
--- |
|
|
|
# CommonArt-PoC |
|
|
|
![beach](beach.png) |
|
|
|
CommonArt is a text-to-image generation model trained only on images with authorized (CC BY) licenses.

The architecture is based on the Diffusion Transformer (DiT), the same family of models used by Stable Diffusion 3 and Sora.
|
|
|
## How to Get Started with the Model |
|
|
|
You can use this model with the diffusers library.
|
|
|
```python |
|
import torch |
|
from diffusers import Transformer2DModel, PixArtSigmaPipeline |
|
|
|
device = "cpu" |
|
weight_dtype = torch.float32 |
|
|
|
transformer = Transformer2DModel.from_pretrained( |
|
"alfredplpl/CommonArt-PoC", |
|
torch_dtype=weight_dtype, |
|
use_safetensors=True, |
|
) |
|
|
|
pipe = PixArtSigmaPipeline.from_pretrained( |
|
"PixArt-alpha/pixart_sigma_sdxlvae_T5_diffusers", |
|
transformer=transformer, |
|
torch_dtype=weight_dtype, |
|
use_safetensors=True, |
|
) |
|
|
|
pipe.to(device) |
|
|
|
prompt = " A picturesque photograph of a serene coastline, capturing the tranquility of a sunrise over the ocean. The image shows a wide expanse of gently rolling sandy beach, with clear, turquoise water stretching into the horizon. Seashells and pebbles are scattered along the shore, and the sun's rays create a golden hue on the water's surface. The distant outline of a lighthouse can be seen, adding to the quaint charm of the scene. The sky is painted with soft pastel colors of dawn, gradually transitioning from pink to blue, creating a sense of peacefulness and beauty." |
|
image = pipe(prompt,guidance_scale=4.5,max_squence_length=512).images[0] |
|
image.save("beach.png") |
|
``` |
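
If you have a CUDA GPU, the same pipeline can run in half precision for faster generation. The following is a minimal sketch and not part of the original example: the device, dtype, seed, step count, and prompt are illustrative assumptions, so adjust them to your hardware.

```python
import torch
from diffusers import Transformer2DModel, PixArtSigmaPipeline

# Hedged variant of the example above: GPU inference in fp16 with a fixed seed.
# Assumes a CUDA-capable GPU is available; otherwise keep the CPU/fp32 settings.
device = "cuda"
weight_dtype = torch.float16

transformer = Transformer2DModel.from_pretrained(
    "alfredplpl/CommonArt-PoC",
    torch_dtype=weight_dtype,
    use_safetensors=True,
)

pipe = PixArtSigmaPipeline.from_pretrained(
    "PixArt-alpha/pixart_sigma_sdxlvae_T5_diffusers",
    transformer=transformer,
    torch_dtype=weight_dtype,
    use_safetensors=True,
)
pipe.to(device)

# Fixed seed for reproducible outputs.
generator = torch.Generator(device=device).manual_seed(0)

prompt = "A photograph of a lighthouse on a rocky coastline at sunrise."
image = pipe(
    prompt,
    guidance_scale=4.5,
    max_sequence_length=512,
    num_inference_steps=20,
    generator=generator,
).images[0]
image.save("lighthouse.png")
```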
|
|
|
|
|
## Model Details |
|
|
|
### Model Description |
|
|
|
- **Developed by:** alfredplpl |
|
- **Funded by:** alfredplpl

- **Shared by:** alfredplpl
|
- **Model type:** Diffusion transformer |
|
- **Language(s) (NLP):** English |
|
- **License:** Apache-2.0 |
|
|
|
### Model Sources |
|
|
|
- **Repository:** [Pixart-Sigma](https://github.com/PixArt-alpha/PixArt-sigma) |
|
- **Paper:** [PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation](https://arxiv.org/abs/2403.04692) |
|
|
|
## Uses |
|
|
|
- Any purpose |
|
|
|
### Direct Use |
|
|
|
- To develop commercial text-to-image generation systems.

- To research non-commercial text-to-image generation.
|
|
|
### Out-of-Scope Use |
|
|
|
- To generate misinformation. |
|
|
|
## Bias, Risks, and Limitations |
|
|
|
- Limited representation: the training data (CC BY-licensed images only) covers a relatively narrow range of concepts and styles.
|
|
|
## Training Details |
|
|
|
### Training Data |
|
|
|
I used the following datasets to train the transformer (a loading sketch follows the list).
|
|
|
- CommonCatalog CC BY (`common-canvas/commoncatalog-cc-by`)

- CommonCatalog CC BY Extension (`alfredplpl/commoncatalog-cc-by-recap`)
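
Both datasets are hosted on the Hugging Face Hub and can be inspected with the `datasets` library. This is a minimal sketch and not part of the original card; the `train` split name and streaming support are assumptions, so drop `streaming=True` to download the data instead.

```python
from datasets import load_dataset

# Stream the two CC BY datasets referenced above instead of downloading them fully.
cc_by = load_dataset("common-canvas/commoncatalog-cc-by", split="train", streaming=True)
cc_by_recap = load_dataset("alfredplpl/commoncatalog-cc-by-recap", split="train", streaming=True)

print(next(iter(cc_by)))        # one sample from the base dataset
print(next(iter(cc_by_recap)))  # one sample from the recaptioned extension
```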
|
|
|
#### Training Hyperparameters |
|
|
|
- **Training regime:** |
|
```python
|
_base_ = ['../PixArt_xl2_internal.py'] |
|
data_root = "/mnt/my_raid/pixart" |
|
image_list_json = ['data_info.json'] |
|
|
|
data = dict( |
|
type='InternalDataSigma', root='InternData', image_list_json=image_list_json, transform='default_train', |
|
load_vae_feat=False, load_t5_feat=False, |
|
) |
|
image_size = 256 |
|
|
|
# model setting |
|
model = 'PixArt_XL_2' |
|
mixed_precision = 'fp16' # ['fp16', 'fp32', 'bf16'] |
|
fp32_attention = True |
|
#load_from = "/mnt/my_raid/pixart/working/checkpoints/epoch_1_step_17500.pth" # https://huggingface.co/PixArt-alpha/PixArt-Sigma |
|
#resume_from = dict(checkpoint="/mnt/my_raid/pixart/working/checkpoints/epoch_37_step_62039.pth", load_ema=False, resume_optimizer=True, resume_lr_scheduler=True) |
|
vae_pretrained = "output/pretrained_models/pixart_sigma_sdxlvae_T5_diffusers/vae" # sdxl vae |
|
multi_scale = False # if use multiscale dataset model training |
|
pe_interpolation = 0.5 |
|
|
|
# training setting |
|
num_workers = 10 |
|
train_batch_size = 64 # 64 as default |
|
num_epochs = 200 # 3 |
|
gradient_accumulation_steps = 1 |
|
grad_checkpointing = True |
|
gradient_clip = 0.2 |
|
optimizer = dict(type='CAMEWrapper', lr=2e-5, weight_decay=0.0, betas=(0.9, 0.999, 0.9999), eps=(1e-30, 1e-16)) |
|
lr_schedule_args = dict(num_warmup_steps=1000) |
|
|
|
#visualize=True |
|
#train_sampling_steps = 3 |
|
#eval_sampling_steps = 3 |
|
log_interval = 20 |
|
save_model_epochs = 1 |
|
#save_model_steps = 2500 |
|
work_dir = 'output/debug' |
|
|
|
# pixart-sigma |
|
scale_factor = 0.13025 |
|
real_prompt_ratio = 0.5 |
|
model_max_length = 512 |
|
class_dropout_prob = 0.1 |
|
|
|
``` |
|
|
|
## How to Resume Training
|
|
|
1. Download the [checkpoint](checkpoint/epoch_50_step_116738.pth).

2. Point the `resume_from` entry of the training config at the downloaded file, as shown in the sketch below.
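
Concretely, this means editing the training config from the section above so that the commented-out `resume_from` line points at the downloaded checkpoint. A minimal sketch, assuming the checkpoint was saved to a `checkpoint/` directory relative to the working directory:

```python
# Sketch only: set this in the PixArt-Sigma training config shown above.
resume_from = dict(
    checkpoint="checkpoint/epoch_50_step_116738.pth",  # downloaded checkpoint
    load_ema=False,
    resume_optimizer=True,
    resume_lr_scheduler=True,
)
```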
|
|
|
## Environmental Impact |
|
|
|
- **Hardware Type:** 2× NVIDIA RTX A6000

- **Hours used:** 700

- **Compute Region:** Japan

- **Carbon Emitted:** Not formally estimated; expected to be low.
|
|
|
## Technical Specifications
|
|
|
### Model Architecture and Objective |
|
|
|
Diffusion Transformer |
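
To inspect the concrete architecture hyperparameters (depth, attention heads, patch size, and so on), you can load the transformer on its own and print its diffusers configuration. A minimal sketch, not part of the original card:

```python
from diffusers import Transformer2DModel

# Load only the transformer and report its registered configuration and size.
transformer = Transformer2DModel.from_pretrained(
    "alfredplpl/CommonArt-PoC",
    use_safetensors=True,
)
print(transformer.config)  # architecture hyperparameters
print(f"{sum(p.numel() for p in transformer.parameters()) / 1e6:.1f}M parameters")
```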
|
|
|
### Compute Infrastructure |
|
|
|
Desktop PC |
|
|
|
#### Hardware |
|
|
|
2× NVIDIA RTX A6000
|
|
|
#### Software |
|
|
|
[Pixart-Sigma repository](https://github.com/PixArt-alpha/PixArt-sigma) |
|
|
|
|
|
## Model Card Contact |
|
|
|
alfredplpl |