NadaGh's picture
End of training
3a25a0a verified

[[open-in-colab]]

ํ›‘์–ด๋ณด๊ธฐ

Diffusion ๋ชจ๋ธ์€ ์ด๋ฏธ์ง€๋‚˜ ์˜ค๋””์˜ค์™€ ๊ฐ™์€ ๊ด€์‹ฌ ์ƒ˜ํ”Œ๋“ค์„ ์ƒ์„ฑํ•˜๊ธฐ ์œ„ํ•ด ๋žœ๋ค ๊ฐ€์šฐ์‹œ์•ˆ ๋…ธ์ด์ฆˆ๋ฅผ ๋‹จ๊ณ„๋ณ„๋กœ ์ œ๊ฑฐํ•˜๋„๋ก ํ•™์Šต๋ฉ๋‹ˆ๋‹ค. ์ด๋กœ ์ธํ•ด ์ƒ์„ฑ AI์— ๋Œ€ํ•œ ๊ด€์‹ฌ์ด ๋งค์šฐ ๋†’์•„์กŒ์œผ๋ฉฐ, ์ธํ„ฐ๋„ท์—์„œ diffusion ์ƒ์„ฑ ์ด๋ฏธ์ง€์˜ ์˜ˆ๋ฅผ ๋ณธ ์ ์ด ์žˆ์„ ๊ฒƒ์ž…๋‹ˆ๋‹ค. ๐Ÿงจ Diffusers๋Š” ๋ˆ„๊ตฌ๋‚˜ diffusion ๋ชจ๋ธ๋“ค์„ ๋„๋ฆฌ ์ด์šฉํ•  ์ˆ˜ ์žˆ๋„๋ก ํ•˜๊ธฐ ์œ„ํ•œ ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ์ž…๋‹ˆ๋‹ค.

๊ฐœ๋ฐœ์ž๋“  ์ผ๋ฐ˜ ์‚ฌ์šฉ์ž๋“  ์ด ํ›‘์–ด๋ณด๊ธฐ๋ฅผ ํ†ตํ•ด ๐Ÿงจ Diffusers๋ฅผ ์†Œ๊ฐœํ•˜๊ณ  ๋น ๋ฅด๊ฒŒ ์ƒ์„ฑํ•  ์ˆ˜ ์žˆ๋„๋ก ๋„์™€๋“œ๋ฆฝ๋‹ˆ๋‹ค! ์•Œ์•„์•ผ ํ•  ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ์˜ ์ฃผ์š” ๊ตฌ์„ฑ ์š”์†Œ๋Š” ํฌ๊ฒŒ ์„ธ ๊ฐ€์ง€์ž…๋‹ˆ๋‹ค:

  • [DiffusionPipeline]์€ ์ถ”๋ก ์„ ์œ„ํ•ด ์‚ฌ์ „ ํ•™์Šต๋œ diffusion ๋ชจ๋ธ์—์„œ ์ƒ˜ํ”Œ์„ ๋น ๋ฅด๊ฒŒ ์ƒ์„ฑํ•˜๋„๋ก ์„ค๊ณ„๋œ ๋†’์€ ์ˆ˜์ค€์˜ ์—”๋“œํˆฌ์—”๋“œ ํด๋ž˜์Šค์ž…๋‹ˆ๋‹ค.
  • Diffusion ์‹œ์Šคํ…œ ์ƒ์„ฑ์„ ์œ„ํ•œ ๋นŒ๋”ฉ ๋ธ”๋ก์œผ๋กœ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ๋Š” ๋„๋ฆฌ ์‚ฌ์šฉ๋˜๋Š” ์‚ฌ์ „ ํ•™์Šต๋œ model ์•„ํ‚คํ…์ฒ˜ ๋ฐ ๋ชจ๋“ˆ.
  • ๋‹ค์–‘ํ•œ schedulers - ํ•™์Šต์„ ์œ„ํ•ด ๋…ธ์ด์ฆˆ๋ฅผ ์ถ”๊ฐ€ํ•˜๋Š” ๋ฐฉ๋ฒ•๊ณผ ์ถ”๋ก  ์ค‘์— ๋…ธ์ด์ฆˆ ์ œ๊ฑฐ๋œ ์ด๋ฏธ์ง€๋ฅผ ์ƒ์„ฑํ•˜๋Š” ๋ฐฉ๋ฒ•์„ ์ œ์–ดํ•˜๋Š” ์•Œ๊ณ ๋ฆฌ์ฆ˜์ž…๋‹ˆ๋‹ค.

ํ›‘์–ด๋ณด๊ธฐ์—์„œ๋Š” ์ถ”๋ก ์„ ์œ„ํ•ด [DiffusionPipeline]์„ ์‚ฌ์šฉํ•˜๋Š” ๋ฐฉ๋ฒ•์„ ๋ณด์—ฌ์ค€ ๋‹ค์Œ, ๋ชจ๋ธ๊ณผ ์Šค์ผ€์ค„๋Ÿฌ๋ฅผ ๊ฒฐํ•ฉํ•˜์—ฌ [DiffusionPipeline] ๋‚ด๋ถ€์—์„œ ์ผ์–ด๋‚˜๋Š” ์ผ์„ ๋ณต์ œํ•˜๋Š” ๋ฐฉ๋ฒ•์„ ์•ˆ๋‚ดํ•ฉ๋‹ˆ๋‹ค.

ํ›‘์–ด๋ณด๊ธฐ๋Š” ๊ฐ„๊ฒฐํ•œ ๋ฒ„์ „์˜ ๐Ÿงจ Diffusers ์†Œ๊ฐœ๋กœ์„œ ๋…ธํŠธ๋ถ ๋น ๋ฅด๊ฒŒ ์‹œ์ž‘ํ•  ์ˆ˜ ์žˆ๋„๋ก ๋„์™€๋“œ๋ฆฝ๋‹ˆ๋‹ค. ๋””ํ“จ์ €์˜ ๋ชฉํ‘œ, ๋””์ž์ธ ์ฒ ํ•™, ํ•ต์‹ฌ API์— ๋Œ€ํ•œ ์ถ”๊ฐ€ ์„ธ๋ถ€ ์ •๋ณด๋ฅผ ์ž์„ธํžˆ ์•Œ์•„๋ณด๋ ค๋ฉด ๋…ธํŠธ๋ถ์„ ํ™•์ธํ•˜์„ธ์š”!

์‹œ์ž‘ํ•˜๊ธฐ ์ „์— ํ•„์š”ํ•œ ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๊ฐ€ ๋ชจ๋‘ ์„ค์น˜๋˜์–ด ์žˆ๋Š”์ง€ ํ™•์ธํ•˜์„ธ์š”:

# ์ฃผ์„ ํ’€์–ด์„œ Colab์— ํ•„์š”ํ•œ ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ ์„ค์น˜ํ•˜๊ธฐ.
#!pip install --upgrade diffusers accelerate transformers
  • ๐Ÿค— Accelerate๋Š” ์ถ”๋ก  ๋ฐ ํ•™์Šต์„ ์œ„ํ•œ ๋ชจ๋ธ ๋กœ๋”ฉ ์†๋„๋ฅผ ๋†’์—ฌ์ค๋‹ˆ๋‹ค.
  • ๐Ÿค— Transformers๋Š” Stable Diffusion๊ณผ ๊ฐ™์ด ๊ฐ€์žฅ ๋งŽ์ด ์‚ฌ์šฉ๋˜๋Š” diffusion ๋ชจ๋ธ์„ ์‹คํ–‰ํ•˜๋Š” ๋ฐ ํ•„์š”ํ•ฉ๋‹ˆ๋‹ค.

DiffusionPipeline

[DiffusionPipeline] ์€ ์ถ”๋ก ์„ ์œ„ํ•ด ์‚ฌ์ „ ํ•™์Šต๋œ diffusion ์‹œ์Šคํ…œ์„ ์‚ฌ์šฉํ•˜๋Š” ๊ฐ€์žฅ ์‰ฌ์šด ๋ฐฉ๋ฒ•์ž…๋‹ˆ๋‹ค. ๋ชจ๋ธ๊ณผ ์Šค์ผ€์ค„๋Ÿฌ๋ฅผ ํฌํ•จํ•˜๋Š” ์—”๋“œ ํˆฌ ์—”๋“œ ์‹œ์Šคํ…œ์ž…๋‹ˆ๋‹ค. ๋‹ค์–‘ํ•œ ์ž‘์—…์— [DiffusionPipeline]์„ ๋ฐ”๋กœ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์•„๋ž˜ ํ‘œ์—์„œ ์ง€์›๋˜๋Š” ๋ช‡ ๊ฐ€์ง€ ์ž‘์—…์„ ์‚ดํŽด๋ณด๊ณ , ์ง€์›๋˜๋Š” ์ž‘์—…์˜ ์ „์ฒด ๋ชฉ๋ก์€ ๐Ÿงจ Diffusers Summary ํ‘œ์—์„œ ํ™•์ธํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

Task Description Pipeline
Unconditional Image Generation generate an image from Gaussian noise unconditional_image_generation
Text-Guided Image Generation generate an image given a text prompt conditional_image_generation
Text-Guided Image-to-Image Translation adapt an image guided by a text prompt img2img
Text-Guided Image-Inpainting fill the masked part of an image given the image, the mask and a text prompt inpaint
Text-Guided Depth-to-Image Translation adapt parts of an image guided by a text prompt while preserving structure via depth estimation depth2img

๋จผ์ € [DiffusionPipeline]์˜ ์ธ์Šคํ„ด์Šค๋ฅผ ์ƒ์„ฑํ•˜๊ณ  ๋‹ค์šด๋กœ๋“œํ•  ํŒŒ์ดํ”„๋ผ์ธ ์ฒดํฌํฌ์ธํŠธ๋ฅผ ์ง€์ •ํ•ฉ๋‹ˆ๋‹ค. ํ—ˆ๊น…ํŽ˜์ด์Šค ํ—ˆ๋ธŒ์— ์ €์žฅ๋œ ๋ชจ๋“  checkpoint์— ๋Œ€ํ•ด [DiffusionPipeline]์„ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์ด ํ›‘์–ด๋ณด๊ธฐ์—์„œ๋Š” text-to-image ์ƒ์„ฑ์„ ์œ„ํ•œ stable-diffusion-v1-5 ์ฒดํฌํฌ์ธํŠธ๋ฅผ ๋กœ๋“œํ•ฉ๋‹ˆ๋‹ค.

Stable Diffusion ๋ชจ๋ธ์˜ ๊ฒฝ์šฐ, ๋ชจ๋ธ์„ ์‹คํ–‰ํ•˜๊ธฐ ์ „์— ๋ผ์ด์„ ์Šค๋ฅผ ๋จผ์ € ์ฃผ์˜ ๊นŠ๊ฒŒ ์ฝ์–ด์ฃผ์„ธ์š”. ๐Ÿงจ Diffusers๋Š” ๋ถˆ์พŒํ•˜๊ฑฐ๋‚˜ ์œ ํ•ดํ•œ ์ฝ˜ํ…์ธ ๋ฅผ ๋ฐฉ์ง€ํ•˜๊ธฐ ์œ„ํ•ด safety_checker๋ฅผ ๊ตฌํ˜„ํ•˜๊ณ  ์žˆ์ง€๋งŒ, ๋ชจ๋ธ์˜ ํ–ฅ์ƒ๋œ ์ด๋ฏธ์ง€ ์ƒ์„ฑ ๊ธฐ๋Šฅ์œผ๋กœ ์ธํ•ด ์—ฌ์ „ํžˆ ์ž ์žฌ์ ์œผ๋กœ ์œ ํ•ดํ•œ ์ฝ˜ํ…์ธ ๊ฐ€ ์ƒ์„ฑ๋  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

[~DiffusionPipeline.from_pretrained] ๋ฐฉ๋ฒ•์œผ๋กœ ๋ชจ๋ธ ๋กœ๋“œํ•˜๊ธฐ:

>>> from diffusers import DiffusionPipeline

>>> pipeline = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

The [DiffusionPipeline]์€ ๋ชจ๋“  ๋ชจ๋ธ๋ง, ํ† ํฐํ™”, ์Šค์ผ€์ค„๋ง ์ปดํฌ๋„ŒํŠธ๋ฅผ ๋‹ค์šด๋กœ๋“œํ•˜๊ณ  ์บ์‹œํ•ฉ๋‹ˆ๋‹ค. Stable Diffusion Pipeline์€ ๋ฌด์—‡๋ณด๋‹ค๋„ [UNet2DConditionModel]๊ณผ [PNDMScheduler]๋กœ ๊ตฌ์„ฑ๋˜์–ด ์žˆ์Œ์„ ์•Œ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค:

>>> pipeline
StableDiffusionPipeline {
  "_class_name": "StableDiffusionPipeline",
  "_diffusers_version": "0.13.1",
  ...,
  "scheduler": [
    "diffusers",
    "PNDMScheduler"
  ],
  ...,
  "unet": [
    "diffusers",
    "UNet2DConditionModel"
  ],
  "vae": [
    "diffusers",
    "AutoencoderKL"
  ]
}

์ด ๋ชจ๋ธ์€ ์•ฝ 14์–ต ๊ฐœ์˜ ํŒŒ๋ผ๋ฏธํ„ฐ๋กœ ๊ตฌ์„ฑ๋˜์–ด ์žˆ์œผ๋ฏ€๋กœ GPU์—์„œ ํŒŒ์ดํ”„๋ผ์ธ์„ ์‹คํ–‰ํ•  ๊ฒƒ์„ ๊ฐ•๋ ฅํžˆ ๊ถŒ์žฅํ•ฉ๋‹ˆ๋‹ค. PyTorch์—์„œ์™€ ๋งˆ์ฐฌ๊ฐ€์ง€๋กœ ์ œ๋„ˆ๋ ˆ์ดํ„ฐ ๊ฐ์ฒด๋ฅผ GPU๋กœ ์ด๋™ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค:

>>> pipeline.to("cuda")

์ด์ œ ํŒŒ์ดํ”„๋ผ์ธ์— ํ…์ŠคํŠธ ํ”„๋กฌํ”„ํŠธ๋ฅผ ์ „๋‹ฌํ•˜์—ฌ ์ด๋ฏธ์ง€๋ฅผ ์ƒ์„ฑํ•œ ๋‹ค์Œ ๋…ธ์ด์ฆˆ๊ฐ€ ์ œ๊ฑฐ๋œ ์ด๋ฏธ์ง€์— ์•ก์„ธ์Šคํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ๊ธฐ๋ณธ์ ์œผ๋กœ ์ด๋ฏธ์ง€ ์ถœ๋ ฅ์€ PIL.Image ๊ฐ์ฒด๋กœ ๊ฐ์‹ธ์ง‘๋‹ˆ๋‹ค.

>>> image = pipeline("An image of a squirrel in Picasso style").images[0]
>>> image

save๋ฅผ ํ˜ธ์ถœํ•˜์—ฌ ์ด๋ฏธ์ง€๋ฅผ ์ €์žฅํ•ฉ๋‹ˆ๋‹ค:

>>> image.save("image_of_squirrel_painting.png")

๋กœ์ปฌ ํŒŒ์ดํ”„๋ผ์ธ

ํŒŒ์ดํ”„๋ผ์ธ์„ ๋กœ์ปฌ์—์„œ ์‚ฌ์šฉํ•  ์ˆ˜๋„ ์žˆ์Šต๋‹ˆ๋‹ค. ์œ ์ผํ•œ ์ฐจ์ด์ ์€ ๊ฐ€์ค‘์น˜๋ฅผ ๋จผ์ € ๋‹ค์šด๋กœ๋“œํ•ด์•ผ ํ•œ๋‹ค๋Š” ์ ์ž…๋‹ˆ๋‹ค:

!git lfs install
!git clone https://huggingface.co/runwayml/stable-diffusion-v1-5

๊ทธ๋Ÿฐ ๋‹ค์Œ ์ €์žฅ๋œ ๊ฐ€์ค‘์น˜๋ฅผ ํŒŒ์ดํ”„๋ผ์ธ์— ๋กœ๋“œํ•ฉ๋‹ˆ๋‹ค:

>>> pipeline = DiffusionPipeline.from_pretrained("./stable-diffusion-v1-5")

์ด์ œ ์œ„ ์„น์…˜์—์„œ์™€ ๊ฐ™์ด ํŒŒ์ดํ”„๋ผ์ธ์„ ์‹คํ–‰ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

์Šค์ผ€์ค„๋Ÿฌ ๊ต์ฒด

์Šค์ผ€์ค„๋Ÿฌ๋งˆ๋‹ค ๋…ธ์ด์ฆˆ ์ œ๊ฑฐ ์†๋„์™€ ํ’ˆ์งˆ์ด ์„œ๋กœ ๋‹ค๋ฆ…๋‹ˆ๋‹ค. ์ž์‹ ์—๊ฒŒ ๊ฐ€์žฅ ์ ํ•ฉํ•œ ์Šค์ผ€์ค„๋Ÿฌ๋ฅผ ์ฐพ๋Š” ๊ฐ€์žฅ ์ข‹์€ ๋ฐฉ๋ฒ•์€ ์ง์ ‘ ์‚ฌ์šฉํ•ด ๋ณด๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค! ๐Ÿงจ Diffusers์˜ ์ฃผ์š” ๊ธฐ๋Šฅ ์ค‘ ํ•˜๋‚˜๋Š” ์Šค์ผ€์ค„๋Ÿฌ ๊ฐ„์— ์‰ฝ๊ฒŒ ์ „ํ™˜์ด ๊ฐ€๋Šฅํ•˜๋‹ค๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค. ์˜ˆ๋ฅผ ๋“ค์–ด, ๊ธฐ๋ณธ ์Šค์ผ€์ค„๋Ÿฌ์ธ [PNDMScheduler]๋ฅผ [EulerDiscreteScheduler]๋กœ ๋ฐ”๊พธ๋ ค๋ฉด, [~diffusers.ConfigMixin.from_config] ๋ฉ”์„œ๋“œ๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ๋กœ๋“œํ•˜์„ธ์š”:

>>> from diffusers import EulerDiscreteScheduler

>>> pipeline = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
>>> pipeline.scheduler = EulerDiscreteScheduler.from_config(pipeline.scheduler.config)

์ƒˆ ์Šค์ผ€์ค„๋Ÿฌ๋กœ ์ด๋ฏธ์ง€๋ฅผ ์ƒ์„ฑํ•ด๋ณด๊ณ  ์–ด๋–ค ์ฐจ์ด๊ฐ€ ์žˆ๋Š”์ง€ ํ™•์ธํ•ด ๋ณด์„ธ์š”!

๋‹ค์Œ ์„น์…˜์—์„œ๋Š” ๋ชจ๋ธ๊ณผ ์Šค์ผ€์ค„๋Ÿฌ๋ผ๋Š” [DiffusionPipeline]์„ ๊ตฌ์„ฑํ•˜๋Š” ์ปดํฌ๋„ŒํŠธ๋ฅผ ์ž์„ธํžˆ ์‚ดํŽด๋ณด๊ณ  ์ด๋Ÿฌํ•œ ์ปดํฌ๋„ŒํŠธ๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ๊ณ ์–‘์ด ์ด๋ฏธ์ง€๋ฅผ ์ƒ์„ฑํ•˜๋Š” ๋ฐฉ๋ฒ•์„ ๋ฐฐ์›Œ๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค.

๋ชจ๋ธ

๋Œ€๋ถ€๋ถ„์˜ ๋ชจ๋ธ์€ ๋…ธ์ด์ฆˆ๊ฐ€ ์žˆ๋Š” ์ƒ˜ํ”Œ์„ ๊ฐ€์ ธ์™€ ๊ฐ ์‹œ๊ฐ„ ๊ฐ„๊ฒฉ๋งˆ๋‹ค ๋…ธ์ด์ฆˆ๊ฐ€ ์ ์€ ์ด๋ฏธ์ง€์™€ ์ž…๋ ฅ ์ด๋ฏธ์ง€ ์‚ฌ์ด์˜ ์ฐจ์ด์ธ ๋…ธ์ด์ฆˆ ์ž”์ฐจ(๋‹ค๋ฅธ ๋ชจ๋ธ์€ ์ด์ „ ์ƒ˜ํ”Œ์„ ์ง์ ‘ ์˜ˆ์ธกํ•˜๊ฑฐ๋‚˜ ์†๋„ ๋˜๋Š” v-prediction์„ ์˜ˆ์ธกํ•˜๋Š” ํ•™์Šต์„ ํ•ฉ๋‹ˆ๋‹ค)์„ ์˜ˆ์ธกํ•ฉ๋‹ˆ๋‹ค. ๋ชจ๋ธ์„ ๋ฏน์Šค ์•ค ๋งค์น˜ํ•˜์—ฌ ๋‹ค๋ฅธ diffusion ์‹œ์Šคํ…œ์„ ๋งŒ๋“ค ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

๋ชจ๋ธ์€ [~ModelMixin.from_pretrained] ๋ฉ”์„œ๋“œ๋กœ ์‹œ์ž‘๋˜๋ฉฐ, ์ด ๋ฉ”์„œ๋“œ๋Š” ๋ชจ๋ธ ๊ฐ€์ค‘์น˜๋ฅผ ๋กœ์ปฌ์— ์บ์‹œํ•˜์—ฌ ๋‹ค์Œ์— ๋ชจ๋ธ์„ ๋กœ๋“œํ•  ๋•Œ ๋” ๋น ๋ฅด๊ฒŒ ๋กœ๋“œํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ํ›‘์–ด๋ณด๊ธฐ์—์„œ๋Š” ๊ณ ์–‘์ด ์ด๋ฏธ์ง€์— ๋Œ€ํ•ด ํ•™์Šต๋œ ์ฒดํฌํฌ์ธํŠธ๊ฐ€ ์žˆ๋Š” ๊ธฐ๋ณธ์ ์ธ unconditional ์ด๋ฏธ์ง€ ์ƒ์„ฑ ๋ชจ๋ธ์ธ [UNet2DModel]์„ ๋กœ๋“œํ•ฉ๋‹ˆ๋‹ค:

>>> from diffusers import UNet2DModel

>>> repo_id = "google/ddpm-cat-256"
>>> model = UNet2DModel.from_pretrained(repo_id)

๋ชจ๋ธ ๋งค๊ฐœ๋ณ€์ˆ˜์— ์•ก์„ธ์Šคํ•˜๋ ค๋ฉด model.config๋ฅผ ํ˜ธ์ถœํ•ฉ๋‹ˆ๋‹ค:

>>> model.config

๋ชจ๋ธ ๊ตฌ์„ฑ์€ ๐ŸงŠ ๊ณ ์ •๋œ ๐ŸงŠ ๋”•์…”๋„ˆ๋ฆฌ๋กœ, ๋ชจ๋ธ์ด ์ƒ์„ฑ๋œ ํ›„์—๋Š” ํ•ด๋‹น ๋งค๊ฐœ ๋ณ€์ˆ˜๋“ค์„ ๋ณ€๊ฒฝํ•  ์ˆ˜ ์—†์Šต๋‹ˆ๋‹ค. ์ด๋Š” ์˜๋„์ ์ธ ๊ฒƒ์œผ๋กœ, ์ฒ˜์Œ์— ๋ชจ๋ธ ์•„ํ‚คํ…์ฒ˜๋ฅผ ์ •์˜ํ•˜๋Š” ๋ฐ ์‚ฌ์šฉ๋œ ๋งค๊ฐœ๋ณ€์ˆ˜๋Š” ๋™์ผํ•˜๊ฒŒ ์œ ์ง€ํ•˜๋ฉด์„œ ๋‹ค๋ฅธ ๋งค๊ฐœ๋ณ€์ˆ˜๋Š” ์ถ”๋ก  ์ค‘์— ์กฐ์ •ํ•  ์ˆ˜ ์žˆ๋„๋ก ํ•˜๊ธฐ ์œ„ํ•œ ๊ฒƒ์ž…๋‹ˆ๋‹ค.

๊ฐ€์žฅ ์ค‘์š”ํ•œ ๋งค๊ฐœ๋ณ€์ˆ˜๋“ค์€ ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค:

  • sample_size: ์ž…๋ ฅ ์ƒ˜ํ”Œ์˜ ๋†’์ด ๋ฐ ๋„ˆ๋น„ ์น˜์ˆ˜์ž…๋‹ˆ๋‹ค.
  • in_channels: ์ž…๋ ฅ ์ƒ˜ํ”Œ์˜ ์ž…๋ ฅ ์ฑ„๋„ ์ˆ˜์ž…๋‹ˆ๋‹ค.
  • down_block_types ๋ฐ up_block_types: UNet ์•„ํ‚คํ…์ฒ˜๋ฅผ ์ƒ์„ฑํ•˜๋Š” ๋ฐ ์‚ฌ์šฉ๋˜๋Š” ๋‹ค์šด ๋ฐ ์—…์ƒ˜ํ”Œ๋ง ๋ธ”๋ก์˜ ์œ ํ˜•.
  • block_out_channels: ๋‹ค์šด์ƒ˜ํ”Œ๋ง ๋ธ”๋ก์˜ ์ถœ๋ ฅ ์ฑ„๋„ ์ˆ˜. ์—…์ƒ˜ํ”Œ๋ง ๋ธ”๋ก์˜ ์ž…๋ ฅ ์ฑ„๋„ ์ˆ˜์— ์—ญ์ˆœ์œผ๋กœ ์‚ฌ์šฉ๋˜๊ธฐ๋„ ํ•ฉ๋‹ˆ๋‹ค.
  • layers_per_block: ๊ฐ UNet ๋ธ”๋ก์— ์กด์žฌํ•˜๋Š” ResNet ๋ธ”๋ก์˜ ์ˆ˜์ž…๋‹ˆ๋‹ค.

์ถ”๋ก ์— ๋ชจ๋ธ์„ ์‚ฌ์šฉํ•˜๋ ค๋ฉด ๋žœ๋ค ๊ฐ€์šฐ์‹œ์•ˆ ๋…ธ์ด์ฆˆ๋กœ ์ด๋ฏธ์ง€ ๋ชจ์–‘์„ ๋งŒ๋“ญ๋‹ˆ๋‹ค. ๋ชจ๋ธ์ด ์—ฌ๋Ÿฌ ๊ฐœ์˜ ๋ฌด์ž‘์œ„ ๋…ธ์ด์ฆˆ๋ฅผ ์ˆ˜์‹ ํ•  ์ˆ˜ ์žˆ์œผ๋ฏ€๋กœ 'batch' ์ถ•, ์ž…๋ ฅ ์ฑ„๋„ ์ˆ˜์— ํ•ด๋‹นํ•˜๋Š” 'channel' ์ถ•, ์ด๋ฏธ์ง€์˜ ๋†’์ด์™€ ๋„ˆ๋น„๋ฅผ ๋‚˜ํƒ€๋‚ด๋Š” 'sample_size' ์ถ•์ด ์žˆ์–ด์•ผ ํ•ฉ๋‹ˆ๋‹ค:

>>> import torch

>>> torch.manual_seed(0)

>>> noisy_sample = torch.randn(1, model.config.in_channels, model.config.sample_size, model.config.sample_size)
>>> noisy_sample.shape
torch.Size([1, 3, 256, 256])

์ถ”๋ก ์„ ์œ„ํ•ด ๋ชจ๋ธ์— ๋…ธ์ด์ฆˆ๊ฐ€ ์žˆ๋Š” ์ด๋ฏธ์ง€์™€ timestep์„ ์ „๋‹ฌํ•ฉ๋‹ˆ๋‹ค. 'timestep'์€ ์ž…๋ ฅ ์ด๋ฏธ์ง€์˜ ๋…ธ์ด์ฆˆ ์ •๋„๋ฅผ ๋‚˜ํƒ€๋‚ด๋ฉฐ, ์‹œ์ž‘ ๋ถ€๋ถ„์— ๋” ๋งŽ์€ ๋…ธ์ด์ฆˆ๊ฐ€ ์žˆ๊ณ  ๋ ๋ถ€๋ถ„์— ๋” ์ ์€ ๋…ธ์ด์ฆˆ๊ฐ€ ์žˆ์Šต๋‹ˆ๋‹ค. ์ด๋ฅผ ํ†ตํ•ด ๋ชจ๋ธ์ด diffusion ๊ณผ์ •์—์„œ ์‹œ์ž‘ ๋˜๋Š” ๋์— ๋” ๊ฐ€๊นŒ์šด ์œ„์น˜๋ฅผ ๊ฒฐ์ •ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. sample ๋ฉ”์„œ๋“œ๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ๋ชจ๋ธ ์ถœ๋ ฅ์„ ์–ป์Šต๋‹ˆ๋‹ค:

>>> with torch.no_grad():
...     noisy_residual = model(sample=noisy_sample, timestep=2).sample

ํ•˜์ง€๋งŒ ์‹ค์ œ ์˜ˆ๋ฅผ ์ƒ์„ฑํ•˜๋ ค๋ฉด ๋…ธ์ด์ฆˆ ์ œ๊ฑฐ ํ”„๋กœ์„ธ์Šค๋ฅผ ์•ˆ๋‚ดํ•  ์Šค์ผ€์ค„๋Ÿฌ๊ฐ€ ํ•„์š”ํ•ฉ๋‹ˆ๋‹ค. ๋‹ค์Œ ์„น์…˜์—์„œ๋Š” ๋ชจ๋ธ์„ ์Šค์ผ€์ค„๋Ÿฌ์™€ ๊ฒฐํ•ฉํ•˜๋Š” ๋ฐฉ๋ฒ•์— ๋Œ€ํ•ด ์•Œ์•„๋ด…๋‹ˆ๋‹ค.

์Šค์ผ€์ค„๋Ÿฌ

์Šค์ผ€์ค„๋Ÿฌ๋Š” ๋ชจ๋ธ ์ถœ๋ ฅ์ด ์ฃผ์–ด์กŒ์„ ๋•Œ ๋…ธ์ด์ฆˆ๊ฐ€ ๋งŽ์€ ์ƒ˜ํ”Œ์—์„œ ๋…ธ์ด์ฆˆ๊ฐ€ ์ ์€ ์ƒ˜ํ”Œ๋กœ ์ „ํ™˜ํ•˜๋Š” ๊ฒƒ์„ ๊ด€๋ฆฌํ•ฉ๋‹ˆ๋‹ค - ์ด ๊ฒฝ์šฐ 'noisy_residual'.

๐Ÿงจ Diffusers๋Š” Diffusion ์‹œ์Šคํ…œ์„ ๊ตฌ์ถ•ํ•˜๊ธฐ ์œ„ํ•œ ํˆด๋ฐ•์Šค์ž…๋‹ˆ๋‹ค. [DiffusionPipeline]์„ ์‚ฌ์šฉํ•˜๋ฉด ๋ฏธ๋ฆฌ ๋งŒ๋“ค์–ด์ง„ Diffusion ์‹œ์Šคํ…œ์„ ํŽธ๋ฆฌํ•˜๊ฒŒ ์‹œ์ž‘ํ•  ์ˆ˜ ์žˆ์ง€๋งŒ, ๋ชจ๋ธ๊ณผ ์Šค์ผ€์ค„๋Ÿฌ ๊ตฌ์„ฑ ์š”์†Œ๋ฅผ ๊ฐœ๋ณ„์ ์œผ๋กœ ์„ ํƒํ•˜์—ฌ ์‚ฌ์šฉ์ž ์ง€์ • Diffusion ์‹œ์Šคํ…œ์„ ๊ตฌ์ถ•ํ•  ์ˆ˜๋„ ์žˆ์Šต๋‹ˆ๋‹ค.

ํ›‘์–ด๋ณด๊ธฐ์˜ ๊ฒฝ์šฐ, [~diffusers.ConfigMixin.from_config] ๋ฉ”์„œ๋“œ๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ [DDPMScheduler]๋ฅผ ์ธ์Šคํ„ด์Šคํ™”ํ•ฉ๋‹ˆ๋‹ค:

>>> from diffusers import DDPMScheduler

>>> scheduler = DDPMScheduler.from_config(repo_id)
>>> scheduler
DDPMScheduler {
  "_class_name": "DDPMScheduler",
  "_diffusers_version": "0.13.1",
  "beta_end": 0.02,
  "beta_schedule": "linear",
  "beta_start": 0.0001,
  "clip_sample": true,
  "clip_sample_range": 1.0,
  "num_train_timesteps": 1000,
  "prediction_type": "epsilon",
  "trained_betas": null,
  "variance_type": "fixed_small"
}

๐Ÿ’ก ์Šค์ผ€์ค„๋Ÿฌ๊ฐ€ ๊ตฌ์„ฑ์—์„œ ์–ด๋–ป๊ฒŒ ์ธ์Šคํ„ด์Šคํ™”๋˜๋Š”์ง€ ์ฃผ๋ชฉํ•˜์„ธ์š”. ๋ชจ๋ธ๊ณผ ๋‹ฌ๋ฆฌ ์Šค์ผ€์ค„๋Ÿฌ์—๋Š” ํ•™์Šต ๊ฐ€๋Šฅํ•œ ๊ฐ€์ค‘์น˜๊ฐ€ ์—†์œผ๋ฉฐ ๋งค๊ฐœ๋ณ€์ˆ˜๋„ ์—†์Šต๋‹ˆ๋‹ค!

๊ฐ€์žฅ ์ค‘์š”ํ•œ ๋งค๊ฐœ๋ณ€์ˆ˜๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค:

  • num_train_timesteps: ๋…ธ์ด์ฆˆ ์ œ๊ฑฐ ํ”„๋กœ์„ธ์Šค์˜ ๊ธธ์ด, ์ฆ‰ ๋žœ๋ค ๊ฐ€์šฐ์Šค ๋…ธ์ด์ฆˆ๋ฅผ ๋ฐ์ดํ„ฐ ์ƒ˜ํ”Œ๋กœ ์ฒ˜๋ฆฌํ•˜๋Š” ๋ฐ ํ•„์š”ํ•œ ํƒ€์ž„์Šคํ… ์ˆ˜์ž…๋‹ˆ๋‹ค.
  • beta_schedule: ์ถ”๋ก  ๋ฐ ํ•™์Šต์— ์‚ฌ์šฉํ•  ๋…ธ์ด์ฆˆ ์Šค์ผ€์ค„ ์œ ํ˜•์ž…๋‹ˆ๋‹ค.
  • beta_start ๋ฐ beta_end: ๋…ธ์ด์ฆˆ ์Šค์ผ€์ค„์˜ ์‹œ์ž‘ ๋ฐ ์ข…๋ฃŒ ๋…ธ์ด์ฆˆ ๊ฐ’์ž…๋‹ˆ๋‹ค.

๋…ธ์ด์ฆˆ๊ฐ€ ์•ฝ๊ฐ„ ์ ์€ ์ด๋ฏธ์ง€๋ฅผ ์˜ˆ์ธกํ•˜๋ ค๋ฉด ์Šค์ผ€์ค„๋Ÿฌ์˜ [~diffusers.DDPMScheduler.step] ๋ฉ”์„œ๋“œ์— ๋ชจ๋ธ ์ถœ๋ ฅ, timestep, ํ˜„์žฌ sample์„ ์ „๋‹ฌํ•˜์„ธ์š”.

>>> less_noisy_sample = scheduler.step(model_output=noisy_residual, timestep=2, sample=noisy_sample).prev_sample
>>> less_noisy_sample.shape

less_noisy_sample์„ ๋‹ค์Œ timestep์œผ๋กœ ๋„˜๊ธฐ๋ฉด ๋…ธ์ด์ฆˆ๊ฐ€ ๋” ์ค„์–ด๋“ญ๋‹ˆ๋‹ค! ์ด์ œ ์ด ๋ชจ๋“  ๊ฒƒ์„ ํ•œ๋ฐ ๋ชจ์•„ ์ „์ฒด ๋…ธ์ด์ฆˆ ์ œ๊ฑฐ ๊ณผ์ •์„ ์‹œ๊ฐํ™”ํ•ด ๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค.

๋จผ์ € ๋…ธ์ด์ฆˆ ์ œ๊ฑฐ๋œ ์ด๋ฏธ์ง€๋ฅผ ํ›„์ฒ˜๋ฆฌํ•˜์—ฌ PIL.Image๋กœ ํ‘œ์‹œํ•˜๋Š” ํ•จ์ˆ˜๋ฅผ ๋งŒ๋“ญ๋‹ˆ๋‹ค:

>>> import PIL.Image
>>> import numpy as np


>>> def display_sample(sample, i):
...     image_processed = sample.cpu().permute(0, 2, 3, 1)
...     image_processed = (image_processed + 1.0) * 127.5
...     image_processed = image_processed.numpy().astype(np.uint8)

...     image_pil = PIL.Image.fromarray(image_processed[0])
...     display(f"Image at step {i}")
...     display(image_pil)

๋…ธ์ด์ฆˆ ์ œ๊ฑฐ ํ”„๋กœ์„ธ์Šค์˜ ์†๋„๋ฅผ ๋†’์ด๋ ค๋ฉด ์ž…๋ ฅ๊ณผ ๋ชจ๋ธ์„ GPU๋กœ ์˜ฎ๊ธฐ์„ธ์š”:

>>> model.to("cuda")
>>> noisy_sample = noisy_sample.to("cuda")

์ด์ œ ๋…ธ์ด์ฆˆ๊ฐ€ ์ ์€ ์ƒ˜ํ”Œ์˜ ์ž”์ฐจ๋ฅผ ์˜ˆ์ธกํ•˜๊ณ  ์Šค์ผ€์ค„๋Ÿฌ๋กœ ๋…ธ์ด์ฆˆ๊ฐ€ ์ ์€ ์ƒ˜ํ”Œ์„ ๊ณ„์‚ฐํ•˜๋Š” ๋…ธ์ด์ฆˆ ์ œ๊ฑฐ ๋ฃจํ”„๋ฅผ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค:

>>> import tqdm

>>> sample = noisy_sample

>>> for i, t in enumerate(tqdm.tqdm(scheduler.timesteps)):
...     # 1. predict noise residual
...     with torch.no_grad():
...         residual = model(sample, t).sample

...     # 2. compute less noisy image and set x_t -> x_t-1
...     sample = scheduler.step(residual, t, sample).prev_sample

...     # 3. optionally look at image
...     if (i + 1) % 50 == 0:
...         display_sample(sample, i + 1)

๊ฐ€๋งŒํžˆ ์•‰์•„์„œ ๊ณ ์–‘์ด๊ฐ€ ์†Œ์Œ์œผ๋กœ๋งŒ ์ƒ์„ฑ๋˜๋Š” ๊ฒƒ์„ ์ง€์ผœ๋ณด์„ธ์š”!๐Ÿ˜ป

๋‹ค์Œ ๋‹จ๊ณ„

์ด๋ฒˆ ํ›‘์–ด๋ณด๊ธฐ์—์„œ ๐Ÿงจ Diffusers๋กœ ๋ฉ‹์ง„ ์ด๋ฏธ์ง€๋ฅผ ๋งŒ๋“ค์–ด ๋ณด์…จ๊ธฐ๋ฅผ ๋ฐ”๋ž๋‹ˆ๋‹ค! ๋‹ค์Œ ๋‹จ๊ณ„๋กœ ๋„˜์–ด๊ฐ€์„ธ์š”: