|
# ์ฌ๋ฌ GPU๋ฅผ ์ฌ์ฉํ ๋ถ์ฐ ์ถ๋ก |
|
|
|
๋ถ์ฐ ์ค์ ์์๋ ์ฌ๋ฌ ๊ฐ์ ํ๋กฌํํธ๋ฅผ ๋์์ ์์ฑํ ๋ ์ ์ฉํ ๐ค [Accelerate](https://huggingface.co/docs/accelerate/index) ๋๋ [PyTorch Distributed](https://pytorch.org/tutorials/beginner/dist_overview.html)๋ฅผ ์ฌ์ฉํ์ฌ ์ฌ๋ฌ GPU์์ ์ถ๋ก ์ ์คํํ ์ ์์ต๋๋ค. |
|
|
|
์ด ๊ฐ์ด๋์์๋ ๋ถ์ฐ ์ถ๋ก ์ ์ํด ๐ค Accelerate์ PyTorch Distributed๋ฅผ ์ฌ์ฉํ๋ ๋ฐฉ๋ฒ์ ๋ณด์ฌ๋๋ฆฝ๋๋ค. |
|
|
|
## ๐ค Accelerate |
|
|
|
๐ค [Accelerate](https://huggingface.co/docs/accelerate/index)๋ ๋ถ์ฐ ์ค์ ์์ ์ถ๋ก ์ ์ฝ๊ฒ ํ๋ จํ๊ฑฐ๋ ์คํํ ์ ์๋๋ก ์ค๊ณ๋ ๋ผ์ด๋ธ๋ฌ๋ฆฌ์
๋๋ค. ๋ถ์ฐ ํ๊ฒฝ ์ค์ ํ๋ก์ธ์ค๋ฅผ ๊ฐ์ํํ์ฌ PyTorch ์ฝ๋์ ์ง์คํ ์ ์๋๋ก ํด์ค๋๋ค. |
|
|
|
์์ํ๋ ค๋ฉด Python ํ์ผ์ ์์ฑํ๊ณ [`accelerate.PartialState`]๋ฅผ ์ด๊ธฐํํ์ฌ ๋ถ์ฐ ํ๊ฒฝ์ ์์ฑํ๋ฉด, ์ค์ ์ด ์๋์ผ๋ก ๊ฐ์ง๋๋ฏ๋ก `rank` ๋๋ `world_size`๋ฅผ ๋ช
์์ ์ผ๋ก ์ ์ํ ํ์๊ฐ ์์ต๋๋ค. ['DiffusionPipeline`]์ `distributed_state.device`๋ก ์ด๋ํ์ฌ ๊ฐ ํ๋ก์ธ์ค์ GPU๋ฅผ ํ ๋นํฉ๋๋ค. |
|
|
|
์ด์ ์ปจํ
์คํธ ๊ด๋ฆฌ์๋ก [`~accelerate.PartialState.split_between_processes`] ์ ํธ๋ฆฌํฐ๋ฅผ ์ฌ์ฉํ์ฌ ํ๋ก์ธ์ค ์์ ๋ฐ๋ผ ํ๋กฌํํธ๋ฅผ ์๋์ผ๋ก ๋ถ๋ฐฐํฉ๋๋ค. |
|
|
|
|
|
```py |
|
from accelerate import PartialState |
|
from diffusers import DiffusionPipeline |
|
|
|
pipeline = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16) |
|
distributed_state = PartialState() |
|
pipeline.to(distributed_state.device) |
|
|
|
with distributed_state.split_between_processes(["a dog", "a cat"]) as prompt: |
|
result = pipeline(prompt).images[0] |
|
result.save(f"result_{distributed_state.process_index}.png") |
|
``` |
|
|
|
Use the `--num_processes` argument to specify the number of GPUs to use, and call `accelerate launch` to run the script: |
|
|
|
```bash |
|
accelerate launch run_distributed.py --num_processes=2 |
|
``` |
|
|
|
<Tip>์์ธํ ๋ด์ฉ์ [๐ค Accelerate๋ฅผ ์ฌ์ฉํ ๋ถ์ฐ ์ถ๋ก ](https://huggingface.co/docs/accelerate/en/usage_guides/distributed_inference#distributed-inference-with-accelerate) ๊ฐ์ด๋๋ฅผ ์ฐธ์กฐํ์ธ์. |
|
|
|
</Tip> |
|
|
|
## Pytoerch ๋ถ์ฐ |
|
|
|
PyTorch๋ ๋ฐ์ดํฐ ๋ณ๋ ฌ ์ฒ๋ฆฌ๋ฅผ ๊ฐ๋ฅํ๊ฒ ํ๋ [`DistributedDataParallel`](https://pytorch.org/docs/stable/generated/torch.nn.parallel.DistributedDataParallel.html)์ ์ง์ํฉ๋๋ค. |
|
|
|
์์ํ๋ ค๋ฉด Python ํ์ผ์ ์์ฑํ๊ณ `torch.distributed` ๋ฐ `torch.multiprocessing`์ ์ํฌํธํ์ฌ ๋ถ์ฐ ํ๋ก์ธ์ค ๊ทธ๋ฃน์ ์ค์ ํ๊ณ ๊ฐ GPU์์ ์ถ๋ก ์ฉ ํ๋ก์ธ์ค๋ฅผ ์์ฑํฉ๋๋ค. ๊ทธ๋ฆฌ๊ณ [`DiffusionPipeline`]๋ ์ด๊ธฐํํด์ผ ํฉ๋๋ค: |
|
|
|
ํ์ฐ ํ์ดํ๋ผ์ธ์ `rank`๋ก ์ด๋ํ๊ณ `get_rank`๋ฅผ ์ฌ์ฉํ์ฌ ๊ฐ ํ๋ก์ธ์ค์ GPU๋ฅผ ํ ๋นํ๋ฉด ๊ฐ ํ๋ก์ธ์ค๊ฐ ๋ค๋ฅธ ํ๋กฌํํธ๋ฅผ ์ฒ๋ฆฌํฉ๋๋ค: |
|
|
|
```py |
|
import torch |
|
import torch.distributed as dist |
|
import torch.multiprocessing as mp |
|
|
|
from diffusers import DiffusionPipeline |
|
|
|
sd = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16) |
|
``` |
|
|
|
์ฌ์ฉํ ๋ฐฑ์๋ ์ ํ, ํ์ฌ ํ๋ก์ธ์ค์ `rank`, `world_size` ๋๋ ์ฐธ์ฌํ๋ ํ๋ก์ธ์ค ์๋ก ๋ถ์ฐ ํ๊ฒฝ ์์ฑ์ ์ฒ๋ฆฌํ๋ ํจ์[`init_process_group`]๋ฅผ ๋ง๋ค์ด ์ถ๋ก ์ ์คํํด์ผ ํฉ๋๋ค. |
|
|
|
2๊ฐ์ GPU์์ ์ถ๋ก ์ ๋ณ๋ ฌ๋ก ์คํํ๋ ๊ฒฝ์ฐ `world_size`๋ 2์
๋๋ค. |
|
|
|
```py |
|
def run_inference(rank, world_size): |
|
dist.init_process_group("nccl", rank=rank, world_size=world_size) |
|
|
|
sd.to(rank) |
|
|
|
if torch.distributed.get_rank() == 0: |
|
prompt = "a dog" |
|
elif torch.distributed.get_rank() == 1: |
|
prompt = "a cat" |
|
|
|
image = sd(prompt).images[0] |
|
image.save(f"./{'_'.join(prompt)}.png") |
|
``` |
|
|
|
๋ถ์ฐ ์ถ๋ก ์ ์คํํ๋ ค๋ฉด [`mp.spawn`](https://pytorch.org/docs/stable/multiprocessing.html#torch.multiprocessing.spawn)์ ํธ์ถํ์ฌ `world_size`์ ์ ์๋ GPU ์์ ๋ํด `run_inference` ํจ์๋ฅผ ์คํํฉ๋๋ค: |
|
|
|
```py |
|
def main(): |
|
world_size = 2 |
|
mp.spawn(run_inference, args=(world_size,), nprocs=world_size, join=True) |
|
|
|
|
|
if __name__ == "__main__": |
|
main() |
|
``` |
|
|
|
์ถ๋ก ์คํฌ๋ฆฝํธ๋ฅผ ์๋ฃํ์ผ๋ฉด `--nproc_per_node` ์ธ์๋ฅผ ์ฌ์ฉํ์ฌ ์ฌ์ฉํ GPU ์๋ฅผ ์ง์ ํ๊ณ `torchrun`์ ํธ์ถํ์ฌ ์คํฌ๋ฆฝํธ๋ฅผ ์คํํฉ๋๋ค: |
|
|
|
```bash |
|
torchrun run_distributed.py --nproc_per_node=2 |
|
``` |