Update README.md
README.md CHANGED
@@ -47,3 +47,10 @@ We provide some qualitative comparison between FastHunyuan 6 step inference v.s.
 |  |  |
 |  |  |
+## Memory requirements
+
+For inference, we now support NF4 and LLM-INT8 quantized inference for FastHunyuan using BitsAndBytes. With NF4 quantization, inference can be performed on a single RTX 4090 GPU, requiring just 20 GB of VRAM.
+
+For LoRA finetuning, the minimum hardware requirements are:
+- 40 GB GPU memory each for 2 GPUs with LoRA
+- 30 GB GPU memory each for 2 GPUs with CPU offload and LoRA.
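
The added section mentions NF4 quantized inference with BitsAndBytes. Below is a minimal loading sketch using the Diffusers BitsAndBytes integration; the checkpoint id `FastVideo/FastHunyuan-diffusers`, the resolution, frame count, and the 6-step setting are illustrative assumptions, not details confirmed by this diff.

```python
import torch
from diffusers import BitsAndBytesConfig, HunyuanVideoPipeline, HunyuanVideoTransformer3DModel
from diffusers.utils import export_to_video

# NF4 4-bit quantization via BitsAndBytes; for LLM-INT8, use load_in_8bit=True instead.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Checkpoint id is an assumption for illustration; point it at your FastHunyuan weights.
model_id = "FastVideo/FastHunyuan-diffusers"

# Quantize the video transformer, which dominates VRAM usage.
transformer = HunyuanVideoTransformer3DModel.from_pretrained(
    model_id,
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16,
)

pipe = HunyuanVideoPipeline.from_pretrained(
    model_id,
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()  # keep peak VRAM within a single RTX 4090
pipe.vae.enable_tiling()         # reduce VAE decode memory for video frames

video = pipe(
    prompt="A cat walks on the grass, realistic style.",
    height=720,
    width=1280,
    num_frames=45,
    num_inference_steps=6,  # few-step inference, per the FastHunyuan comparison above
).frames[0]
export_to_video(video, "output.mp4", fps=24)
```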