Files changed (1)
  1. README.md +23 -11
README.md CHANGED
@@ -7,13 +7,15 @@ base_model:
pipeline_tag: text-to-video
tags:
- image-to-video
---

- # ⚡️Pyramid Flow⚡️

- [[Paper]](https://arxiv.org/abs/2410.05954) [[Project Page ✨]](https://pyramid-flow.github.io) [[Code 🚀]](https://github.com/jy0205/Pyramid-Flow) [[demo 🤗](https://huggingface.co/spaces/Pyramid-Flow/pyramid-flow)]

- This is the official repository for Pyramid Flow, a training-efficient **Autoregressive Video Generation** method based on **Flow Matching**. By training only on open-source datasets, it generates high-quality 10-second videos at 768p resolution and 24 FPS, and naturally supports image-to-video generation.

<table class="center" border="0" style="width: 100%; text-align: left;">
<tr>
@@ -28,10 +30,15 @@ This is the official repository for Pyramid Flow, a training-efficient **Autoreg
</tr>
</table>

## News

- * `COMING SOON` ⚡️⚡️⚡️ Training code and new model checkpoints trained from scratch.
* `2024.10.11` 🤗🤗🤗 [Hugging Face demo](https://huggingface.co/spaces/Pyramid-Flow/pyramid-flow) is available. Thanks [@multimodalart](https://huggingface.co/multimodalart) for the commit!
* `2024.10.10` 🚀🚀🚀 We release the [technical report](https://arxiv.org/abs/2410.05954), [project page](https://pyramid-flow.github.io) and [model checkpoint](https://huggingface.co/rain1011/pyramid-flow-sd3) of Pyramid Flow.

## Installation
@@ -48,7 +55,7 @@ conda activate pyramid
pip install -r requirements.txt
```

- Then, you can directly download the model from [Huggingface](https://huggingface.co/rain1011/pyramid-flow-sd3). We provide both model checkpoints for 768p and 384p video generation. The 384p checkpoint supports 5-second video generation at 24FPS, while the 768p checkpoint supports up to 10-second video generation at 24FPS.

```python
from huggingface_hub import snapshot_download
@@ -59,7 +66,9 @@ snapshot_download("rain1011/pyramid-flow-sd3", local_dir=model_path, local_dir_u

## Usage

- To use our model, please follow the inference code in `video_generation_demo.ipynb` at [this link](https://github.com/jy0205/Pyramid-Flow/blob/main/video_generation_demo.ipynb). We further simplify it into the following two-step procedure. First, load the downloaded model:

```python
import torch
@@ -76,10 +85,13 @@ model = PyramidDiTForVideoGeneration(
model_variant='diffusion_transformer_768p', # 'diffusion_transformer_384p'
)

- model.vae.to("cuda")
- model.dit.to("cuda")
- model.text_encoder.to("cuda")
model.vae.enable_tiling()
```

Then, you can try text-to-video generation on your own prompts:
@@ -124,8 +136,6 @@ with torch.no_grad(), torch.cuda.amp.autocast(enabled=True, dtype=torch_dtype):
export_to_video(frames, "./image_to_video_sample.mp4", fps=24)
```

- We also support CPU offloading to allow inference with **less than 12GB** of GPU memory by adding a `cpu_offloading=True` parameter. This feature was contributed by [@Ednaordinary](https://github.com/Ednaordinary), see [#23](https://github.com/jy0205/Pyramid-Flow/pull/23) for details.
-
## Usage tips

* The `guidance_scale` parameter controls the visual quality. We suggest a guidance scale within [7, 9] for the 768p checkpoint during text-to-video generation, and 7 for the 384p checkpoint.
@@ -147,6 +157,7 @@ The following video examples are generated at 5s, 768p, 24fps. For more results,
</tr>
</table>

## Acknowledgement

We are grateful for the following awesome projects when implementing Pyramid Flow:
@@ -160,6 +171,7 @@ We are grateful for the following awesome projects when implementing Pyramid Flo
## Citation

Consider giving this repository a star and citing Pyramid Flow in your publications if it helps your research.
```
@article{jin2024pyramidal,
title={Pyramidal Flow Matching for Efficient Video Generative Modeling},
 
pipeline_tag: text-to-video
tags:
- image-to-video
+ - sd3
+
---

+ # ⚡️Pyramid Flow SD3⚡️

+ [[Paper]](https://arxiv.org/abs/2410.05954) [[Project Page ✨]](https://pyramid-flow.github.io) [[Code 🚀]](https://github.com/jy0205/Pyramid-Flow) [[miniFLUX Model ⚡️]](https://huggingface.co/rain1011/pyramid-flow-miniflux) [[demo 🤗](https://huggingface.co/spaces/Pyramid-Flow/pyramid-flow)]

+ This is the model repository for Pyramid Flow, a training-efficient **Autoregressive Video Generation** method based on **Flow Matching**. By training only on open-source datasets, it generates high-quality 10-second videos at 768p resolution and 24 FPS, and naturally supports image-to-video generation.

<table class="center" border="0" style="width: 100%; text-align: left;">
<tr>
 
</tr>
</table>

+
## News

+ * `2024.10.29` ⚡️⚡️⚡️ We release the [training code](https://github.com/jy0205/Pyramid-Flow?tab=readme-ov-file#training) and [new model checkpoints](https://huggingface.co/rain1011/pyramid-flow-miniflux) with the FLUX structure, trained from scratch.
+
+ > We have switched the model structure from SD3 to a mini FLUX to fix human structure issues. Please try our 1024p image checkpoint and 384p video checkpoint; the 768p video checkpoint will be released in a few days.
+
* `2024.10.11` 🤗🤗🤗 [Hugging Face demo](https://huggingface.co/spaces/Pyramid-Flow/pyramid-flow) is available. Thanks [@multimodalart](https://huggingface.co/multimodalart) for the commit!
+
* `2024.10.10` 🚀🚀🚀 We release the [technical report](https://arxiv.org/abs/2410.05954), [project page](https://pyramid-flow.github.io) and [model checkpoint](https://huggingface.co/rain1011/pyramid-flow-sd3) of Pyramid Flow.

## Installation
 
pip install -r requirements.txt
```

+ Then, download the model from [Hugging Face](https://huggingface.co/rain1011) (there are two variants: [miniFLUX](https://huggingface.co/rain1011/pyramid-flow-miniflux) or [SD3](https://huggingface.co/rain1011/pyramid-flow-sd3)). The miniFLUX models support 1024p image and 384p video generation, while the SD3-based models support 768p and 384p video generation. The 384p checkpoint generates 5-second videos at 24 FPS, and the 768p checkpoint generates up to 10-second videos at 24 FPS.

```python
from huggingface_hub import snapshot_download
 

## Usage

+ For inference, we provide a Gradio demo, single-GPU, multi-GPU, and Apple Silicon inference code, as well as VRAM-efficient features such as CPU offloading. Please check our [code repository](https://github.com/jy0205/Pyramid-Flow?tab=readme-ov-file#inference) for usage details.
+
+ Below is a simplified two-step procedure. First, load the downloaded model:

```python
import torch
 
model_variant='diffusion_transformer_768p', # 'diffusion_transformer_384p'
)

model.vae.enable_tiling()
+ # model.vae.to("cuda")
+ # model.dit.to("cuda")
+ # model.text_encoder.to("cuda")
+
+ # if you are not using sequential offloading below, uncomment the lines above
+ model.enable_sequential_cpu_offload()
```
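The two placement options in the snippet above are mutually exclusive: either move every submodule to the GPU, or enable sequential CPU offloading to trade speed for a lower VRAM peak. As a sketch, the choice can be wrapped in a small helper; the function name is ours, but the attributes (`vae`, `dit`, `text_encoder`) and `enable_sequential_cpu_offload()` come from the snippet:

```python
def place_model(model, sequential_offload: bool, device: str = "cuda"):
    """Pick one of the two placement strategies shown above (helper name is ours)."""
    if sequential_offload:
        # Submodules are shuttled to the GPU on demand, keeping peak VRAM low
        # at some cost in speed.
        model.enable_sequential_cpu_offload()
    else:
        # Keep everything resident on the GPU for maximum throughput.
        for module in (model.vae, model.dit, model.text_encoder):
            module.to(device)
    return model
```

Calling `place_model(model, sequential_offload=True)` matches the default in the block above; `sequential_offload=False` reproduces the commented-out `.to("cuda")` lines.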

Then, you can try text-to-video generation on your own prompts:
 
export_to_video(frames, "./image_to_video_sample.mp4", fps=24)
```

## Usage tips

* The `guidance_scale` parameter controls the visual quality. We suggest a guidance scale within [7, 9] for the 768p checkpoint during text-to-video generation, and 7 for the 384p checkpoint.
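The tip above can be sketched as a tiny lookup keyed by the `model_variant` strings used earlier; the helper and its name are hypothetical, and only the suggested values come from the tip:

```python
# Hypothetical helper encoding the suggested text-to-video guidance defaults;
# not part of the released Pyramid Flow code.
SUGGESTED_GUIDANCE = {
    "diffusion_transformer_768p": (7.0, 9.0),  # suggested range [7, 9]
    "diffusion_transformer_384p": (7.0, 7.0),  # suggested value 7
}

def guidance_scale_for(model_variant: str, prefer_high: bool = True) -> float:
    """Return a suggested guidance_scale for a checkpoint variant."""
    low, high = SUGGESTED_GUIDANCE[model_variant]
    return high if prefer_high else low
```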
 
</tr>
</table>

+
## Acknowledgement

We are grateful for the following awesome projects when implementing Pyramid Flow:
 
## Citation

Consider giving this repository a star and citing Pyramid Flow in your publications if it helps your research.
+
```
@article{jin2024pyramidal,
title={Pyramidal Flow Matching for Efficient Video Generative Modeling},