---
frameworks:
- Pytorch
license: other
language:
- en
tasks:
- image-to-video
base_model:
- THUDM/CogVideoX-5b
- THUDM/CogVideoX-5b-I2V
pipeline_tag: image-to-video
---

# CogVideoX1.1-5B-SAT

<div align="center">
  <img src="https://modelscope.oss-cn-beijing.aliyuncs.com/resource/cogvideologo.svg" width="50%"/>
</div>
<p align="center">
  <a href="https://huggingface.co/THUDM/CogVideoX1.1-5B-SAT/blob/main/README_zh.md">📄 Read in Chinese</a> |
  <a href="https://github.com/THUDM/CogVideo">🌐 GitHub</a> |
  <a href="https://arxiv.org/pdf/2408.06072">📜 arXiv</a>
</p>
<p align="center">
  📍 Visit <a href="https://chatglm.cn/video?lang=en?fr=osm_cogvideo">QingYing</a> and the <a href="https://open.bigmodel.cn/?utm_campaign=open&_channel_track_key=OWTVNma9">API Platform</a> to try the commercial video generation models.
</p>

CogVideoX is an open-source video generation model originating from [Qingying](https://chatglm.cn/video?fr=osm_cogvideo). CogVideoX1.1 is the upgraded version of the open-source CogVideoX model.

The CogVideoX1.1-5B series supports **10-second** videos and higher resolutions; the `CogVideoX1.1-5B-I2V` variant supports **any resolution** for image-to-video generation.

This repository contains the SAT (SwissArmyTransformer) weights of the CogVideoX1.1-5B model, organized into the following modules:

## Transformer

Contains the weights for both the image-to-video (I2V) and text-to-video (T2V) models, laid out as follows:

```
├── transformer_i2v  
│   ├── 1000  
│   │   └── mp_rank_00_model_states.pt  
│   └── latest  
└── transformer_t2v  
    ├── 1000  
    │   └── mp_rank_00_model_states.pt  
    └── latest  
```

Please select the corresponding weights when performing inference.
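
As a rough illustration of "select the corresponding weights", the sketch below resolves the checkpoint directory for a given task. The `sat_transformer_dir` helper and the local `./CogVideoX1.1-5B-SAT` download path are assumptions made for this example, not part of the SAT codebase; the returned directory is what the SAT inference configuration's checkpoint path would point at.

```python
import os

def sat_transformer_dir(model_dir: str, task: str = "i2v") -> str:
    """Return the SAT checkpoint directory for the chosen task ("i2v" or "t2v")."""
    if task not in ("i2v", "t2v"):
        raise ValueError("task must be 'i2v' or 't2v'")
    ckpt_dir = os.path.join(model_dir, f"transformer_{task}")
    # Each variant keeps its weights under <dir>/1000/mp_rank_00_model_states.pt.
    weights = os.path.join(ckpt_dir, "1000", "mp_rank_00_model_states.pt")
    if not os.path.isfile(weights):
        raise FileNotFoundError(weights)
    return ckpt_dir

# Point an image-to-video run at the I2V weights (assumed local download path).
print(sat_transformer_dir("./CogVideoX1.1-5B-SAT", task="i2v"))
```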

## VAE

The VAE is unchanged from the CogVideoX-5B series, so no update is required; it is also bundled here for convenience:

```
└── vae  
    └── 3d-vae.pt  
```
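
To sanity-check the downloaded file, a minimal sketch (the local path is an assumption) is to load the checkpoint on CPU and count its tensors:

```python
import torch

# Load the standalone 3D-VAE checkpoint on CPU (path assumes this repository
# was downloaded to ./CogVideoX1.1-5B-SAT).
ckpt = torch.load("./CogVideoX1.1-5B-SAT/vae/3d-vae.pt", map_location="cpu")
# Depending on how the checkpoint was saved, the weights may sit at the top
# level or under a "state_dict" key; handle both cases.
state_dict = ckpt.get("state_dict", ckpt) if isinstance(ckpt, dict) else ckpt
n_params = sum(v.numel() for v in state_dict.values() if torch.is_tensor(v))
print(f"{len(state_dict)} entries, {n_params / 1e6:.1f}M parameters")
```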

## Text Encoder

The text encoder is unchanged from the diffusers version of CogVideoX-5B, so no update is required; it is also bundled here for convenience:

```
└── t5-v1_1-xxl
    ├── added_tokens.json
    ├── config.json
    ├── model-00001-of-00002.safetensors
    ├── model-00002-of-00002.safetensors
    ├── model.safetensors.index.json
    ├── special_tokens_map.json
    ├── spiece.model
    └── tokenizer_config.json

0 directories, 8 files
```
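
Because this is a standard T5 layout, it can be loaded with the `transformers` library directly from the local folder. The sketch below assumes the repository was downloaded to `./CogVideoX1.1-5B-SAT` and that `transformers` and `sentencepiece` are installed; it only demonstrates text encoding, which is how the video model consumes prompts.

```python
import torch
from transformers import T5EncoderModel, T5Tokenizer

model_path = "./CogVideoX1.1-5B-SAT/t5-v1_1-xxl"  # assumed local download path
tokenizer = T5Tokenizer.from_pretrained(model_path)
text_encoder = T5EncoderModel.from_pretrained(model_path, torch_dtype=torch.bfloat16)

# Encode a prompt into the embeddings the video model conditions on.
inputs = tokenizer("a corgi surfing a wave at sunset", return_tensors="pt")
with torch.no_grad():
    embeddings = text_encoder(**inputs).last_hidden_state
print(embeddings.shape)  # (batch, sequence_length, 4096) for T5-XXL
```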

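## Download

The weights are hosted on ModelScope and can be fetched either with the ModelScope SDK or with Git.

SDK download:

```bash
# Install the ModelScope SDK
pip install modelscope
```

```python
# Download the full repository via the ModelScope SDK
from modelscope import snapshot_download

model_dir = snapshot_download('ZhipuAI/CogVideoX1.1-5B-SAT')
```

Git download:

```bash
# Clone the model repository from ModelScope
git clone https://www.modelscope.cn/ZhipuAI/CogVideoX1.1-5B-SAT.git
```
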
## Model License

This model is released under the [CogVideoX LICENSE](LICENSE).

## Citation

```
@article{yang2024cogvideox,
  title={CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer},
  author={Yang, Zhuoyi and Teng, Jiayan and Zheng, Wendi and Ding, Ming and Huang, Shiyu and Xu, Jiazheng and Yang, Yuanming and Hong, Wenyi and Zhang, Xiaohan and Feng, Guanyu and others},
  journal={arXiv preprint arXiv:2408.06072},
  year={2024}
}
```
