---
datasets:
- HuggingFaceFV/finevideo
- LanguageBind/Open-Sora-Plan-v1.0.0
language:
- ja
- en
library_name: diffusers
license: apache-2.0
pipeline_tag: text-to-video
tags:
- art
---

# Model Card for CommonVideo

This is a text-to-video model trained on CC-BY and CC0-like licensed images.

## Model Details

### Model Description

At AI Picasso, we develop AI technology through active dialogue with creators, aiming for mutual understanding and cooperation.
We strive to solve the challenges creators face and to grow together with them.
One such challenge is that some creators and fans want to use image generation but cannot, often because permission to use certain images for training was never obtained.
To address this issue, we have developed CommonVideo.

#### Features of CommonVideo

- Principally trained on images for which learning permission has been obtained
- Understands both Japanese and English text inputs directly
- Minimizes the risk of exactly reproducing training images
- Uses cutting-edge technology for high quality and efficiency

### Misc.

- **Developed by:** alfredplpl, matty
- **Funded by:** AIdeaLab, Inc.
- **Shared by:** AI Picasso, Inc.
- **Model type:** Rectified Flow Transformer
- **Language(s) (NLP):** Japanese, English
- **License:** Apache-2.0

### Model Sources

- **Repository:** TBA
- **Paper:** TBA

## How to Get Started with the Model

- diffusers, on a GPU with 16 GB+ VRAM

1. Install the libraries.

```bash
pip install transformers diffusers
```

2. Run the following script.

```python
TBA
```

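Until the official script is published, here is a minimal sketch of what inference may look like, assuming the model loads through diffusers' `CogVideoXPipeline` (the card states the architecture is CogVideoX-based); the repo id `aidealab/CommonVideo`, the frame count, and the step count below are placeholders, not confirmed values.

```python
import torch
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video

# "aidealab/CommonVideo" is a hypothetical repo id; replace it with the
# actual one once the repository is announced.
pipe = CogVideoXPipeline.from_pretrained(
    "aidealab/CommonVideo", torch_dtype=torch.bfloat16
)
pipe.to("cuda")

# The model accepts Japanese or English prompts directly.
video = pipe(
    prompt="A cat walking through a garden",
    num_frames=49,
    num_inference_steps=50,
).frames[0]

export_to_video(video, "output.mp4", fps=8)
```

This is a sketch under stated assumptions, not the official usage; check the repository for the supported pipeline class and recommended parameters when they are released.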
## Uses

### Direct Use

- Assistance in creating illustrations, manga, and anime
  - For both commercial and non-commercial purposes
  - Communication with creators when making requests
- Commercial provision of image generation services
  - Please be cautious when handling generated content
- Self-expression
  - Using this AI to express "your" uniqueness
- Research and development
  - Fine-tuning (also known as additional training), such as LoRA
  - Merging with other models
  - Evaluating the performance of this model with metrics such as FID
- Education
  - Graduation projects for art school or vocational school students
  - University students' graduation theses or project assignments
  - Teachers demonstrating the current state of image generation AI
- Uses described in the Hugging Face Community
  - Please ask questions in Japanese or English

### Out-of-Scope Use

- Generating misinformation or disinformation

## Bias, Risks, and Limitations

TBA

## Training Details

### Training Data

We used these datasets to train the transformer:

- [Pixabay](https://huggingface.co/datasets/LanguageBind/Open-Sora-Plan-v1.0.0)
- [FineVideo](https://huggingface.co/datasets/HuggingFaceFV/finevideo)

## Technical Specifications

### Model Architecture and Objective

#### Model Architecture

[CogVideoX-based architecture](https://github.com/THUDM/CogVideo)

#### Objective

TBA

### Compute Infrastructure

Google Cloud (Tokyo region).

#### Hardware

We used 4 nodes of NVIDIA L4 x8 instances (32 L4 GPUs in total).

#### Software

[Finetrainers-based code](https://github.com/a-r-r-o-w/finetrainers)

## Model Card Contact

- [Contact page](https://aidealab.com/contact)

# Acknowledgement

We appreciate the video providers.
We are **standing on the shoulders of giants**.