Update README.md
README.md
CHANGED
@@ -1,99 +1,63 @@
|
|
1 |
-
|
2 |
-
pipeline_tag: image-to-video
|
3 |
-
license: other
|
4 |
-
license_name: stable-video-diffusion-community
|
5 |
-
license_link: LICENSE.md
|
6 |
-
---
|
7 |
-
|
8 |
-
# Stable Video Diffusion Image-to-Video Model Card
|
9 |
-
|
10 |
-
<!-- Provide a quick summary of what the model is/does. -->
|
11 |
-
![row01](output_tile.gif)
|
12 |
-
Stable Video Diffusion (SVD) Image-to-Video is a diffusion model that takes in a still image as a conditioning frame, and generates a video from it.
|
13 |
-
|
14 |
-
Please note: For commercial use, please refer to https://stability.ai/license.
|
15 |
-
|
16 |
-
## Model Details
|
17 |
-
|
18 |
-
### Model Description
|
19 |
-
|
20 |
-
(SVD) Image-to-Video is a latent diffusion model trained to generate short video clips from an image conditioning.
|
21 |
-
This model was trained to generate 25 frames at resolution 576x1024 given a context frame of the same size, finetuned from [SVD Image-to-Video [14 frames]](https://huggingface.co/stabilityai/stable-video-diffusion-img2vid).
|
22 |
-
We also finetune the widely used [f8-decoder](https://huggingface.co/docs/diffusers/api/models/autoencoderkl#loading-from-the-original-format) for temporal consistency.
|
23 |
-
For convenience, we additionally provide the model with the
|
24 |
-
standard frame-wise decoder [here](https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt/blob/main/svd_xt_image_decoder.safetensors).
|
25 |
|
|
|
26 |
|
27 |
-
|
28 |
-
- **Funded by:** Stability AI
|
29 |
-
- **Model type:** Generative image-to-video model
|
30 |
-
- **Finetuned from model:** SVD Image-to-Video [14 frames]
|
31 |
-
|
32 |
-
### Model Sources
|
33 |
-
|
34 |
-
For research purposes, we recommend our `generative-models` GitHub repository (https://github.com/Stability-AI/generative-models),
|
35 |
-
which implements the most popular diffusion frameworks (both training and inference).
|
36 |
-
|
37 |
-
- **Repository:** https://github.com/Stability-AI/generative-models
|
38 |
-
- **Paper:** https://stability.ai/research/stable-video-diffusion-scaling-latent-video-diffusion-models-to-large-datasets
|
39 |
|
|
|
|
|
40 |
|
41 |
-
|
42 |
-
![comparison](comparison.png)
|
43 |
-
The chart above evaluates user preference for SVD-Image-to-Video over [GEN-2](https://research.runwayml.com/gen2) and [PikaLabs](https://www.pika.art/).
|
44 |
-
SVD-Image-to-Video is preferred by human voters in terms of video quality. For details on the user study, we refer to the [research paper](https://stability.ai/research/stable-video-diffusion-scaling-latent-video-diffusion-models-to-large-datasets).
|
45 |
|
46 |
-
|
|
|
47 |
|
48 |
-
|
|
|
|
|
49 |
|
50 |
-
|
|
|
|
|
51 |
|
52 |
-
|
53 |
-
-
|
54 |
-
-
|
55 |
-
- Generation of artworks and use in design and other artistic processes.
|
56 |
-
- Applications in educational or creative tools.
|
57 |
|
58 |
-
|
|
|
|
|
59 |
|
60 |
-
|
|
|
|
|
61 |
|
62 |
-
|
|
|
|
|
63 |
|
64 |
-
|
65 |
-
|
66 |
-
|
67 |
|
68 |
-
|
69 |
|
70 |
-
|
71 |
-
|
72 |
-
- The model may generate videos without motion, or very slow camera pans.
|
73 |
-
- The model cannot be controlled through text.
|
74 |
-
- The model cannot render legible text.
|
75 |
-
- Faces and people in general may not be generated properly.
|
76 |
-
- The autoencoding part of the model is lossy.
|
77 |
|
|
|
78 |
|
79 |
-
|
|
|
80 |
|
81 |
-
|
82 |
|
83 |
-
|
|
|
84 |
|
85 |
-
|
|
|
86 |
|
87 |
-
|
88 |
|
89 |
-
|
90 |
-
No explicit human labor is involved in training data preparation. However, human evaluation for model outputs and quality was extensively used to evaluate model quality and performance. The evaluations were performed with third-party contractor platforms (Amazon Sagemaker, Amazon Mechanical Turk, Prolific) with fluent English-speaking contractors from various countries, primarily from the USA, UK, and Canada. Each worker was paid $12/hr for the time invested in the evaluation.
|
91 |
-
No other third party was involved in the development of this model; the model was fully developed in-house at Stability AI.
|
92 |
-
Training the SVD checkpoints required a total of approximately 200,000 A100 80GB hours. The majority of the training occurred on 48 * 8 A100s, while some stages took more/less than that. The resulting CO2 emission is ~19,000 kg CO2 eq., and energy consumed is ~64,000 kWh.
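As a rough sanity check on the training figures above, the reported totals can be turned into per-GPU and wall-clock estimates. This is a back-of-the-envelope sketch, not an official breakdown: it assumes all GPU hours ran on the 48 * 8 cluster, which the text says was only true for the majority of training, and the function name `training_summary` is hypothetical.

```python
# Approximate figures quoted in the model card.
GPU_HOURS = 200_000   # total A100 80GB hours
NUM_GPUS = 48 * 8     # cluster size used for most of training
ENERGY_KWH = 64_000   # reported energy consumption
CO2_KG = 19_000       # reported emissions in kg CO2 eq.


def training_summary(gpu_hours, num_gpus, energy_kwh, co2_kg):
    """Derive rough per-GPU and wall-clock figures from the reported totals.

    Assumption: every GPU hour ran on a cluster of `num_gpus` devices.
    """
    wall_clock_hours = gpu_hours / num_gpus
    return {
        "num_gpus": num_gpus,
        "wall_clock_days": wall_clock_hours / 24,
        "avg_kw_per_gpu": energy_kwh / gpu_hours,
        "kg_co2_per_kwh": co2_kg / energy_kwh,
    }


summary = training_summary(GPU_HOURS, NUM_GPUS, ENERGY_KWH, CO2_KG)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

Under that assumption the numbers correspond to 384 GPUs running for about three weeks, at an average draw of roughly 0.32 kW per GPU.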
|
93 |
-
The released checkpoints (SVD/SVD-XT) are image-to-video models that generate short videos/animations closely following the given input image. Since the model relies on an existing supplied image, the potential risks of disclosing specific material or novel unsafe content are minimal. This was also evaluated by third-party independent red-teaming services, which agree with our conclusion to a high degree of confidence (>90% in various areas of safety red-teaming). The external evaluations were also performed for trustworthiness, leading to >95% confidence in real, trustworthy videos.
|
94 |
-
With the default settings at the time of release, SVD takes ~100s for generation, and SVD-XT takes ~180s on an A100 80GB card. Several optimizations to trade off quality / memory / speed can be done to perform faster inference or inference on lower VRAM cards.
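The generation times above can be put on a per-frame basis for comparison. The sketch below assumes the frame counts given in the model description (14 for SVD, 25 for SVD-XT) and treats the quoted times as averages; it is an illustration of the arithmetic, not a benchmark.

```python
# Frame counts come from the model description; times are the quoted
# approximate generation times on an A100 80GB card.
MODELS = {
    "SVD": {"frames": 14, "seconds": 100},
    "SVD-XT": {"frames": 25, "seconds": 180},
}


def seconds_per_frame(frames, seconds):
    """Average generation time per output frame."""
    return seconds / frames


per_frame = {name: seconds_per_frame(**spec) for name, spec in MODELS.items()}
for name, value in per_frame.items():
    print(f"{name}: {value:.2f} s/frame")
```

Both variants land at roughly 7 seconds per frame, so the longer SVD-XT run time mostly reflects the larger number of frames.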
|
95 |
-
The information related to the model and its development process and usage protocols can be found in the GitHub repo, associated research paper, and HuggingFace model page/cards.
|
96 |
-
The released model inference & demo code has image-level watermarking enabled by default, which can be used to detect the outputs. This is done via the imWatermark Python library.
|
97 |
-
The model can be used to generate videos from static initial images. However, we prohibit unlawful, obscene, or misleading uses of the model consistent with the terms of our license and Acceptable Use Policy. For the open-weights release, our training data filtering mitigations alleviate this risk to some extent. These restrictions are explicitly enforced on user-facing interfaces at stablevideo.com, where a warning is issued. We do not take any responsibility for third-party interfaces. Submitting initial images that bypass input filters to tease out offensive or inappropriate content listed above is also prohibited. Safety filtering checks at stablevideo.com run on model inputs and outputs independently. More details on our user-facing interfaces can be found here: https://www.stablevideo.com/faq. Beyond the Acceptable Use Policy and other mitigations and conditions described here, the model is not subject to additional model behavior interventions of the type described in the Foundation Model Transparency Index.
|
98 |
-
For stablevideo.com, we store preference data in the form of upvotes/downvotes on user-generated videos, and we have a pairwise ranker that runs while a user generates videos. This usage data is solely used for improving Stability AI’s future image/video models and services. No other third-party entities are given access to the usage data beyond Stability AI and maintainers of stablevideo.com.
|
99 |
-
For usage statistics of SVD, we refer interested users to HuggingFace model download/usage statistics as a primary indicator. Third-party applications also have reported model usage statistics. We might also consider releasing aggregate usage statistics of stablevideo.com on reaching some milestones.
|
|
|
1 |
+
### Video Prompt for Buying a Watch
|
2 |
|
3 |
+
---
|
4 |
|
5 |
+
🌟 **Title:** Find the Watch of Your Dreams! ⏰✨
|
6 |
|
7 |
+
🔥 **Introduction:**
|
8 |
+
"Say goodbye to running late, in style! 🌟 Discover the perfect watch to match you."
|
9 |
|
10 |
+
---
|
11 |
|
12 |
+
**Scene 1: Luxury Watch on Display (Opening Image)**
|
13 |
+
- Smooth transition from an image of an elegant watch in a lit display window.
|
14 |
|
15 |
+
**Scene 2: Product Details (10s)**
|
16 |
+
- Zoom in on the watch's details: sophisticated dial, leather/gray strap, intricate mechanisms.
|
17 |
+
- Text: "Top-Tier Handcrafted Quality 🔄"
|
18 |
|
19 |
+
**Scene 3: Special Features (10s)**
|
20 |
+
- Video showing special features: water resistance, backlight, stopwatch.
|
21 |
+
- Text: "Features That Will Make Your Day Easier 💧✨"
|
22 |
|
23 |
+
**Scene 4: Comfort and Style (10s)**
|
24 |
+
- A hand putting on the watch, showing comfort and a perfect fit.
|
25 |
+
- Text: "Ergonomic Design Made for You 👌"
|
26 |
|
27 |
+
**Scene 5: Visual Comparison (10s)**
|
28 |
+
- Animation comparing this watch with earlier models, highlighting improvements.
|
29 |
+
- Text: "Lighter, Stronger, More Beautiful 💪"
|
30 |
|
31 |
+
**Scene 6: Customer Testimonials (10s)**
|
32 |
+
- Video of satisfied customers sharing positive experiences.
|
33 |
+
- Text: "Our Community Loves It! ⭐⭐⭐⭐⭐"
|
34 |
|
35 |
+
**Scene 7: Special Offers (10s)**
|
36 |
+
- Show special offers and limited-time discounts.
|
37 |
+
- Text: "Limited-Time Offers! 🏷️"
|
38 |
|
39 |
+
**Scene 8: Call to Action (5s)**
|
40 |
+
- Text: "Click Now and Get Yours! 📲"
|
41 |
+
- Highlighted "Buy Now" button
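The eight scenes above can be captured as a small data structure to check the total running time and estimate how many generated clips it would take to cover it. This is only a sketch under stated assumptions: Scene 1 has no stated duration and is counted as 0 seconds, the 25 frames per clip is the figure quoted for the model elsewhere in this card, and `ASSUMED_FPS` is a hypothetical playback rate that does not appear in the source.

```python
import math
from dataclasses import dataclass


@dataclass
class Scene:
    title: str
    seconds: int  # 0 where the storyboard gives no duration


STORYBOARD = [
    Scene("Luxury watch on display (opening image)", 0),
    Scene("Product details", 10),
    Scene("Special features", 10),
    Scene("Comfort and style", 10),
    Scene("Visual comparison", 10),
    Scene("Customer testimonials", 10),
    Scene("Special offers", 10),
    Scene("Call to action", 5),
]

FRAMES_PER_CLIP = 25  # frames per generated clip, as quoted for the model
ASSUMED_FPS = 7       # hypothetical playback rate, not from the source


def total_seconds(scenes):
    """Sum the stated scene durations."""
    return sum(scene.seconds for scene in scenes)


def clips_needed(seconds, fps, frames_per_clip):
    """Number of fixed-length clips required to cover `seconds` of video."""
    return math.ceil(seconds * fps / frames_per_clip)


total = total_seconds(STORYBOARD)
clips = clips_needed(total, ASSUMED_FPS, FRAMES_PER_CLIP)
print(f"Total duration: {total} s, clips needed: {clips}")
```

With these assumptions the storyboard runs 65 seconds, so it would have to be assembled from many short clips rather than produced in a single generation.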
|
42 |
|
43 |
+
---
|
44 |
|
45 |
+
❤️ **Closing:**
|
46 |
+
"Watches that go beyond time. Make a difference on your wrist. 🌟🕰️"
|
47 |
|
48 |
+
---
|
49 |
|
50 |
+
**Technical Detail:**
|
51 |
+
This video was generated using the image-to-video diffusion model, with a sequence of finely tuned frames to ensure temporal consistency and high-definition visual quality (576x1024).
|
52 |
|
53 |
+
---
|
54 |
|
55 |
+
**Acknowledgment:**
|
56 |
+
"Developed with love by [Your Watch Store]."
|
57 |
|
58 |
+
🌐 **Link to Share:**
|
59 |
+
https://seuwebsite.com/compreseurelogio
|
60 |
|
61 |
+
---
|
62 |
|
63 |
+
*Notice: Video generated with cutting-edge technology from Stability AI for a stunning visual experience.*