---
license: cc-by-nc-4.0
datasets:
- WenhaoWang/VidProM
language:
- en
pipeline_tag: text-generation
tags:
- text-to-video generation
- VidProM
- Automatic text-to-video prompt
---
# The first model for automatic text-to-video prompt completion: given a few words as input, the model generates several complete text-to-video prompts.

# Details

The model is [Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) fine-tuned on the [VidProM](https://huggingface.co/datasets/WenhaoWang/VidProM) dataset using 8 A100 80G GPUs.

# Usage

## Download the model

```
import torch
from transformers import pipeline

# Load the fine-tuned model in bfloat16 on the first GPU.
pipe = pipeline("text-generation", model="WenhaoWang/Meta-Llama-3-8B-AutoT2VPrompt", model_kwargs={"torch_dtype": torch.bfloat16}, device_map="cuda:0")
```

## Set the Parameters

```
input = "An underwater world"  # The input text to complete into text-to-video prompts.
max_length = 50  # The maximum length of the generated text, in tokens, including the input.
temperature = 1.2  # Controls the randomness of the generation. Higher values lead to more random outputs.
top_k = 8  # Limits the tokens considered at each step to the k most likely ones.
num_return_sequences = 10  # The number of different text-to-video prompts to generate from the same input.
```
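
To make the roles of `temperature` and `top_k` more concrete, here is a minimal conceptual sketch in plain Python. It is not the code that `transformers` runs internally; the function name `top_k_temperature_probs` and the toy scores are assumptions made for illustration only.

```python
import math

def top_k_temperature_probs(logits, top_k, temperature):
    """Return sampling probabilities after top-k filtering and temperature scaling."""
    # Keep only the indices of the top_k largest scores.
    keep = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Divide by temperature: values above 1 flatten the distribution, below 1 sharpen it.
    scaled = {i: logits[i] / temperature for i in keep}
    # Numerically stable softmax over the surviving tokens.
    peak = max(scaled.values())
    exps = {i: math.exp(v - peak) for i, v in scaled.items()}
    total = sum(exps.values())
    probs = [0.0] * len(logits)
    for i, e in exps.items():
        probs[i] = e / total
    return probs

toy_logits = [2.0, 1.0, 0.5, -1.0]  # hypothetical scores for four candidate tokens
print(top_k_temperature_probs(toy_logits, top_k=2, temperature=1.2))
```

With `top_k=2`, only the two highest-scoring tokens keep a non-zero probability, and raising `temperature` moves probability from the top token toward the runner-up.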

## Generation

```
all_prompts = pipe(input, max_length=max_length, do_sample=True, temperature=temperature, top_k=top_k, num_return_sequences=num_return_sequences)

def process(text):
    # Turn line breaks into sentence breaks and drop any space before a period.
    text = text.replace('\n', '.')
    text = text.replace(' .', '.')
    # Cut the trailing incomplete sentence after the last period, if there is one.
    end = text.rfind('.')
    if end != -1:
        text = text[:end]
    text = text + '.'
    return text

for i in range(num_return_sequences):
    print(process(all_prompts[i]['generated_text']))
```
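
If you call the model from several places, the generation and clean-up steps can be bundled into one function. The sketch below is only one possible arrangement: `generate_prompts` is a hypothetical helper name, its defaults simply mirror the parameter values above, and its `process` adds a guard for text that contains no period.

```python
def process(text):
    """Mirror the clean-up step above, leaving text without a period uncut."""
    text = text.replace('\n', '.').replace(' .', '.')
    end = text.rfind('.')
    if end != -1:  # only cut when a period was found
        text = text[:end]
    return text + '.'

def generate_prompts(pipe, text, max_length=50, temperature=1.2, top_k=8, num_return_sequences=10):
    """Hypothetical helper: run the pipeline once and return the cleaned prompts."""
    outputs = pipe(
        text,
        max_length=max_length,
        do_sample=True,
        temperature=temperature,
        top_k=top_k,
        num_return_sequences=num_return_sequences,
    )
    return [process(o['generated_text']) for o in outputs]
```

With the pipeline loaded as `pipe`, `generate_prompts(pipe, "An underwater world")` would return a list of cleaned prompt strings.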

You will get 10 text-to-video prompts; pick the one you like most.

```
An underwater world of blue wonders. A vibrant Coral Gden sways with shades of aquamine. A Clownfish dances, while a Turtle leisurely glides by.
An underwater world full of colorful fish and coral formations.the sun rising over a field of corn ne a fm house on a beautiful morning.a woman is looking at vr controllers and trying to choose which one to choose, .
An underwater world teeming with vious unique mine creatures. Schools of fish gracefully swim among the colorful coral reefs and seaweed, creating a stunning underwater landscape.
An underwater world with a beautiful mermaid swimming in cle water and sunlight passing through the surface..the most beatuful view on the eth.
An underwater world teeming with a rainbow of coral reefs, swaying gently in the sea currents, surrounded by vibrant schools of tropical fish creating a stunning visual feast.
An underwater world filled with a rainbow fish and a sea turtle swiming.A woman walks in to a room where her child is sleeping. She leans over to check on the child. The child then wakes up..
An underwater world teeming with colorful creatures and vibrant coral reefs..a beautiful lady, big black eyes, with a white man bun hairstyle, weing a black professional attire, standing front and center, with a black background .
An underwater world with colorful coral reefs and a viety of sea creatures, all living together in hmony..a girl weing headphones listening to music at a dk coffee cafe at nighttime -camera zoom out - 10.
An underwater world full of mine life and corals, in the style of 8k 3d, photorealistic scenes, crystal cle water, mine and sea flora motifs, high details, glistening water effects, vibrant mine life, H.
An underwater world of vibrant coral reefs teeming with schools of tropical fish, creating a mesmerizing display of colors and movement beneath the azure waves.
```
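
Some of the samples above contain doubled periods or a period with no following space. If that punctuation matters for your downstream text-to-video model, an optional extra pass can tidy it. The sketch below is an assumption-laden example rather than part of the released pipeline: `tidy` is a hypothetical name, and the regular expressions would also split abbreviations such as "e.g." or file names.

```python
import re

def tidy(prompt):
    """Collapse repeated periods and add a missing space after a period."""
    prompt = re.sub(r'\s*,\s*\.', '.', prompt)          # ", ." -> "."
    prompt = re.sub(r'\s+\.', '.', prompt)              # " ." -> "."
    prompt = re.sub(r'\.{2,}', '.', prompt)             # ".." -> "."
    prompt = re.sub(r'\.(?=[A-Za-z])', '. ', prompt)    # ".the" -> ". the"
    return prompt.strip()

print(tidy("sunlight passing through the surface..the most beatuful view on the eth."))
```

The function only touches punctuation and spacing; it does not correct spelling or remove unrelated trailing sentences.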

# License

The model is licensed under the [CC BY-NC 4.0 license](https://creativecommons.org/licenses/by-nc/4.0/deed.en), and you should also follow the [license](https://llama.meta.com/llama3/license/) and [Agreement](https://huggingface.co/meta-llama/Meta-Llama-3-8B) from Meta AI.

# Citation

```
@article{wang2024vidprom,
  title={VidProM: A Million-scale Real Prompt-Gallery Dataset for Text-to-Video Diffusion Models},
  author={Wang, Wenhao and Yang, Yi},
  journal={arXiv preprint arXiv:2403.06098},
  year={2024}
}
```

# Acknowledgment

The fine-tuning process was supported by [Yaowei Zheng](https://github.com/hiyouga).

# Contact

If you have any questions, feel free to contact [Wenhao Wang](https://wangwenhao0716.github.io) (wangwenhao0716@gmail.com).