Model Card for FFMPerative-7B
Model Details
This is a Llama 2 7B Large Language Model (LLM), fine-tuned specifically to automate video production workflows. It is designed to interact with FFMPerative, a tool that leverages machine learning and the FFmpeg software suite to perform a variety of video editing tasks using natural language input.
Model Description
- Developed by: remyx.ai
- Model type: Llama 2 7B
- License: Meta (Llama 2 Community License)
- Finetuned from model: Llama 2
Uses
The main use case for this model is assisting with video editing tasks. Users give FFMPerative instructions in natural language for tasks such as cropping, resizing, and rotating videos, making GIFs, and adjusting audio levels. The model can be particularly useful for people without technical skills, since it lets them carry out complex video editing tasks in a simplified, user-friendly manner.
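To make the task list concrete, here is a minimal Python sketch of the kind of FFmpeg command lines that such natural-language requests ultimately correspond to. It is not FFMPerative's implementation: the helper names (`resize_args`, `crop_args`, `volume_args`) are hypothetical, and only the FFmpeg options themselves (`-i`, `-vf scale=`, `-vf crop=`, `-af volume=`, `-c:v copy`) are standard.

```python
from typing import List


def resize_args(src: str, dst: str, width: int, height: int) -> List[str]:
    """Build an FFmpeg command that scales a video to width x height."""
    return ["ffmpeg", "-i", src, "-vf", f"scale={width}:{height}", dst]


def crop_args(src: str, dst: str, width: int, height: int,
              x: int = 0, y: int = 0) -> List[str]:
    """Build an FFmpeg command that crops a width x height region at (x, y)."""
    return ["ffmpeg", "-i", src, "-vf", f"crop={width}:{height}:{x}:{y}", dst]


def volume_args(src: str, dst: str, gain: float) -> List[str]:
    """Build an FFmpeg command that multiplies the audio volume by gain."""
    # The video stream is copied unchanged; only the audio is filtered.
    return ["ffmpeg", "-i", src, "-af", f"volume={gain}", "-c:v", "copy", dst]


if __name__ == "__main__":
    # Print the commands instead of running them, so FFmpeg need not be installed.
    print(" ".join(resize_args("in.mp4", "small.mp4", 640, 360)))
    print(" ".join(crop_args("in.mp4", "crop.mp4", 320, 240, x=10, y=20)))
    print(" ".join(volume_args("in.mp4", "loud.mp4", 1.5)))
```

Each helper returns an argument list that could be passed to `subprocess.run(args, check=True)` on a machine where FFmpeg is installed.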
This checkpoint was fine-tuned on a subset of HuggingFaceH4/CodeAlpaca_20K augmented with 500 instances of FFMPerative tool composition for practical video editing workflows.
The training instances are based on various video editing tasks and their corresponding commands in FFMPerative, with example questions and answers demonstrating the interaction between a user and the video editing tool. Please refer to the GitHub repository readme for more examples of the training data used.
Bias, Risks, and Limitations
This model is designed for English-language input and may perform poorly with input in other languages. Although it can interpret and execute a wide range of commands, it may struggle with ambiguous instructions, complex sequences of commands, or tasks that are not represented in its training data.
Please ensure you double-check the output of the model for critical tasks, and remember that it won't replace professional video editors for more advanced video editing workflows.
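One lightweight way to double-check a result is to inspect the output file after the agent has run. The sketch below is an illustration only: `check_output` is a hypothetical helper, not part of FFMPerative or transformers, and it checks just that the file exists, is non-empty, and (optionally, for compression tasks) is smaller than the input. It does not verify that the video plays or looks right.

```python
import os
from typing import List


def check_output(src: str, dst: str, expect_smaller: bool = False) -> List[str]:
    """Return a list of problems found with dst (an empty list means none were found)."""
    problems: List[str] = []
    if not os.path.isfile(dst):
        problems.append(f"output file not found: {dst}")
        return problems

    dst_size = os.path.getsize(dst)
    if dst_size == 0:
        problems.append(f"output file is empty: {dst}")

    # For compression tasks, the output should be smaller than the input.
    if expect_smaller and os.path.isfile(src) and dst_size >= os.path.getsize(src):
        problems.append("output is not smaller than the input")
    return problems
```

For the compression example in this card, a call such as `check_output('/path/to/vid.mp4', '/path/to/compressed_vid.mp4', expect_smaller=True)` would flag a missing, empty, or larger-than-input result.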
How to Get Started with the Model
Use the code below to get started. You can instantiate a local agent and pass additional tools:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, LocalAgent, load_tool

# Load the fine-tuned checkpoint in 8-bit (requires the bitsandbytes package)
model = AutoModelForCausalLM.from_pretrained(
    "remyxai/ffmperative",
    device_map="auto",
    torch_dtype=torch.bfloat16,
    rope_scaling={"type": "dynamic", "factor": 2.0},
    load_in_8bit=True,
)
tokenizer = AutoTokenizer.from_pretrained("remyxai/ffmperative")

# More tools in our spaces: https://huggingface.co/remyxai
tools = [
    load_tool("remyxai/video-compression-tool"),
    load_tool("remyxai/video-frame-sample-tool"),
]

# Create a local agent with the extra tools, then give it a natural-language instruction
agent = LocalAgent(model, tokenizer, additional_tools=tools)
agent.run("Compress my video '/path/to/vid.mp4' and save it to '/path/to/compressed_vid.mp4'")
Training Details
Training Data
The training data combines HuggingFaceH4/CodeAlpaca_20K with our custom-generated data reflecting the tools available in FFMPerative (remyxai/ffmperative).
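A mix like this can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the script used for this checkpoint: `build_training_mix` is a hypothetical helper, the `prompt`/`completion` field names are assumed, and the size of the base subset is left as a parameter because the card does not state it.

```python
import random
from typing import Dict, List

Example = Dict[str, str]  # assumed shape: {"prompt": ..., "completion": ...}


def build_training_mix(base: List[Example], custom: List[Example],
                       base_subset_size: int, seed: int = 0) -> List[Example]:
    """Sample a subset of the base dataset, append all custom examples, and shuffle."""
    rng = random.Random(seed)  # seeded so the mix is reproducible
    subset = rng.sample(base, min(base_subset_size, len(base)))
    mixed = subset + list(custom)
    rng.shuffle(mixed)
    return mixed
```

Keeping every custom example while subsampling only the general-purpose data is one simple way to stop the smaller tool-specific set from being drowned out.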
Training Procedure
We fine-tuned Llama 2 with Parameter-Efficient Fine-Tuning (PEFT), following this guide and using this script.
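PEFT covers several methods, and the card does not say which one was used. As a rough illustration, the sketch below assumes LoRA, a common choice with the `peft` library. Every hyperparameter value in `LORA_SETTINGS` is a placeholder rather than the configuration used for this checkpoint, and `lora_param_count` and `build_lora_model` are hypothetical helper names.

```python
# Hypothetical LoRA hyperparameters; the model card does not state the values used.
LORA_SETTINGS = {
    "r": 8,                                  # rank of the low-rank update
    "lora_alpha": 16,                        # scaling factor applied to the update
    "lora_dropout": 0.05,
    "target_modules": ["q_proj", "v_proj"],  # attention projections to adapt
    "bias": "none",
    "task_type": "CAUSAL_LM",
}


def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix.

    LoRA learns A (r x d_in) and B (d_out x r), so the count is r * (d_in + d_out).
    """
    return r * (d_in + d_out)


def build_lora_model(base_model, settings=LORA_SETTINGS):
    """Wrap a loaded causal LM with LoRA adapters (requires the `peft` package)."""
    # Imported lazily so the helpers above run without peft installed.
    from peft import LoraConfig, get_peft_model

    return get_peft_model(base_model, LoraConfig(**settings))
```

With these placeholder settings, each 4096 x 4096 projection in Llama 2 7B gains `lora_param_count(4096, 4096, 8)` = 65,536 trainable parameters. Across 32 layers and two target modules that is about 4.2 million, a small fraction of the 7 billion base weights.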
Evaluation
We evaluated the model performance by measuring its ability to accurately interpret and execute video editing commands. Due to the proprietary nature of the evaluation process, specific metrics are not available.
The model generally performs well, but please report any inconsistencies or errors you encounter when using the model. We appreciate your feedback and will use it to improve the model further.
Model Architecture and Objective
Meta's Llama 2 7B, a decoder-only transformer fine-tuned with a causal language modeling objective.