---
license: llama2
---
# Model Card for FFMPerative-7B

## Model Details
This is a Llama 2 7B large language model (LLM) fine-tuned to automate video production workflows. It is designed to drive FFMPerative, a tool that combines machine learning models with the FFmpeg software suite to perform a variety of video editing tasks from natural language input.
### Model Description

- **Developed by:** remyx.ai
- **Model type:** Llama 2 7B
- **License:** Llama 2 Community License
- **Finetuned from model:** Llama 2
## Uses

The model's main use case is assisting with video editing tasks. Users issue natural-language commands to FFMPerative for tasks such as cropping, resizing, and rotating videos, making GIFs, adjusting audio levels, and more. It is particularly useful for people without technical skills, letting them carry out complex video editing tasks in a simplified, user-friendly manner.
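For illustration, a request like "resize my video to 720p" is ultimately carried out as an FFmpeg invocation. The helper below is a hypothetical sketch of that mapping, not part of FFMPerative's API:

```python
# Hypothetical sketch: map a simple resize request to an FFmpeg command line.
# FFMPerative's real tools are loaded via `load_tool`; this only shows the kind
# of FFmpeg invocation such a tool runs under the hood.
def build_resize_command(src: str, dst: str, width: int, height: int) -> list[str]:
    return [
        "ffmpeg",
        "-i", src,                         # input file
        "-vf", f"scale={width}:{height}",  # video filter: rescale
        dst,                               # output file
    ]

cmd = build_resize_command("in.mp4", "out.mp4", 1280, 720)
print(" ".join(cmd))
```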
This checkpoint was fine-tuned on a subset of HuggingFaceH4/CodeAlpaca_20K, augmented with 500 instances of FFMPerative tool composition for practical video editing workflows.

The training instances pair various video editing tasks with their corresponding FFMPerative commands, with example questions and answers demonstrating the interaction between a user and the video editing tool. Please refer to the GitHub repository README for more examples of the training data used.
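As a rough illustration only (the field names and tool-call layout below are assumptions, not the actual dataset schema), a tool-composition training instance could be formatted Alpaca-style like this:

```python
# Hypothetical instruction-tuning record; the schema and tool-call syntax are
# assumed for illustration and may differ from the actual training data.
example = {
    "instruction": "Sample one frame per second from '/path/to/vid.mp4'.",
    "response": (
        "tool: video-frame-sample-tool\n"
        "args: {input: '/path/to/vid.mp4', rate: 1}"
    ),
}

def to_prompt(record: dict) -> str:
    """Format a record into an Alpaca-style instruction/response prompt."""
    return (
        f"### Instruction:\n{record['instruction']}\n\n"
        f"### Response:\n{record['response']}"
    )

print(to_prompt(example))
```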
## Bias, Risks, and Limitations

This model is designed for English-language inputs and may not perform well with inputs in other languages. Although it can interpret and execute a wide range of commands, it may struggle with ambiguous instructions, complex sequences of commands, or tasks not covered by its training data.

Please double-check the model's output for critical tasks, and remember that it will not replace professional video editors for more advanced video editing workflows.
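Because generated commands are executed against real files, it can be worth gating them before running. A minimal, purely illustrative allowlist check (not part of FFMPerative) might look like:

```python
import shlex

# Hypothetical safety gate: only run generated shell commands whose program
# is on an explicit allowlist. Illustrative only.
ALLOWED = {"ffmpeg", "ffprobe"}

def is_safe(command: str) -> bool:
    """Return True only if the command invokes an allowlisted program."""
    tokens = shlex.split(command)
    return bool(tokens) and tokens[0] in ALLOWED

print(is_safe("ffmpeg -i in.mp4 out.mp4"))  # True
print(is_safe("rm -rf /"))                  # False
```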
## How to Get Started with the Model
Use the code below to get started. You can instantiate a local agent and pass additional tools:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, LocalAgent, load_tool

model = AutoModelForCausalLM.from_pretrained(
    "remyxai/ffmperative",
    device_map="auto",
    torch_dtype=torch.bfloat16,
    rope_scaling={"type": "dynamic", "factor": 2.0},
    load_in_8bit=True,
)
tokenizer = AutoTokenizer.from_pretrained("remyxai/ffmperative")

# More tools in our spaces: https://huggingface.co/remyxai
tools = [
    load_tool("remyxai/video-compression-tool"),
    load_tool("remyxai/video-frame-sample-tool"),
]
agent = LocalAgent(model, tokenizer, additional_tools=tools)
agent.run("Compress my video '/path/to/vid.mp4' and save it to '/path/to/compressed_vid.mp4'")
```
## Training Details

### Training Data

The training data combines HuggingFaceH4/CodeAlpaca_20K with our custom-generated data reflecting the tools available in FFMPerative (remyxai/ffmperative).
### Training Procedure

Using Parameter-Efficient Fine-Tuning (PEFT), following this guide, we fine-tuned Llama 2 with this script.
## Evaluation
We evaluated the model performance by measuring its ability to accurately interpret and execute video editing commands. Due to the proprietary nature of the evaluation process, specific metrics are not available.
The model generally performs well, but please report any inconsistencies or errors you encounter when using the model. We appreciate your feedback and will use it to improve the model further.
## Model Architecture and Objective

Meta's Llama 2 7B
## Citation [optional]

**BibTeX:**

[More Information Needed]

**APA:**

[More Information Needed]

## Glossary [optional]

[More Information Needed]

## More Information [optional]

[More Information Needed]

## Model Card Authors [optional]

[More Information Needed]

## Model Card Contact

[More Information Needed]