--- |
|
license: llama2 |
|
--- |
|
|
|
# Model Card for FFMPerative-7B |
|
|
|
## Model Details |
|
|
|
FFMPerative-7B is a Llama 2 7B large language model (LLM) fine-tuned to automate video production workflows. It is designed to work with FFMPerative, a tool that combines machine learning models with the FFmpeg software suite to perform a variety of video editing tasks from natural-language input.
|
|
|
|
|
### Model Description |
|
|
|
|
|
- **Developed by:** remyx.ai

- **Model type:** Llama 2 7B, a decoder-only causal language model

- **License:** Llama 2 Community License

- **Finetuned from model:** Llama 2 7B
|
|
|
## Uses |
|
|
|
The main use case for this model is assisting with video editing tasks. Users issue natural-language requests, which FFMPerative translates into FFmpeg operations such as cropping, resizing, or rotating videos, making GIFs, adjusting audio levels, and more. The model is particularly useful for people without FFmpeg expertise, letting them drive otherwise complex video editing tasks through a simplified, user-friendly interface.
|
|
|
This checkpoint was fine-tuned on a subset of `HuggingFaceH4/CodeAlpaca_20K` augmented with 500 examples of FFMPerative tool composition for practical video editing workflows.
|
|
|
The training instances cover a range of video editing tasks and their corresponding FFMPerative commands, with example questions and answers demonstrating the interaction between a user and the video editing tool. Please refer to the README in the GitHub repository for more examples of the training data.
|
|
|
## Bias, Risks, and Limitations |
|
|
|
Please note that this model is designed for English-language inputs and may not perform well with inputs in other languages. Although it can interpret and execute a wide range of commands, it may struggle with ambiguous instructions, complex sequences of commands, or tasks that are not represented in its training data.
|
|
|
Double-check the model's output for critical tasks, and remember that it will not replace professional video editors for more advanced editing workflows.
|
|
|
## How to Get Started with the Model |
|
|
|
Use the code below to get started. You can instantiate a local agent and pass additional tools: |
|
|
|
```python |
|
import torch |
|
from transformers import AutoModelForCausalLM, AutoTokenizer, LocalAgent, load_tool |
|
|
|
# Load the fine-tuned checkpoint in 8-bit, with dynamic RoPE scaling for longer prompts.
model = AutoModelForCausalLM.from_pretrained(
    "remyxai/ffmperative",
    device_map="auto",
    torch_dtype=torch.bfloat16,
    rope_scaling={"type": "dynamic", "factor": 2.0},
    load_in_8bit=True,
)
tokenizer = AutoTokenizer.from_pretrained("remyxai/ffmperative")

# More tools in our Spaces: https://huggingface.co/remyxai
tools = [
    load_tool("remyxai/video-compression-tool"),
    load_tool("remyxai/video-frame-sample-tool"),
]

# Build a local agent that plans tool calls from natural-language requests.
agent = LocalAgent(model, tokenizer, additional_tools=tools)
agent.run("Compress my video '/path/to/vid.mp4' and save it to '/path/to/compressed_vid.mp4'")
|
``` |
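Once the agent is running, further edits are issued the same way. The prompts below are illustrative examples of the tasks described under Uses; paths are placeholders and results depend on which tools you load:

```python
# Continuing from the agent created above; prompts and paths are illustrative.
agent.run("Sample a frame every 5 seconds from '/path/to/vid.mp4' and save the images to '/path/to/frames/'")
agent.run("Compress '/path/to/raw_footage.mp4' and write the result to '/path/to/raw_footage_small.mp4'")
```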
|
|
|
|
|
## Training Details |
|
|
|
### Training Data |
|
|
|
Training data is a combination of [HuggingFaceH4/CodeAlpaca_20K](https://huggingface.co/datasets/HuggingFaceH4/CodeAlpaca_20K) and our custom-generated data reflecting the tools available in FFMPerative: [remyxai/ffmperative](https://huggingface.co/datasets/remyxai/ffmperative).
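Both datasets can be pulled from the Hub with the `datasets` library for inspection. A minimal sketch, assuming each repository exposes a `train` split; check the dataset cards for the exact split and column names:

```python
from datasets import load_dataset

# Split names are assumptions; adjust if the dataset cards list different splits.
code_alpaca = load_dataset("HuggingFaceH4/CodeAlpaca_20K", split="train")
ffmperative = load_dataset("remyxai/ffmperative", split="train")

print(code_alpaca)
print(ffmperative[0])  # one FFMPerative tool-composition example
```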
|
|
|
|
|
### Training Procedure |
|
|
|
Using Parameter-Efficient Fine-Tuning (PEFT), following this [guide](https://huggingface.co/blog/llama2#fine-tuning-with-peft), we fine-tuned Llama 2 with this [script](https://github.com/lvwerra/trl/blob/main/examples/scripts/sft_trainer.py).
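For reference, the sketch below shows that style of PEFT setup with LoRA via `peft` and TRL's `SFTTrainer`. The base checkpoint, LoRA targets, text column, and hyperparameters here are illustrative assumptions rather than the exact values used for this model:

```python
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import SFTTrainer

base_model = "meta-llama/Llama-2-7b-hf"  # assumed base checkpoint

model = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype=torch.bfloat16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token  # Llama 2 has no pad token by default

train_dataset = load_dataset("remyxai/ffmperative", split="train")

# LoRA adapters on the attention projections; rank and targets are illustrative.
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=train_dataset,
    dataset_text_field="text",   # assumed name of the text column
    peft_config=peft_config,
    max_seq_length=1024,
    args=TrainingArguments(
        output_dir="ffmperative-7b-sft",
        per_device_train_batch_size=2,
        gradient_accumulation_steps=8,
        learning_rate=2e-4,
        num_train_epochs=1,
        logging_steps=10,
    ),
)
trainer.train()
```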
|
|
|
## Evaluation |
|
|
|
We evaluated model performance by measuring the model's ability to accurately interpret and execute video editing commands. Because the evaluation process is proprietary, specific metrics are not available.
|
|
|
The model generally performs well, but please report any inconsistencies or errors you encounter when using the model. |
|
We appreciate your feedback and will use it to improve the model further. |
|
|
|
|
|
## Model Architecture and Objective
|
|
|
Meta's Llama 2 7B, a decoder-only transformer trained with a causal language modeling objective.
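To confirm the architecture details from the checkpoint itself, you can inspect its configuration (a quick sketch; the fields shown are standard Llama config attributes):

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("remyxai/ffmperative")
print(config.model_type)  # expected: "llama"
print(config.num_hidden_layers, config.hidden_size, config.num_attention_heads)
```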
|
|
|
|
|
|
|
|
|