AskVideos-7B-Instruct-v0.1

Model details

Model type: AskVideos-7B-Instruct-v0.1 is an open-source chatbot trained by fine-tuning a Video-LLaMA variant on additional video Q&A data. It answers video-text queries using a frozen Vicuna 7B v1.1 LLM and a frozen BLIP-style image encoder. A video Q-Former derives a video feature from the encoded frames, and the result is projected into the LLM's embedding space.
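To make the data flow above concrete, here is a minimal shape-tracing sketch. It is not the model's actual code. `PipelineDims` and `trace_shapes` are hypothetical names, and every dimension except the 16 frames mentioned in this card and the 4096 hidden size of a LLaMA-7B-class LLM is an illustrative assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineDims:
    # Illustrative assumptions; the model card does not state these values,
    # apart from the 16 sampled frames.
    num_frames: int = 16          # frames sampled per clip (from this card)
    tokens_per_frame: int = 32    # assumed tokens the image encoder emits per frame
    frame_dim: int = 768          # assumed per-token feature width
    num_video_queries: int = 32   # assumed number of video Q-Former query tokens
    qformer_dim: int = 768        # assumed video Q-Former output width
    llm_dim: int = 4096           # hidden size of a LLaMA-7B-class LLM


def trace_shapes(d: PipelineDims) -> list[tuple[str, tuple[int, ...]]]:
    """Return (stage, output shape) pairs for one clip, batch dimension omitted."""
    return [
        # Each frame is encoded independently by the frozen image encoder.
        ("frozen image encoder", (d.num_frames, d.tokens_per_frame, d.frame_dim)),
        # The video Q-Former pools all frame tokens into a fixed set of queries.
        ("video Q-Former", (d.num_video_queries, d.qformer_dim)),
        # A linear projection maps the queries to the LLM embedding width.
        ("projection to LLM space", (d.num_video_queries, d.llm_dim)),
    ]


if __name__ == "__main__":
    for stage, shape in trace_shapes(PipelineDims()):
        print(f"{stage}: {shape}")
```

Under these assumed sizes, the LLM sees a fixed number of video tokens regardless of how many frames were sampled, which is the main reason for placing a Q-Former between the image encoder and the LLM.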

Acknowledgement: This model is based on Video-LLaMA. Check out the original work here: https://github.com/DAMO-NLP-SG/Video-LLaMA

GitHub repo for demo:

License

Training dataset

  • 50K synthetic video Q&A pairs mined from videos.
  • Trained with 16 frames sampled from a 30 s clip for each Q&A pair.
  • Fine-tuned on Video-LLaMA Vicuna 7B.
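
The card says 16 images (frames) are sampled over each 30 s clip but does not say how. The sketch below assumes uniform sampling at the midpoint of 16 equal segments. Both that strategy and the helper name `sample_frame_times` are assumptions for illustration, not the documented training procedure.

```python
def sample_frame_times(clip_seconds: float = 30.0, num_frames: int = 16) -> list[float]:
    """Return timestamps (in seconds) at the midpoints of equal-length segments.

    Assumption: frames are spread uniformly over the clip; the model card
    does not specify the actual sampling strategy.
    """
    if clip_seconds <= 0 or num_frames <= 0:
        raise ValueError("clip_seconds and num_frames must be positive")
    segment = clip_seconds / num_frames
    # Take the midpoint of each segment so no frame sits exactly on a clip edge.
    return [segment * (i + 0.5) for i in range(num_frames)]


if __name__ == "__main__":
    times = sample_frame_times()
    print(len(times), times[0], times[-1])  # prints: 16 0.9375 29.0625
```

With the defaults, consecutive frames are 1.875 s apart. Timestamps like these could be passed to any video decoder to extract the matching frames.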