# Instruction data for [VideoChat](https://github.com/OpenGVLab/Ask-Anything/tree/main/video_chat)

# :fire: Updates

- **2023/05/11**: Release the **V1**: [Google Drive](https://drive.google.com/file/d/1C-7xmf42QUEi4ApXTcxBHr5nLvTWXyUi/view?usp=sharing) | [Aliyun OSS](https://pjlab-gvm-data.oss-cn-shanghai.aliyuncs.com/data/videochat/videochat_instruct_11k.json)

# :speech_balloon: V1: 7K detailed descriptions + 4K multi-turn conversations

We build a video-centric multimodal instruction dataset based on WebVid-10M. The detailed descriptions and multi-turn conversations are generated by ChatGPT from video text (aided by [**VideoChat-Text**](https://github.com/OpenGVLab/Ask-Anything/tree/main/video_chat_with_ChatGPT)) using several prompts concerning **spatiotemporal features**. Compared with detailed video descriptions alone, video conversations further improve data diversity by introducing **temporal and causal features** into the video instruction data.
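
The release is a single JSON file of roughly 11K instruction records (7K descriptions + 4K conversations). As a minimal sketch of how one might load and inspect it (the file name follows the OSS link above; the record schema is not documented here, so field names should be checked empirically before writing a loader):

```python
import json

# Downloaded from the Google Drive / Aliyun OSS links above.
DATA_PATH = "videochat_instruct_11k.json"

with open(DATA_PATH, "r", encoding="utf-8") as f:
    records = json.load(f)

# Expect on the order of 11K entries (7K descriptions + 4K conversations).
print(f"Total instruction records: {len(records)}")

# Print a truncated view of the first record to discover its fields
# empirically, rather than assuming a particular key layout.
print(json.dumps(records[0], indent=2, ensure_ascii=False)[:500])
```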