ShareGPT4V-13B Model Card

Model details

Model type: ShareGPT4V-13B is an open-source chatbot trained by fine-tuning a CLIP vision tower and LLaMA/Vicuna on GPT4-Vision-assisted ShareGPT4V data and LLaVA instruction-tuning data.

Model date: ShareGPT4V-13B was trained in Nov 2023.

Paper or resources for more information: [Project] [Paper] [Code]

Usage

You can use this model directly with the code provided in our [repository]. Alternatively, to load it in the [LLaVA repository], edit our config file: change the architecture name from "Share4VLlamaForCausalLM" to "LLaVALlamaForCausalLM" and the model_type keyword from "share4v" to "llava".
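The config edit described above can be scripted. The following is a minimal sketch, not part of the official repository. It assumes the config file is a JSON file (typically config.json in the downloaded checkpoint directory) and that "architectures" is a list of class names, as is conventional for Hugging Face configs. The function name convert_config_for_llava is hypothetical.

```python
import json
from pathlib import Path

# Names taken from the usage note in this model card.
OLD_ARCH, NEW_ARCH = "Share4VLlamaForCausalLM", "LLaVALlamaForCausalLM"
OLD_TYPE, NEW_TYPE = "share4v", "llava"


def convert_config_for_llava(config_path):
    """Rewrite a ShareGPT4V config file in place so LLaVA code can load it.

    Hypothetical helper: only the two keys named in the model card are touched.
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))

    # Assumption: "architectures" is a list of class names.
    config["architectures"] = [
        NEW_ARCH if name == OLD_ARCH else name
        for name in config.get("architectures", [])
    ]
    # Only rewrite model_type if it still carries the ShareGPT4V value.
    if config.get("model_type") == OLD_TYPE:
        config["model_type"] = NEW_TYPE

    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Calling convert_config_for_llava on the config file of a local copy of the checkpoint applies both changes. Because the file is overwritten in place, it may be worth keeping a backup of the original first.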

License

Intended use

Primary intended uses: The primary use of ShareGPT4V-13B is research on large multimodal models and chatbots.

Primary intended users: The primary intended users of the model are researchers and hobbyists in computer vision, natural language processing, machine learning, and artificial intelligence.

Training dataset

1.2M high-quality image-text pairs, i.e., ShareGPT4V-PT data
100K GPT4-Vision-generated image-text pairs
LLaVA instruction-tuning data

Evaluation dataset

A collection of 11 benchmarks

Paper for Lin-Chen/ShareGPT4V-13B

ShareGPT4V: Improving Large Multi-Modal Models with Better Captions (arXiv:2311.12793, published Nov 21, 2023)