ShareGPT4V-13B Model Card

Model details

Model type: ShareGPT4V-13B is an open-source chatbot trained by fine-tuning the CLIP vision tower and LLaMA/Vicuna on GPT4-Vision-assisted ShareGPT4V data and LLaVA instruction-tuning data.

Model date: ShareGPT4V-13B was trained in Nov 2023.

Paper or resources for more information: [Project] [Paper] [Code]

Usage

You can use this model directly with the code in our [repository]. Alternatively, change the architecture name from "Share4VLlamaForCausalLM" to "LLaVALlamaForCausalLM" and the model_type keyword from "share4v" to "llava" in our config file, and the model loads in the [LLaVA repository] without further changes.
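
The config change described above can be scripted. The following is a minimal sketch, not part of the official repository: it assumes the config file is a standard Hugging Face-style config.json with an "architectures" list and a "model_type" key, and the helper names to_llava_config and patch_config_file are hypothetical.

```python
import json
from pathlib import Path


def to_llava_config(config: dict) -> dict:
    """Return a copy of a ShareGPT4V config using the LLaVA names.

    Assumes the usual Hugging Face layout: an "architectures" list and a
    "model_type" string. All other keys are copied unchanged.
    """
    patched = dict(config)
    # Swap only the ShareGPT4V architecture name; leave any other entry alone.
    patched["architectures"] = [
        "LLaVALlamaForCausalLM" if name == "Share4VLlamaForCausalLM" else name
        for name in config.get("architectures", [])
    ]
    if config.get("model_type") == "share4v":
        patched["model_type"] = "llava"
    return patched


def patch_config_file(path: str) -> None:
    """Rewrite a local config file in place (hypothetical helper)."""
    config_path = Path(path)
    config = json.loads(config_path.read_text())
    config_path.write_text(json.dumps(to_llava_config(config), indent=2))
```

Calling patch_config_file on the config.json inside a local copy of the checkpoint applies the two renames and leaves every other setting untouched. Keeping a backup of the original file makes it easy to switch back to the ShareGPT4V code path.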

License

Llama 2 is licensed under the LLAMA 2 Community License, Copyright (c) Meta Platforms, Inc. All Rights Reserved.

Intended use

Primary intended uses: The primary use of ShareGPT4V-13B is research on large multimodal models and chatbots.

Primary intended users: The primary intended users of the model are researchers and hobbyists in computer vision, natural language processing, machine learning, and artificial intelligence.

Training dataset

  • 1.2M high-quality image-text pairs, i.e., ShareGPT4V-PT data
  • 100K GPT4-Vision-generated image-text pairs
  • LLaVA instruction-tuning data

Evaluation dataset

A collection of 11 benchmarks
