YipengZhang committed (verified)
Commit 4985301 · 1 Parent(s): 2c2750d

Update README.md

Files changed (1)
  1. README.md +10 -13
README.md CHANGED
@@ -5,39 +5,36 @@ pipeline_tag: image-text-to-text
 
  <br>
 
- # LLaVA Model Card
+ # LLaVA-UHD v2 Model Card
 
  ## Model details
 
  **Model type:**
- LLaVA is an open-source chatbot trained by fine-tuning LLaMA/Vicuna on GPT-generated multimodal instruction-following data.
- It is an auto-regressive language model, based on the transformer architecture.
+ LLaVA-UHD v2 is an advanced MLLM centered on a hierarchical window transformer that captures diverse visual granularity
+ by constructing and integrating a high-resolution feature pyramid.
 
  **Model date:**
- LLaVA-v1.5-7B was trained in September 2023.
+ LLaVA-UHD v2 was trained in November 2024.
 
  **Paper or resources for more information:**
- https://llava-vl.github.io/
+ https://github.com/thunlp/LLaVA-UHD
 
  ## License
  Llama 2 is licensed under the LLAMA 2 Community License,
  Copyright (c) Meta Platforms, Inc. All Rights Reserved.
 
  **Where to send questions or comments about the model:**
- https://github.com/haotian-liu/LLaVA/issues
+ https://github.com/thunlp/LLaVA-UHD/issues
 
  ## Intended use
  **Primary intended uses:**
- The primary use of LLaVA is research on large multimodal models and chatbots.
+ The primary use of LLaVA-UHD v2 is research on large multimodal models and chatbots.
 
  **Primary intended users:**
  The primary intended users of the model are researchers and hobbyists in computer vision, natural language processing, machine learning, and artificial intelligence.
 
  ## Training dataset
  - 558K filtered image-text pairs from LAION/CC/SBU, captioned by BLIP.
- - 158K GPT-generated multimodal instruction-following data.
- - 450K academic-task-oriented VQA data mixture.
- - 40K ShareGPT data.
-
- ## Evaluation dataset
- A collection of 12 benchmarks, including 5 academic VQA benchmarks and 7 recent benchmarks specifically proposed for instruction-following LMMs.
+ - JBU Pretrain: MS COCO-Stuff 2017
+ - Pretrain: LLaVA-Pretrain 558K (filtered image-text pairs from LAION/CC/SBU, captioned by BLIP)
+ - SFT: 858K mixed dataset at https://huggingface.co/datasets/YipengZhang/LLaVA-UHD-v2-SFT-Data
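
The new model-type line packs the whole architecture into one sentence, so a small sketch may help readers parse it: the idea is to fuse a high-resolution feature pyramid into the coarse visual-token grid using window-limited cross-attention. This is only an illustration under assumed shapes and names (`WindowPyramidIntegrator`, a two-level pyramid, 256-dim features), not the implementation in the thunlp/LLaVA-UHD repository.

```python
# Illustrative sketch only -- NOT the official LLaVA-UHD v2 code.
# Each coarse-grid position cross-attends to the pyramid patches that fall
# inside its own spatial window, yielding one fused visual token per position.
import torch
import torch.nn as nn
import torch.nn.functional as F


class WindowPyramidIntegrator(nn.Module):  # hypothetical module name
    def __init__(self, dim: int = 256, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, coarse: torch.Tensor, pyramid: list[torch.Tensor]) -> torch.Tensor:
        # coarse:  (B, C, H, W) low-resolution feature map, one query per position
        # pyramid: list of (B, C, H*s, W*s) higher-resolution maps, s = 2, 4, ...
        B, C, H, W = coarse.shape
        queries = coarse.flatten(2).transpose(1, 2)                 # (B, H*W, C)

        windows = []
        for level in pyramid:
            s = level.shape[-1] // W                                 # scale of this level
            # Cut the level into H*W non-overlapping s-by-s windows, one per query.
            win = F.unfold(level, kernel_size=s, stride=s)           # (B, C*s*s, H*W)
            win = win.view(B, C, s * s, H * W).permute(0, 3, 2, 1)   # (B, H*W, s*s, C)
            windows.append(win)
        keys = torch.cat(windows, dim=2)                             # (B, H*W, sum s*s, C)

        # Attend within each window by folding the window axis into the batch axis.
        q = queries.reshape(B * H * W, 1, C)
        kv = keys.reshape(B * H * W, -1, C)
        fused, _ = self.attn(q, kv, kv)                              # (B*H*W, 1, C)
        out = self.norm(fused.reshape(B, H * W, C) + queries)        # residual + norm
        return out                                                   # (B, H*W, C) tokens


# Toy usage: a 24x24 coarse grid with 2x and 4x pyramid levels.
coarse = torch.randn(1, 256, 24, 24)
pyramid = [torch.randn(1, 256, 48, 48), torch.randn(1, 256, 96, 96)]
print(WindowPyramidIntegrator()(coarse, pyramid).shape)              # (1, 576, 256)
```

Restricting each query to its own window keeps the output token count fixed at H*W while still exposing the finer pyramid levels; full cross-attention would make every query look at the entire pyramid, whose cost grows quickly with input resolution.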
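
The SFT entry points at a Hugging Face dataset repo; to pull it locally, the standard `huggingface_hub` call below is enough. It assumes only the repo id shown in the card and says nothing about the dataset's internal file layout.

```python
# Download the SFT mixture referenced in the updated card (dataset repo on the Hub).
from huggingface_hub import snapshot_download

path = snapshot_download(
    repo_id="YipengZhang/LLaVA-UHD-v2-SFT-Data",
    repo_type="dataset",   # dataset repo, not a model repo
)
print(path)                # local cache directory with the downloaded files
```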