---
license: llama2
model-index:
- name: llava-v1.5-7b-hf-vicuna
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: acc_norm
      value: 52.65
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=nnethercott/llava-v1.5-7b-hf-vicuna
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HellaSwag (10-Shot)
      type: hellaswag
      split: validation
      args:
        num_few_shot: 10
    metrics:
    - type: acc_norm
      value: 76.09
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=nnethercott/llava-v1.5-7b-hf-vicuna
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 51.68
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=nnethercott/llava-v1.5-7b-hf-vicuna
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: TruthfulQA (0-shot)
      type: truthful_qa
      config: multiple_choice
      split: validation
      args:
        num_few_shot: 0
    metrics:
    - type: mc2
      value: 45.86
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=nnethercott/llava-v1.5-7b-hf-vicuna
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Winogrande (5-shot)
      type: winogrande
      config: winogrande_xl
      split: validation
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 72.06
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=nnethercott/llava-v1.5-7b-hf-vicuna
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 15.31
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=nnethercott/llava-v1.5-7b-hf-vicuna
      name: Open LLM Leaderboard
---
## Model details
**Motivation**
This model contains the fine-tuned weights from `llava-hf/llava-1.5-7b-hf` so that it can be evaluated on LLM benchmarks.
**Model type:**
LLaVA is an open-source chatbot trained by fine-tuning LLaMA/Vicuna on GPT-generated multimodal instruction-following data.
It is an auto-regressive language model, based on the transformer architecture.
## License
Llama 2 is licensed under the LLAMA 2 Community License,
Copyright (c) Meta Platforms, Inc. All Rights Reserved.
## Training dataset
- 558K filtered image-text pairs from LAION/CC/SBU, captioned by BLIP.
- 158K GPT-generated multimodal instruction-following data.
- 450K academic-task-oriented VQA data mixture.
- 40K ShareGPT data.
# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_nnethercott__llava-v1.5-7b-hf-vicuna).
| Metric |Value|
|---------------------------------|----:|
|Avg. |52.28|
|AI2 Reasoning Challenge (25-Shot)|52.65|
|HellaSwag (10-Shot) |76.09|
|MMLU (5-Shot) |51.68|
|TruthfulQA (0-shot) |45.86|
|Winogrande (5-shot) |72.06|
|GSM8k (5-shot) |15.31|
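
As a quick sanity check on the table above, the `Avg.` row appears to be the unweighted mean of the six benchmark scores. The sketch below reproduces it under that assumption. The function name `leaderboard_average` and the half-up rounding to two decimals are illustrative choices, not taken from the leaderboard's own code.

```python
from decimal import Decimal, ROUND_HALF_UP

# Scores copied from the table above, kept as strings so Decimal stays exact.
SCORES = {
    "AI2 Reasoning Challenge (25-Shot)": "52.65",
    "HellaSwag (10-Shot)": "76.09",
    "MMLU (5-Shot)": "51.68",
    "TruthfulQA (0-shot)": "45.86",
    "Winogrande (5-shot)": "72.06",
    "GSM8k (5-shot)": "15.31",
}


def leaderboard_average(scores: dict[str, str]) -> Decimal:
    """Unweighted mean of the scores, rounded half-up to two decimals.

    Assumption: the leaderboard weights every benchmark equally.
    """
    values = [Decimal(v) for v in scores.values()]
    mean = sum(values, Decimal("0")) / Decimal(len(values))
    return mean.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    print(leaderboard_average(SCORES))  # prints 52.28
```

The scores sum to 313.65, and 313.65 / 6 = 52.275, which rounds to the 52.28 reported in the table.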