	---
	license: apache-2.0
	language:
	- en
	metrics:
	- accuracy
	base_model:
	- meta-llama/Llama-3.1-8B-Instruct
	pipeline_tag: visual-question-answering
	---

	# MMEvol Model Card

	## Model Details

	The pretrained projector weights and the instruction-tuned weights are listed below.

	| Model | Pretrained Projector | Base LLM | PT Data | IT Data | Download |
	| ---------------- | -------------------- | --------- | ------------------------------------------------------------ | ------- | -------- |
	| MMEvol-LLaMA3-8B | [mm_projector](https://huggingface.co/Tongyi-ConvAI/MMEvol/tree/main/llama3) | LLaMA3-8B | [LLaVA-Pretrain](https://huggingface.co/datasets/liuhaotian/LLaVA-Pretrain) | MMEvol | [ckpt](https://huggingface.co/Tongyi-ConvAI/MMEvol/tree/main/llama3) |
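
The files behind the Download link can also be fetched programmatically. The following is a minimal sketch, assuming the `huggingface_hub` package is installed and that both the projector and the checkpoint sit under the `llama3/` folder of the `Tongyi-ConvAI/MMEvol` repository, as the links in the table suggest. The helper names and the `checkpoints/mmevol` destination are hypothetical, chosen only for this example.

```python
from pathlib import Path

# Repository and subfolder taken from the Download link in the table above.
REPO_ID = "Tongyi-ConvAI/MMEvol"
SUBFOLDER = "llama3"


def build_download_kwargs(local_dir="checkpoints/mmevol"):
    """Return keyword arguments for huggingface_hub.snapshot_download.

    local_dir is a hypothetical destination chosen for this example.
    """
    return {
        "repo_id": REPO_ID,
        "allow_patterns": [f"{SUBFOLDER}/*"],  # fetch only the llama3/ folder
        "local_dir": str(Path(local_dir)),
    }


def download_checkpoint(local_dir="checkpoints/mmevol"):
    """Download the llama3/ folder and return its local path (needs network)."""
    # Imported lazily so the helper above works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    root = snapshot_download(**build_download_kwargs(local_dir))
    return Path(root) / SUBFOLDER


if __name__ == "__main__":
    print(download_checkpoint())
```

Restricting the download with `allow_patterns` avoids pulling other folders that the shared repository may contain. Loading the weights afterwards depends on the MMEvol codebase and is not covered here.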

	## Performance

	### Benchmarks supported by VLMEvalKit (OpenCompass)

	| Model | MME_C | MMStar | HallBench | MathVista_mini | MMMU_val | AI2D | POPE | BLINK | RWQA |
	| ---------------- | ----- | ------ | --------- | -------------- | -------- | ---- | ---- | ----- | ---- |
	| MMEvol-LLaMA3-8B | 47.8 | 50.1 | 62.3 | 50.0 | 40.8 | 73.9 | 86.8 | 46.4 | 62.6 |

	### Benchmarks not supported by VLMEvalKit (VQADataSet)

	| Model | VQA_v2 | GQA | MIA | MMSInst |
	| ---------------- | ------ | ---- | ---- | ------- |
	| MMEvol-LLaMA3-8B | 83.4 | 65.0 | 78.8 | 32.3 |


	## Paper or resources for more information
	- Page: https://mmevol.github.io/
	- arXiv: https://arxiv.org/pdf/2409.05840

	## License
	Llama 3 is licensed under the LLAMA 3 Community License,
	Copyright (c) Meta Platforms, Inc. All Rights Reserved.

	## Contact us if you have any questions

	- Run Luo — r.luo@siat.ac.cn
	- Haonan Zhang — zchiowal@gmail.com