U4R/ChartVLM-large

BoZhang committed on Jan 31, 2024. Commit ad8ee7f (parent db3c061): Upload README.md

# ChartX & ChartVLM

This work makes two primary contributions:

- (1) To comprehensively and rigorously benchmark off-the-shelf multi-modal large language models (MLLMs) in the chart domain, we construct ChartX, a high-quality evaluation set that is multi-modal (image, code, CSV data, text description), multi-task (perception, chart information extraction, chart QA, chart description, chart summarization, chart re-rendering, etc.), and multi-disciplinary (22 subject categories), and we evaluate the performance of many mainstream MLLMs on it.
- (2) We develop ChartVLM, a model built specifically for the chart domain. It offers a new perspective on multi-modal tasks that depend strongly on interpretable patterns, such as reasoning over charts or geometric images: it substantially increases interpretability on complex reasoning tasks such as chart QA, and it uses an Instruction Adapter to dynamically select, based on the user's instruction, the task to perform and the corresponding model modules.

<div align=center>
<img src="https://github.com/UniModal4Reasoning/ChartVLM/blob/main/assets/motivation.png" height="85%">
</div>

------------------------

## Overview of the Evaluation Set

We collected 48K multi-modal chart samples covering **22 topics**, **18 chart types**, and **7 tasks**. Each sample includes four modalities: image, CSV data, Python code, and text description.

## ChartX Download

<details>
<summary>Data Download</summary>

Please download the official [ChartX Evaluation Set](https://drive.google.com/file/d/1d6zyH3kIwgepTqR0fc67xzyUtblrvOIX/view?usp=sharing) and organize the downloaded files as follows:

```
ChartX
├── 3D-Bar
│   ├── code
│   ├── csv
│   ├── png
│   └── txt
├── area_chart
│   ├── code
│   ├── csv
│   ├── png
│   └── txt
....
....
└── rose
    ├── code
    ├── csv
    ├── png
    └── txt
```

</details>

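The layout above suggests one way to iterate over ChartX in Python. The sketch below assumes, without confirmation from this README, that the four files describing one chart share a base filename across the `code`, `csv`, `png`, and `txt` folders; `iter_chartx` and `ChartSample` are hypothetical names, not part of the ChartVLM codebase.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Dict, Iterator, Optional

# Modality sub-folders shown in the directory layout above.
MODALITIES = ("code", "csv", "png", "txt")


@dataclass
class ChartSample:
    """One chart: its chart-type folder, shared file stem, and per-modality paths."""

    chart_type: str
    stem: str
    files: Dict[str, Optional[Path]]  # None where no file with this stem exists


def iter_chartx(root: str) -> Iterator[ChartSample]:
    """Yield one ChartSample per file stem found under each chart-type folder.

    ASSUMPTION (hypothetical helper): files that share a base filename across
    the modality folders describe the same chart.
    """
    for type_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        # Index each existing modality folder by file stem, e.g. "bar_1.png" -> "bar_1".
        by_modality = {
            m: {f.stem: f for f in (type_dir / m).iterdir() if f.is_file()}
            for m in MODALITIES
            if (type_dir / m).is_dir()
        }
        stems = set().union(*(index.keys() for index in by_modality.values()))
        for stem in sorted(stems):
            files = {m: by_modality.get(m, {}).get(stem) for m in MODALITIES}
            yield ChartSample(type_dir.name, stem, files)
```

For example, `for s in iter_chartx("ChartX"): print(s.chart_type, s.stem, s.files["csv"])` lists each chart with its CSV path. A `None` entry marks a modality with no file of that name, which is a quick way to test the pairing assumption against the real download.
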
<details>
<summary>Visualization of Data Distribution</summary>

<div align=center>
<img src="https://github.com/UniModal4Reasoning/ChartVLM/blob/main/assets/tsne.png" height="85%">
</div>

</details>

------------------------

<div align="center">
<h1>ChartVLM<br></h1>
</div>

## ChartVLM Overview

- **(1)** To make the chart model more interpretable on cognition tasks (e.g., answering questions about a chart image), ChartVLM first performs the base perception task (structural extraction, i.e., predicting the chart's underlying data as CSV from the image) and then completes the other cognition tasks (e.g., chart redrawing, description, summarization, and QA) based on the extracted structural data.
- **(2)** To select the task the user intends from the prompt they provide, ChartVLM uses an instruction adapter, which can handle the variety of user instructions illustrated in the figure below.

<div align=center>
<img src="https://github.com/UniModal4Reasoning/ChartVLM/blob/main/assets/chartvlm.png" height="85%">
</div>

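To make the two-stage flow concrete, here is a toy sketch under loud assumptions. `route_instruction` is a keyword lookup standing in for the instruction adapter (the real adapter is part of the model and works differently), the CSV string stands in for the perception stage's predicted output, and only CSV extraction and a "largest value" question are implemented. All function names and the sample table are hypothetical.

```python
import csv
import io
from typing import List, Tuple

# HYPOTHETICAL keyword table standing in for ChartVLM's instruction adapter.
# The real adapter is part of the model; this lookup only illustrates routing.
TASK_KEYWORDS: List[Tuple[str, Tuple[str, ...]]] = [
    ("redraw", ("redraw", "code", "plot")),
    ("summary", ("summarize", "summary")),
    ("description", ("describe", "description")),
    ("csv", ("csv", "extract", "table")),
]


def route_instruction(instruction: str) -> str:
    """Return the first task whose keyword appears in the instruction, else 'qa'."""
    lowered = instruction.lower()
    for task, keywords in TASK_KEYWORDS:
        if any(k in lowered for k in keywords):
            return task
    return "qa"


def answer_largest(predicted_csv: str) -> str:
    """Return the label of the row with the largest value.

    Assumes a header row, labels in column 0, and numeric values in column 1.
    """
    rows = list(csv.reader(io.StringIO(predicted_csv.strip())))
    data = rows[1:]  # skip the header row
    best = max(data, key=lambda row: float(row[1]))
    return best[0]


def run(instruction: str, predicted_csv: str) -> str:
    """Toy two-stage flow: `predicted_csv` plays the role of the perception output."""
    task = route_instruction(instruction)
    if task == "csv":
        return predicted_csv.strip()
    if task == "qa" and "largest" in instruction.lower():
        return answer_largest(predicted_csv)
    raise NotImplementedError(f"this sketch does not implement task {task!r}")


if __name__ == "__main__":
    chart_csv = "name,value\nA,3.1\nB,7.4\nC,5.0"
    print(run("who has the largest value?", chart_csv))  # prints: B
```

Keeping the intermediate CSV as an explicit value is what makes the answer inspectable: if the result is wrong, one can check whether the error came from perception (a bad table) or from reasoning over a correct table.
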
## Quickstart

### Dependencies

```bash
pip install torch==2.1.0 transformers==4.35.0 accelerate==0.24.1 sentencepiece==0.1.99 einops==0.7.0 xformers==0.0.22.post7 triton==2.1.0
```

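Because the versions above are pinned exactly, a quick check that the pins took effect can save debugging later. This is a minimal standard-library sketch, not a script from the ChartVLM repository; `check_pins` is a hypothetical helper, and it deliberately ignores local build tags such as `+cu121`, which PyTorch wheels from the PyTorch package index may carry.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict

# Pins copied from the pip command above.
PINNED: Dict[str, str] = {
    "torch": "2.1.0",
    "transformers": "4.35.0",
    "accelerate": "0.24.1",
    "sentencepiece": "0.1.99",
    "einops": "0.7.0",
    "xformers": "0.0.22.post7",
    "triton": "2.1.0",
}


def check_pins(pins: Dict[str, str]) -> Dict[str, str]:
    """Return {package: problem} for each package that is missing or off-pin."""
    problems: Dict[str, str] = {}
    for name, wanted in pins.items():
        try:
            found = version(name)
        except PackageNotFoundError:
            problems[name] = "not installed"
            continue
        # Drop a local build tag such as "+cu121" before comparing.
        if found.split("+")[0] != wanted:
            problems[name] = f"found {found}, expected {wanted}"
    return problems


if __name__ == "__main__":
    for pkg, problem in check_pins(PINNED).items():
        print(f"{pkg}: {problem}")
```

Running the file prints one line per missing or mismatched package and nothing when the environment matches the pins.
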
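### Example

```python
# `tools.ChartVLM` is assumed to come from the ChartVLM GitHub repository
# (UniModal4Reasoning/ChartVLM); run this script from that repository's root.
from tools.ChartVLM import infer_ChartVLM

if __name__ == '__main__':
    # Replace ${PATH_TO_PRETRAINED_MODEL} with the local directory that holds
    # the downloaded weights; Python does not expand this placeholder.
    model = '${PATH_TO_PRETRAINED_MODEL}/ChartVLM/base/'
    image = './base_decoder/train/data/test.png'  # chart image to query
    text = 'who has the largest value?'  # user instruction / question
    output = infer_ChartVLM(image, text, model)
    print(output)
```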