Rodeszones committed · Commit a732b65 · Parent: 46bfbbf
Update README.md

README.md CHANGED
@@ -8,7 +8,7 @@ pipeline_tag: object-detection
CogVLM Grounding generalist model quantized with bitsandbytes 4-bit precision

**CogVLM** is a powerful **open-source visual language model** (**VLM**). CogVLM-17B has 10 billion vision parameters and 7 billion language parameters. It achieves state-of-the-art performance on 10 classic cross-modal benchmarks, including NoCaps, Flickr30K captioning, RefCOCO, RefCOCO+, RefCOCOg, Visual7W, GQA, ScienceQA, VizWiz VQA, and TDIUC, and ranks 2nd on VQAv2, OKVQA, TextVQA, COCO captioning, and others, **surpassing or matching PaLI-X 55B**.

<div align="center">
<img src="https://github.com/THUDM/CogVLM/raw/main/assets/metrics-min.png" alt="img" style="zoom: 50%;" />
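As a quick orientation, here is a minimal sketch of how a bitsandbytes 4-bit CogVLM checkpoint like this one might be loaded with `transformers`. The repo id, the Vicuna tokenizer choice, and the explicit `quantization_config` are assumptions for illustration, not confirmed by this README; substitute the actual identifiers for this repository.

```python
# Minimal sketch: loading a bitsandbytes 4-bit quantized CogVLM checkpoint.
# Assumes transformers, bitsandbytes, and a CUDA GPU are available.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, LlamaTokenizer

# 4-bit weight loading; compute runs in fp16 while weights stay 4-bit.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

# Assumption: CogVLM HF ports pair the model with the Vicuna tokenizer.
tokenizer = LlamaTokenizer.from_pretrained("lmsys/vicuna-7b-v1.5")

model = AutoModelForCausalLM.from_pretrained(
    "Rodeszones/CogVLM-grounding-generalist-hf-quant4",  # hypothetical repo id
    quantization_config=quant_config,  # may be unnecessary if the checkpoint
                                       # was saved with its quantization config
    trust_remote_code=True,            # CogVLM ships custom modeling code
    low_cpu_mem_usage=True,
)
model.eval()
```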