Rodeszones committed · Commit a732b65 · Parent: 46bfbbf
Update README.md

README.md CHANGED
@@ -8,7 +8,7 @@ pipeline_tag: object-detection
CogVLM Grounding generalist model quantized with bitsandbytes 4-bit precision

**CogVLM** is a powerful **open-source visual language model** (**VLM**). CogVLM-17B has 10 billion vision parameters and 7 billion language parameters. It achieves state-of-the-art performance on 10 classic cross-modal benchmarks, including NoCaps, Flickr30K captioning, RefCOCO, RefCOCO+, RefCOCOg, Visual7W, GQA, ScienceQA, VizWiz VQA, and TDIUC, and ranks 2nd on VQAv2, OKVQA, TextVQA, COCO captioning, and others, **surpassing or matching PaLI-X 55B**.

<div align="center">
<img src="https://github.com/THUDM/CogVLM/raw/main/assets/metrics-min.png" alt="img" style="zoom: 50%;" />
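As a quick orientation, here is a minimal sketch of how a bitsandbytes 4-bit CogVLM checkpoint like this one might be loaded with `transformers`. The repo id, the Vicuna tokenizer choice, and the explicit `quantization_config` are assumptions for illustration, not confirmed by this README; substitute the actual identifiers for this repository.

```python
# Minimal sketch: loading a bitsandbytes 4-bit quantized CogVLM checkpoint.
# Assumes transformers, bitsandbytes, and a CUDA GPU are available.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, LlamaTokenizer

# 4-bit weight loading; compute runs in fp16 while weights stay 4-bit.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

# Assumption: CogVLM HF ports pair the model with the Vicuna tokenizer.
tokenizer = LlamaTokenizer.from_pretrained("lmsys/vicuna-7b-v1.5")

model = AutoModelForCausalLM.from_pretrained(
    "Rodeszones/CogVLM-grounding-generalist-hf-quant4",  # hypothetical repo id
    quantization_config=quant_config,  # may be unnecessary if the checkpoint
                                       # was saved with its quantization config
    trust_remote_code=True,            # CogVLM ships custom modeling code
    low_cpu_mem_usage=True,
)
model.eval()
```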