<p align="center"><img width=800 src="https://github.com/illuin-tech/colpali/blob/main/assets/colpali_architecture.webp?raw=true"/></p>
## Model Description

This model is built iteratively starting from an off-the-shelf [SigLIP](https://huggingface.co/google/siglip-so400m-patch14-384) model.
One benefit of inputting image patch embeddings through a language model is that they are natively mapped to a latent space similar to textual input (query).
This enables leveraging the [ColBERT](https://arxiv.org/abs/2004.12832) strategy to compute interactions between text tokens and image patches, which enables a step-change improvement in performance compared to BiPali.
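The late-interaction scoring described above can be sketched in a few lines. This is a minimal NumPy illustration of ColBERT-style MaxSim, not code from the ColPali codebase; the function name and array shapes are ours. For each query-token embedding, keep the maximum similarity over all image-patch embeddings, then sum over query tokens:

```python
import numpy as np

def late_interaction_score(query_embs: np.ndarray, patch_embs: np.ndarray) -> float:
    """ColBERT-style MaxSim: for every query token, keep its best-matching
    image patch similarity, then sum these maxima over the query tokens."""
    # L2-normalize so dot products are cosine similarities.
    q = query_embs / np.linalg.norm(query_embs, axis=-1, keepdims=True)
    p = patch_embs / np.linalg.norm(patch_embs, axis=-1, keepdims=True)
    sim = q @ p.T                        # (n_query_tokens, n_patches)
    return float(sim.max(axis=1).sum())  # max over patches, sum over tokens
```

Because each query token is matched to patches independently, a fine-grained query term can latch onto the specific patch that contains it, which is what gives late interaction its edge over a single pooled embedding.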
## Version specificity

> [!NOTE]
> This version is similar to [`vidore/colpali-v1.2`](https://huggingface.co/vidore/colpali-v1.2), except that the LoRA adapter was merged into the base model. Thus, loading ColPali from this checkpoint saves you the trouble of merging the pre-trained adapter yourself.
>
> This can be useful if you want to train a new adapter from scratch.
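For context on what "merged" means here: a LoRA adapter stores a low-rank update (B·A) alongside the frozen base weight W, and merging folds that update into W once, so no adapter machinery is needed at load time. The following is a numeric sketch of the standard LoRA merge formula, W' = W + (α/r)·B·A, with illustrative names and shapes of our own choosing, not code from this repository:

```python
import numpy as np

def merge_lora(W: np.ndarray, A: np.ndarray, B: np.ndarray,
               alpha: float, r: int) -> np.ndarray:
    """Fold a LoRA update into the base weight: W' = W + (alpha / r) * B @ A."""
    return W + (alpha / r) * (B @ A)

rng = np.random.default_rng(0)
d_out, d_in, r = 8, 8, 2
W = rng.normal(size=(d_out, d_in))
A = rng.normal(size=(r, d_in))   # LoRA down-projection
B = rng.normal(size=(d_out, r))  # LoRA up-projection
W_merged = merge_lora(W, A, B, alpha=16.0, r=r)

# The merged weight gives the same output as base weight plus adapter
# applied separately, by linearity.
x = rng.normal(size=(d_in,))
assert np.allclose(W_merged @ x, W @ x + (16.0 / r) * (B @ (A @ x)))
```

Since this checkpoint ships the merged weights directly, loading it requires no extra adapter-merging step, whereas the unmerged `vidore/colpali-v1.2` keeps the adapter separate.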
## Model Training

### Dataset