<p align="center"><img width=800 src="https://github.com/illuin-tech/colpali/blob/main/assets/colpali_architecture.webp?raw=true"/></p>
## Model Description

This model is built iteratively starting from an off-the-shelf [SigLIP](https://huggingface.co/google/siglip-so400m-patch14-384) model.
One benefit of inputting image patch embeddings through a language model is that they are natively mapped to a latent space similar to textual input (query).
This enables leveraging the [ColBERT](https://arxiv.org/abs/2004.12832) strategy to compute interactions between text tokens and image patches, which enables a step-change improvement in performance compared to BiPali.
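The late-interaction scoring described above can be sketched in a few lines. This is a minimal NumPy illustration of ColBERT-style MaxSim, not code from the ColPali codebase; the function name and array shapes are ours. For each query-token embedding, keep the maximum similarity over all image-patch embeddings, then sum over query tokens:

```python
import numpy as np

def late_interaction_score(query_embs: np.ndarray, patch_embs: np.ndarray) -> float:
    """ColBERT-style MaxSim: for every query token, keep its best-matching
    image patch similarity, then sum these maxima over the query tokens."""
    # L2-normalize so dot products are cosine similarities.
    q = query_embs / np.linalg.norm(query_embs, axis=-1, keepdims=True)
    p = patch_embs / np.linalg.norm(patch_embs, axis=-1, keepdims=True)
    sim = q @ p.T                        # (n_query_tokens, n_patches)
    return float(sim.max(axis=1).sum())  # max over patches, sum over tokens
```

Because each query token is matched to patches independently, a fine-grained query term can latch onto the specific patch that contains it, which is what gives late interaction its edge over a single pooled embedding.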
## Version specificity

> [!NOTE]
> This version is similar to [`vidore/colpali-v1.2`](https://huggingface.co/vidore/colpali-v1.2), except that the LoRA adapter was merged into the base model. Thus, loading ColPali from this checkpoint saves you the trouble of merging the pre-trained adapter yourself.
>
> This can be useful if you want to train a new adapter from scratch.
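For context on what "merged" means here: a LoRA adapter stores a low-rank update (B·A) alongside the frozen base weight W, and merging folds that update into W once, so no adapter machinery is needed at load time. The following is a numeric sketch of the standard LoRA merge formula, W' = W + (α/r)·B·A, with illustrative names and shapes of our own choosing, not code from this repository:

```python
import numpy as np

def merge_lora(W: np.ndarray, A: np.ndarray, B: np.ndarray,
               alpha: float, r: int) -> np.ndarray:
    """Fold a LoRA update into the base weight: W' = W + (alpha / r) * B @ A."""
    return W + (alpha / r) * (B @ A)

rng = np.random.default_rng(0)
d_out, d_in, r = 8, 8, 2
W = rng.normal(size=(d_out, d_in))
A = rng.normal(size=(r, d_in))   # LoRA down-projection
B = rng.normal(size=(d_out, r))  # LoRA up-projection
W_merged = merge_lora(W, A, B, alpha=16.0, r=r)

# The merged weight gives the same output as base weight plus adapter
# applied separately, by linearity.
x = rng.normal(size=(d_in,))
assert np.allclose(W_merged @ x, W @ x + (16.0 / r) * (B @ (A @ x)))
```

Since this checkpoint ships the merged weights directly, loading it requires no extra adapter-merging step, whereas the unmerged `vidore/colpali-v1.2` keeps the adapter separate.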
## Model Training

### Dataset