ColPali
tonywu71 committed on
Commit a4cebd7
1 Parent(s): c69dd0d

Update README.md

Files changed (1)
  1. README.md +7 -7
README.md CHANGED
@@ -18,13 +18,6 @@ It was introduced in the paper [ColPali: Efficient Document Retrieval with Visio
 
 <p align="center"><img width=800 src="https://github.com/illuin-tech/colpali/blob/main/assets/colpali_architecture.webp?raw=true"/></p>
 
-## Version specificity
-
-> [!NOTE]
-> This version is similar to [`vidore/colpali-v1.2`](https://huggingface.co/vidore/colpali-v1.2), except that the LoRA adapter was merged into the base model. Thus, loading ColPali from this checkpoint saves you the trouble of merging the pre-trained adapter yourself.
->
-> This can be useful if you want to train a new adapter from scratch.
-
 ## Model Description
 
 This model is built iteratively starting from an off-the-shelf [SigLIP](https://huggingface.co/google/siglip-so400m-patch14-384) model.
@@ -33,6 +26,13 @@ We finetuned it to create [BiSigLIP](https://huggingface.co/vidore/bisiglip) and
 One benefit of inputting image patch embeddings through a language model is that they are natively mapped to a latent space similar to textual input (query).
 This enables leveraging the [ColBERT](https://arxiv.org/abs/2004.12832) strategy to compute interactions between text tokens and image patches, which enables a step-change improvement in performance compared to BiPali.
 
+## Version specificity
+
+> [!NOTE]
+> This version is similar to [`vidore/colpali-v1.2`](https://huggingface.co/vidore/colpali-v1.2), except that the LoRA adapter was merged into the base model. Thus, loading ColPali from this checkpoint saves you the trouble of merging the pre-trained adapter yourself.
+>
+> This can be useful if you want to train a new adapter from scratch.
+
 ## Model Training
 
 ### Dataset
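
The note added in this commit says the LoRA adapter is already merged into the base weights, so the checkpoint can be loaded directly with `from_pretrained`, with no PEFT merge step. A minimal sketch, assuming the `colpali_engine` package's `ColPali` and `ColPaliProcessor` classes (v0.3+ layout) and a placeholder model id standing in for this repository:

```python
import torch
from colpali_engine.models import ColPali, ColPaliProcessor

# Placeholder id: substitute this repository's actual model id.
model_id = "vidore/<this-merged-checkpoint>"

# The adapter is already merged into the base model, so a plain
# from_pretrained call returns a ready-to-use model (no merge_and_unload).
model = ColPali.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
).eval()
processor = ColPaliProcessor.from_pretrained(model_id)
```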
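
For reference, the ColBERT-style late interaction mentioned in the context lines above reduces to a MaxSim score: each query-token embedding is matched to its most similar image-patch embedding, and the per-token maxima are summed. A minimal PyTorch sketch of the scoring rule, not the library's optimized batched implementation:

```python
import torch

def maxsim_score(query_emb: torch.Tensor, doc_emb: torch.Tensor) -> torch.Tensor:
    """ColBERT-style late interaction between one query and one page image.

    query_emb: (n_query_tokens, dim); doc_emb: (n_patches, dim).
    Both are assumed L2-normalized, so dot products are cosine similarities.
    """
    sim = query_emb @ doc_emb.T          # (n_query_tokens, n_patches)
    return sim.max(dim=1).values.sum()   # best patch per query token, summed
```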