Commit f612eb3 by andreer
1 parent: 2f08c82

Upload folder using huggingface_hub

This view is limited to 50 files because the commit contains too many changes.
Files changed (50)
  1. README.md +19 -211
  2. backend/vespa_app.py +11 -3
  3. main.py +2 -1
  4. static/full_images/012200fb9d07b72da276db7e9d499a2f.jpg +0 -0
  5. static/full_images/088177a2b300ed9e9cd8d936edb7b5ca.jpg +0 -0
  6. static/full_images/0d9a8f0fa595513774d94d6058aa3c27.jpg +0 -0
  7. static/full_images/0f2073449af49b9a766d269bda7aa2ec.jpg +0 -0
  8. static/full_images/15610e06e17b391f9d6212d4a704d1c8.jpg +0 -0
  9. static/full_images/176768b9dda73154bbbe67983b07ef7b.jpg +0 -0
  10. static/full_images/189aa5523cce7b3e37b336aa1327bae8.jpg +0 -0
  11. static/full_images/23b1e0404920ce0f0257eb0e8190c557.jpg +0 -0
  12. static/full_images/2c5ab4a0f12fa673c17f0283f698bdb9.jpg +0 -0
  13. static/full_images/2e70dc0cf1e35b3f11f6d94017a17110.jpg +0 -0
  14. static/full_images/31b6ab7aae4796b0d6200380b05b25f0.jpg +0 -0
  15. static/full_images/37002ca88b2a5600b5ec7dfd03900d84.jpg +0 -0
  16. static/full_images/3cd17414305381977eae7ff3d30de6dc.jpg +0 -0
  17. static/full_images/3d0c05ab1db804e856a25f473dc5149d.jpg +0 -0
  18. static/full_images/41ff192952c5e93adebbc1852c748985.jpg +0 -0
  19. static/full_images/42adeab8461f18a692708f37444d4dee.jpg +0 -0
  20. static/full_images/44bce8cd76f96db9be93d366c2a36fd3.jpg +0 -0
  21. static/full_images/45c159551a8840a4528eb2034d562abd.jpg +0 -0
  22. static/full_images/46a9cf839d8c07f740c88020e1459d2f.jpg +0 -0
  23. static/full_images/473639db0c76b6e80664775066bd2a7b.jpg +0 -0
  24. static/full_images/47b81a110672e649487aeb708f1277d5.jpg +0 -0
  25. static/full_images/4af459669551f34d1542e9734c1722db.jpg +0 -0
  26. static/full_images/4f7a37046626de54fdf189106371eea2.jpg +0 -0
  27. static/full_images/5257e0dc94a52bbf0dff16f4b481772b.jpg +0 -0
  28. static/full_images/5466c54401a63372bda9bf0b8586ced9.jpg +0 -0
  29. static/full_images/547f1ed03149aee493b22a192146b651.jpg +0 -0
  30. static/full_images/5913b08b81182be02056225d19225039.jpg +0 -0
  31. static/full_images/64060791c89655d280a6a01eea8a8a47.jpg +0 -0
  32. static/full_images/645d0f26220a092ffc0de138c72927a8.jpg +0 -0
  33. static/full_images/694dfbca150f4cac83e27d17f0b04bbd.jpg +0 -0
  34. static/full_images/6a14ac1f65b59b0cfdc65ecc6bd57b55.jpg +0 -0
  35. static/full_images/6b1f502ac819b8a5ba2e65f8e56f22b8.jpg +0 -0
  36. static/full_images/7051b5a4804d48045fff84866923d34a.jpg +0 -0
  37. static/full_images/7364ccb5e7ff145e56014d70ec0295b4.jpg +0 -0
  38. static/full_images/743dd0b209db20acac00090e34f66d1e.jpg +0 -0
  39. static/full_images/7485ce436c35173ee7c379ad67754ae7.jpg +0 -0
  40. static/full_images/74f26585a4ecb41de76e8cb906811244.jpg +0 -0
  41. static/full_images/75a1a2c786e848e7e043f2c3b759ad73.jpg +0 -0
  42. static/full_images/76a4708381e45c35b33793dff2791c7c.jpg +0 -0
  43. static/full_images/7898c6c68d78f85ed8d8d92c406a3850.jpg +0 -0
  44. static/full_images/7901d8397e67307fa7ccbac2e1dcaba4.jpg +0 -0
  45. static/full_images/7a1546e2ad050e8a915cbcbfae7a991f.jpg +0 -0
  46. static/full_images/7cf284e8ccc5a68b4a92eca9b1176707.jpg +0 -0
  47. static/full_images/83929419be13d5a71b23bfd71344fe0b.jpg +0 -0
  48. static/full_images/8403e1a2fdb89eb1e96227427c8b833c.jpg +0 -0
  49. static/full_images/8554ad2b2ea96d1f363f0470fff572d5.jpg +0 -0
  50. static/full_images/8698cf66bd21c5ea60c80c87412da7a6.jpg +0 -0
README.md CHANGED
@@ -1,211 +1,19 @@
- <!-- Copyright Vespa.ai. Licensed under the terms of the Apache 2.0 license. See LICENSE in the project root. -->
-
- <picture>
- <source media="(prefers-color-scheme: dark)" srcset="https://assets.vespa.ai/logos/Vespa-logo-green-RGB.svg">
- <source media="(prefers-color-scheme: light)" srcset="https://assets.vespa.ai/logos/Vespa-logo-dark-RGB.svg">
- <img alt="#Vespa" width="200" src="https://assets.vespa.ai/logos/Vespa-logo-dark-RGB.svg" style="margin-bottom: 25px;">
- </picture>
-
- # 🔍 Visual Retrieval ColPali 👀
-
- This readme contains the code for a web app including a frontend that can be set up as a user facing interface.
-
- ## Why?
-
- To enable _you_ to showcase the power of ColPali and Vespa to your users, and to provide a starting point for your own projects.
-
- > "But I only know Python, how can I create a web app that's not Gradio or Streamlit?" 🤔
-
- No worries! This project uses [FastHTML](https://fastht.ml/) to create a beautiful web app - and it's all Python! 🐍
-
- Also, 👇
-
- <a href="https://imgflip.com/i/98mhch"><img src="https://i.imgflip.com/98mhch.jpg" title="made at imgflip.com" alt="Funny meme about json output in demo"/></a>
-
- As a prerequisite, you should run [this notebook](https://pyvespa.readthedocs.io/en/latest/examples/visual_pdf_rag_with_vespa_colpali_cloud.ipynb) to prepare the data and deploy the Vespa application.
-
- ## Setting up your .env variables
-
- The following variables are required in your `.env` file for the application to be able to connect to the Vespa application and the Gemini API:
-
- You can rename the `.env.example` file to `.env` and fill in the required values.
- The other variables are optional, if you want to use mTLS authentication against the Vespa application.
-
- ```bash
- VESPA_APP_TOKEN_URL=https://abcde.z.vespa-app.cloud
- VESPA_CLOUD_SECRET_TOKEN=vespa_cloud_xxxxxxxx
- GEMINI_API_KEY=asdf
- ```
-
- If you want to deploy the application to Huggingface, you also need to set a `HF_TOKEN` variable, with write permissions.
- This is personal, and must be created at [huggingface](https://huggingface.co/settings/tokens).
-
- ```bash
- HF_TOKEN=hf_xxxxxxxxxx
- ```
-
- ## Setting up python environment
-
- This application should work on Python 3.8 and above.
-
- You can install the dependencies with `pip`, but we recommend using `uv`.
- Skip to [Installing dependencies using `uv`](#installing-dependencies-using-uv) if you want to use `uv`.
-
- ### Installing dependencies using `pip`
-
- You can install the dependencies with `pip`:
-
- ```bash
- pip install -r src/requirements.txt
- ```
-
- ### Installing dependencies using `uv`
-
- We recommend installing the amazing `uv` to manage your python environment:
- See also [installation - uv docs](https://docs.astral.sh/uv/getting-started/installation/) for other installation options.
-
- ```bash
- curl -LsSf https://astral.sh/uv/install.sh | sh
- ```
-
- Then, create a virtual environment:
-
- ```bash
- uv venv
- ```
-
- Activate the virtual environment:
-
- ```bash
- source .venv/bin/activate
- ```
-
- Sync your virtual environment with the dependencies:
-
- ```bash
- uv sync --extra dev
- ```
-
- ## Running the application locally
-
- To run the application locally, you can change into the `src` directory and run:
-
- ```bash
- python main.py
- ```
-
- This will start a local server, and you can access the application at `http://localhost:7860`.
-
- ## Deploy to huggingface 🤗 spaces
-
- ### Compiling dependencies
-
- Before a deploy, make sure to run this to compile the `uv` lock file to `requirements.txt` if you have made changes to the dependencies:
-
- ```bash
- uv pip compile pyproject.toml -o src/requirements.txt
- ```
-
- This will make sure that the dependencies in your `pyproject.toml` are compiled to the `requirements.txt` file, which is used by the huggingface space.
-
- ### Deploying to huggingface
-
- Note that you need to set `HF_TOKEN` environment variable first.
- This is personal, and must be created at [huggingface](https://huggingface.co/settings/tokens).
- Make sure the token has `write` access.
- Be aware that this will not delete existing files, only modify or add,
- see [huggingface-cli](https://huggingface.co/docs/huggingface_hub/en/guides/upload#upload-from-the-cli) for more
- information.
-
- #### Update your space configuration
-
- The `src/README.md` file contains the configuration for the space.
- Feel free to update this file to match your own configuration - name, description, etc.
-
- Note that we can actually use the `gradio` SDK of spaces, to serve FastHTML apps as well, as long as we serve the app on port `7860`.
- See [Custom python spaces](https://huggingface.co/docs/hub/en/spaces-sdks-python) for more information.
-
- #### Upload the files
-
- To deploy, run
-
- (Replace `vespa-engine/colpali-vespa-visual-retrieval` with your own huggingface user/repo name, does not need to exist beforehand)
-
- ```bash
- huggingface-cli upload vespa-engine/colpali-vespa-visual-retrieval src . --repo-type=space
- ```
-
- Note that we upload only the `src` directory.
-
- ## Development
-
- This section applies if you want to make changes to the web app.
-
- ### Adding dependencies
-
- To add dependencies, you can add them to the `pyproject.toml` file and run:
-
- ```bash
- uv compile
- ```
-
- and then sync the dependencies:
-
- ```bash
- uv sync --extra dev
- ```
-
- ### Making changes to CSS
-
- To make changes to output.css apply, run
-
- ```bash
- shad4fast watch # watches all files passed through the tailwind.config.js content section
-
- shad4fast build # minifies the current output.css file to reduce bundle size in production.
- ```
-
- ### Instructions on creating and feeding the full dataset
-
- This section is only relevant if you want to create and feed the full dataset to Vespa.
- The notebook referenced in the beginning of this readme should be sufficient if you just want to spin up a scaled down version of the demo.
-
- #### Prepare data and Vespa application
-
- First, install `uv`:
-
- ```bash
- curl -LsSf https://astral.sh/uv/install.sh | sh
- ```
-
- Then, run:
-
- ```bash
- uv sync --extra dev --extra feed
- ```
-
- #### Converting to notebook
-
- If you want to run the `prepare_feed_deploy.py` as a notebook, you can convert it using `jupytext`:
-
- Convert the `prepare_feed_deploy.py` to notebook to:
-
- ```bash
- jupytext --to notebook prepare_feed_deploy.py
- ```
-
- And launch a Jupyter instance with:
-
- ```bash
- uv run --with jupyter jupyter lab
- ```
-
- ## Credits
-
- Huge thanks to the amazing projects that made it a joy to create this demo 🙏🙌
-
- - Freeing us from python dependency hell - [uv](https://astral.sh/uv/)
- - Allowing us to build **beautiful** full stack web apps in Python [FastHTML](https://fastht.ml/)
- - Introducing the ColPali architecture - [ColPali](https://huggingface.co/vidore/colpali-v1.2)
- - Adding `shadcn` components to FastHTML - [Shad4Fast](https://www.shad4fasthtml.com/)
-
+ ---
+ title: ColPali 🤝 Vespa - Visual Retrieval
+ short_description: Visual Retrieval with ColPali and Vespa
+ emoji: 👀
+ colorFrom: purple
+ colorTo: blue
+ sdk: gradio
+ sdk_version: 4.44.0
+ app_file: main.py
+ pinned: false
+ license: apache-2.0
+ suggested_hardware: t4-small
+ models:
+ - vidore/colpaligemma-3b-pt-448-base
+ - vidore/colpali-v1.2
+ preload_from_hub:
+ - vidore/colpaligemma-3b-pt-448-base config.json,model-00001-of-00002.safetensors,model-00002-of-00002.safetensors,model.safetensors.index.json,preprocessor_config.json,special_tokens_map.json,tokenizer.json,tokenizer_config.json 12c59eb7e23bc4c26876f7be7c17760d5d3a1ffa
+ - vidore/colpali-v1.2 adapter_config.json,adapter_model.safetensors,preprocessor_config.json,special_tokens_map.json,tokenizer.json,tokenizer_config.json 9912ce6f8a462d8cf2269f5606eabbd2784e764f
+ ---
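The new front matter configures the Space, and its `preload_from_hub` entries appear to follow the form `<repo_id> <comma-separated files> <commit SHA>`, so that only the listed files are fetched, pinned to a specific revision. As a minimal sketch of that format, assuming whitespace-separated fields where the file list and the revision are both optional, the hypothetical `parse_preload_entry` helper below splits one entry into its parts. It only illustrates the entry layout and is not how the Hub itself parses the configuration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PreloadEntry:
    """One preload_from_hub entry: repo ID, optional files, optional revision."""
    repo_id: str
    files: List[str] = field(default_factory=list)
    revision: Optional[str] = None


def parse_preload_entry(entry: str) -> PreloadEntry:
    # Hypothetical helper: assumes "<repo_id> [file1,file2,...] [commit_sha]"
    # with the fields separated by whitespace.
    parts = entry.split()
    if not parts or len(parts) > 3:
        raise ValueError(f"unexpected preload_from_hub entry: {entry!r}")
    repo_id = parts[0]
    files = parts[1].split(",") if len(parts) >= 2 else []
    revision = parts[2] if len(parts) == 3 else None
    return PreloadEntry(repo_id=repo_id, files=files, revision=revision)


if __name__ == "__main__":
    entry = (
        "vidore/colpali-v1.2 "
        "adapter_config.json,adapter_model.safetensors,preprocessor_config.json,"
        "special_tokens_map.json,tokenizer.json,tokenizer_config.json "
        "9912ce6f8a462d8cf2269f5606eabbd2784e764f"
    )
    parsed = parse_preload_entry(entry)
    print(parsed.repo_id)      # vidore/colpali-v1.2
    print(len(parsed.files))   # 6
    print(parsed.revision)     # 9912ce6f8a462d8cf2269f5606eabbd2784e764f
```

Pinning a commit SHA, as both entries above do, keeps the preloaded weights stable even if the model repository is updated later.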
backend/vespa_app.py CHANGED
@@ -376,10 +376,11 @@ class VespaQueryClient:
      async def get_suggestions(self, query: str) -> list:
          async with self.app.asyncio(connections=1) as session:
              start = time.perf_counter()
-             yql = f'select questions from {self.VESPA_SCHEMA_NAME} where questions matches "{query}" limit 3'
+             yql = f'select questions from {self.VESPA_SCHEMA_NAME} where questions matches (".*{query}.*")'
              response: VespaQueryResponse = await session.query(
                  body={
                      "yql": yql,
+                     "query": query,
                      "ranking": "unranked",
                      "presentation.timing": True,
                      "presentation.summary": "suggestions",
@@ -401,8 +402,15 @@ class VespaQueryClient:
                  for result in search_results
                  if "questions" in result["fields"]
              ]
-             flat_questions = [item for sublist in questions for item in sublist]
-             return flat_questions
+             for q in questions:
+                 print(q)
+             unique_questions = set([item for sublist in questions for item in sublist])
+
+             # remove an artifact from our data generation
+             if "string" in unique_questions:
+                 unique_questions.remove("string")
+
+             return list(unique_questions)

      def get_rank_profile(self, ranking: str, sim_map: bool) -> str:
          if sim_map:
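The new `get_suggestions` flattens the per-document `questions` lists, de-duplicates them with a `set`, and drops the placeholder value `"string"` left over from data generation. A `set` does not preserve insertion order, so the same response can produce suggestions in a different order between runs. The sketch below is a hypothetical alternative, `flatten_suggestions`, that keeps first-seen order via `dict.fromkeys` and caps the result; the `limit` parameter and its default of 3 are assumptions, chosen to echo the `limit 3` that this commit removes from the YQL.

```python
from typing import Iterable, List, Optional


def flatten_suggestions(
    questions: Iterable[List[str]],
    artifacts: Iterable[str] = ("string",),
    limit: Optional[int] = 3,
) -> List[str]:
    """Flatten per-document question lists into unique suggestions.

    Hypothetical helper: unlike set(), dict.fromkeys keeps the first-seen
    order, so the same Vespa response always yields the same suggestion order.
    """
    blocked = set(artifacts)
    # Skip known data-generation artifacts while flattening.
    flat = (q for sublist in questions for q in sublist if q not in blocked)
    unique = list(dict.fromkeys(flat))
    return unique if limit is None else unique[:limit]


if __name__ == "__main__":
    questions = [
        ["What is ColPali?", "string"],
        ["How does Vespa rank pages?", "What is ColPali?"],
    ]
    print(flatten_suggestions(questions))
    # ['What is ColPali?', 'How does Vespa rank pages?']
```

Capping in Python leaves the query unchanged; restoring a limit in the query itself would instead reduce how many documents Vespa returns in the first place.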
main.py CHANGED
@@ -220,7 +220,8 @@ async def get(session, request, query: str, ranking: str):
          f"Search results fetched in {end - start:.2f} seconds. Vespa search time: {result['timing']['searchtime']}"
      )
      search_time = result["timing"]["searchtime"]
-     total_count = result["root"]["fields"]["totalCount"]
+     # Safely get total_count with a default of 0
+     total_count = result.get("root", {}).get("fields", {}).get("totalCount", 0)

      search_results = vespa_app.results_to_search_results(result, idx_to_token)
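The `main.py` change replaces direct indexing, which raises `KeyError` when the response lacks `root` or `fields`, with chained `.get()` calls that fall back to 0. Chained `.get()` can still fail if an intermediate value is present but is `None`, because `None` has no `.get` method. The sketch below is a hypothetical `dig` helper that walks a key path and returns a default whenever a step is missing or is not a mapping; the response dicts in the example are made up for illustration.

```python
from collections.abc import Mapping
from typing import Any


def dig(data: Any, *keys: str, default: Any = None) -> Any:
    """Follow a path of keys through nested mappings.

    Hypothetical helper: returns `default` as soon as a key is missing or an
    intermediate value is not a mapping (e.g. None), instead of raising.
    """
    current = data
    for key in keys:
        if not isinstance(current, Mapping) or key not in current:
            return default
        current = current[key]
    return current


if __name__ == "__main__":
    ok = {"root": {"fields": {"totalCount": 42}}, "timing": {"searchtime": 0.01}}
    missing = {"root": {"id": "toplevel"}}
    null_fields = {"root": {"fields": None}}

    print(dig(ok, "root", "fields", "totalCount", default=0))           # 42
    print(dig(missing, "root", "fields", "totalCount", default=0))      # 0
    print(dig(null_fields, "root", "fields", "totalCount", default=0))  # 0
```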
 
static/full_images/012200fb9d07b72da276db7e9d499a2f.jpg ADDED
static/full_images/088177a2b300ed9e9cd8d936edb7b5ca.jpg ADDED
static/full_images/0d9a8f0fa595513774d94d6058aa3c27.jpg ADDED
static/full_images/0f2073449af49b9a766d269bda7aa2ec.jpg ADDED
static/full_images/15610e06e17b391f9d6212d4a704d1c8.jpg ADDED
static/full_images/176768b9dda73154bbbe67983b07ef7b.jpg ADDED
static/full_images/189aa5523cce7b3e37b336aa1327bae8.jpg ADDED
static/full_images/23b1e0404920ce0f0257eb0e8190c557.jpg ADDED
static/full_images/2c5ab4a0f12fa673c17f0283f698bdb9.jpg ADDED
static/full_images/2e70dc0cf1e35b3f11f6d94017a17110.jpg ADDED
static/full_images/31b6ab7aae4796b0d6200380b05b25f0.jpg ADDED
static/full_images/37002ca88b2a5600b5ec7dfd03900d84.jpg ADDED
static/full_images/3cd17414305381977eae7ff3d30de6dc.jpg ADDED
static/full_images/3d0c05ab1db804e856a25f473dc5149d.jpg ADDED
static/full_images/41ff192952c5e93adebbc1852c748985.jpg ADDED
static/full_images/42adeab8461f18a692708f37444d4dee.jpg ADDED
static/full_images/44bce8cd76f96db9be93d366c2a36fd3.jpg ADDED
static/full_images/45c159551a8840a4528eb2034d562abd.jpg ADDED
static/full_images/46a9cf839d8c07f740c88020e1459d2f.jpg ADDED
static/full_images/473639db0c76b6e80664775066bd2a7b.jpg ADDED
static/full_images/47b81a110672e649487aeb708f1277d5.jpg ADDED
static/full_images/4af459669551f34d1542e9734c1722db.jpg ADDED
static/full_images/4f7a37046626de54fdf189106371eea2.jpg ADDED
static/full_images/5257e0dc94a52bbf0dff16f4b481772b.jpg ADDED
static/full_images/5466c54401a63372bda9bf0b8586ced9.jpg ADDED
static/full_images/547f1ed03149aee493b22a192146b651.jpg ADDED
static/full_images/5913b08b81182be02056225d19225039.jpg ADDED
static/full_images/64060791c89655d280a6a01eea8a8a47.jpg ADDED
static/full_images/645d0f26220a092ffc0de138c72927a8.jpg ADDED
static/full_images/694dfbca150f4cac83e27d17f0b04bbd.jpg ADDED
static/full_images/6a14ac1f65b59b0cfdc65ecc6bd57b55.jpg ADDED
static/full_images/6b1f502ac819b8a5ba2e65f8e56f22b8.jpg ADDED
static/full_images/7051b5a4804d48045fff84866923d34a.jpg ADDED
static/full_images/7364ccb5e7ff145e56014d70ec0295b4.jpg ADDED
static/full_images/743dd0b209db20acac00090e34f66d1e.jpg ADDED
static/full_images/7485ce436c35173ee7c379ad67754ae7.jpg ADDED
static/full_images/74f26585a4ecb41de76e8cb906811244.jpg ADDED
static/full_images/75a1a2c786e848e7e043f2c3b759ad73.jpg ADDED
static/full_images/76a4708381e45c35b33793dff2791c7c.jpg ADDED
static/full_images/7898c6c68d78f85ed8d8d92c406a3850.jpg ADDED
static/full_images/7901d8397e67307fa7ccbac2e1dcaba4.jpg ADDED
static/full_images/7a1546e2ad050e8a915cbcbfae7a991f.jpg ADDED
static/full_images/7cf284e8ccc5a68b4a92eca9b1176707.jpg ADDED
static/full_images/83929419be13d5a71b23bfd71344fe0b.jpg ADDED
static/full_images/8403e1a2fdb89eb1e96227427c8b833c.jpg ADDED
static/full_images/8554ad2b2ea96d1f363f0470fff572d5.jpg ADDED
static/full_images/8698cf66bd21c5ea60c80c87412da7a6.jpg ADDED