# Multilingual ColBERT embeddings as a service
## Goal

- Deploy Antoine Louis' colbert-xm as an inference service: text(s) in, vector(s) out
## Motivation

- Use the service as the embedding component of a broader RAG solution
## Steps followed

- Clone the original repo following this procedure
- Add a custom handler script as described here
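The custom handler follows the Hugging Face Inference Endpoints convention: a `handler.py` at the repo root exposing an `EndpointHandler` class with `__init__(path)` and `__call__(data)`. A minimal sketch of that contract is below; the stub encoder, the 128-dimensional per-token vectors, and the `"inputs"` payload key are assumptions standing in for the real ColBERT-XM model, which would be loaded from `path` in `__init__`:

```python
from typing import Any, Dict, List

class EndpointHandler:
    """Sketch of the Inference Endpoints custom-handler contract."""

    def __init__(self, path: str = ""):
        # In the real handler, load the ColBERT-XM checkpoint from `path` here.
        # A stub encoder stands in so this sketch runs without the 3 GB download.
        self.dim = 128  # per-token embedding size (assumption)

    def _encode(self, text: str) -> List[List[float]]:
        # Placeholder: one zero vector per whitespace token.
        # The real model returns one contextual vector per subword token.
        return [[0.0] * self.dim for _ in text.split()]

    def __call__(self, data: Dict[str, Any]) -> List[List[List[float]]]:
        # "inputs" may be a single string or a list of strings (assumption).
        inputs = data["inputs"]
        if isinstance(inputs, str):
            inputs = [inputs]
        return [self._encode(text) for text in inputs]
```

Note that a ColBERT-style service returns a matrix per input text (one vector per token), not a single pooled vector per text.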
## Local development and testing

Build and start the `hf_endpoints_emulator` Docker container:

```shell
docker-compose up -d --build
```

This can take a few moments, given the size of the model (> 3 GB)!
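For reference, a docker-compose service matching the commands in this README might look as follows; everything except the service name is an assumption (build context, port, and volume mount should be checked against the actual `docker-compose.yml` in the repo):

```yaml
services:
  hf_endpoints_emulator:
    build: .                # Dockerfile at the repo root (assumption)
    ports:
      - "5000:5000"         # emulator port; adjust to your setup (assumption)
    volumes:
      - ./:/app             # mount the repo so handler edits are picked up (assumption)
```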
### How to test locally

```shell
./embed_single_query.sh
./embed_two_chunks.sh
docker-compose exec hf_endpoints_emulator pytest
```
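The scripts above POST text to the running service and should get ColBERT-style multi-vector output back. As a sketch of what to expect, the snippet below builds the assumed request payloads and checks the assumed response shape (one fixed-size vector per token, per input text; the `"inputs"` key and 128-dim vectors are assumptions):

```python
import json

# Payload shapes assumed for the emulator endpoint
single = json.dumps({"inputs": "what is ColBERT?"})
batch = json.dumps({"inputs": ["first chunk", "second chunk"]})

def looks_like_colbert_output(resp, n_texts, dim=128):
    """True if resp is one matrix per input text, each row a dim-sized vector."""
    return (
        len(resp) == n_texts
        and all(all(len(vec) == dim for vec in mat) for mat in resp)
    )

# Example: a fake response for 1 text with 2 tokens
fake = [[[0.0] * 128, [0.1] * 128]]
assert looks_like_colbert_output(fake, 1)
```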
### Check output

Follow the container logs:

```shell
docker-compose logs --follow hf_endpoints_emulator
```