Fine-tuning for a document retrieval task

#90
by truong1301 - opened

I am fine-tuning for a document retrieval task but am unsure which adapter to choose. The available options include passage encoding, query encoding, and text matching, all of which seem suitable for retrieval. My initial plan is to fine-tune the query encoding adapter. However, if I do that, do I still need to fine-tune the passage encoding adapter as well? Additionally, I plan to use contrastive learning when fine-tuning the passage encoding adapter. Is the process similar to that for the query encoding adapter? My training data consists of queries, each paired with a list of documents containing the relevant evidence. Thank you in advance.

Someone asked a similar question before [1]. Ideally, you would fine-tune the query and passage adapters together. However, implementing this is a bit complicated: we don't have code for it that we can publish, and Sentence Transformers does not support it. I would actually recommend fine-tuning the text-matching adapter instead, which also works for retrieval. Generally it works better for symmetric retrieval tasks, but fine-tuning should be effective at adapting it to asymmetric tasks.

[1] https://huggingface.co/jinaai/jina-embeddings-v3/discussions/71
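
To make the contrastive setup concrete, here is a minimal sketch of how data shaped like yours (each query paired with a list of relevant documents) could be flattened into (query, positive) pairs, together with a toy in-batch-negatives contrastive loss. This is not our training code. The field names `query` and `documents`, the function names, and the `scale` value are all assumptions made for illustration, and the loss works on plain Python lists rather than real model embeddings. In practice the vectors would come from the model with the text-matching adapter selected, and a library loss such as `MultipleNegativesRankingLoss` in Sentence Transformers implements a similar objective.

```python
import math
from typing import Dict, List, Sequence, Tuple


def build_pairs(examples: List[Dict]) -> List[Tuple[str, str]]:
    """Flatten records into (query, positive_document) pairs.

    Assumes each record looks like {"query": str, "documents": [str, ...]}.
    These field names are hypothetical; adapt them to your dataset.
    """
    pairs = []
    for example in examples:
        for document in example["documents"]:
            pairs.append((example["query"], document))
    return pairs


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def in_batch_contrastive_loss(
    query_vecs: Sequence[Sequence[float]],
    doc_vecs: Sequence[Sequence[float]],
    scale: float = 20.0,  # illustrative similarity scale (inverse temperature)
) -> float:
    """InfoNCE-style loss with in-batch negatives.

    doc_vecs[i] is the positive for query_vecs[i]; every other document
    in the batch is treated as a negative for that query.
    """
    losses = []
    for i, query in enumerate(query_vecs):
        logits = [scale * cosine(query, doc) for doc in doc_vecs]
        # Numerically stable log-sum-exp over all documents in the batch.
        max_logit = max(logits)
        log_denominator = max_logit + math.log(
            sum(math.exp(x - max_logit) for x in logits)
        )
        # Cross-entropy with the matching document as the target class.
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


# Toy data in the assumed format.
examples = [
    {
        "query": "capital of France",
        "documents": [
            "Paris is the capital of France.",
            "The French government sits in Paris.",
        ],
    },
    {
        "query": "boiling point of water",
        "documents": ["Water boils at 100 degrees Celsius at sea level."],
    },
]
pairs = build_pairs(examples)
```

One caveat with in-batch negatives: when a query has several relevant documents, two pairs from the same query can land in the same batch, and the second positive is then wrongly treated as a negative. Sampling at most one pair per query per batch is a simple way to avoid this.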
