arXiv:2405.05374
Arctic-Embed: Scalable, Efficient, and Accurate Text Embedding Models
Published on May 8, 2024
Abstract
This report describes the training dataset creation and recipe behind the family of arctic-embed text embedding models (a set of five models ranging from 22 to 334 million parameters, with weights open-sourced under an Apache-2.0 license). At the time of their release, each model achieved state-of-the-art retrieval accuracy for models of its size on the MTEB Retrieval leaderboard, with the largest model, arctic-embed-l, outperforming closed-source embedding models such as Cohere's embed-v3 and OpenAI's text-embedding-3-large. In addition to the details of our training recipe, we provide several informative ablation studies, which we believe explain our models' strong performance.
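For context, here is a minimal sketch of how an embedding model from this family might be used for retrieval. It assumes the weights are published on Hugging Face under the ID Snowflake/snowflake-arctic-embed-l and that the sentence-transformers library is installed; the query prefix shown follows a common convention for retrieval-tuned models and is an assumption, not a quotation from the paper.

```python
# Minimal retrieval sketch (assumed model ID and query prefix; not the
# paper's official usage example).
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Snowflake/snowflake-arctic-embed-l")

# Retrieval-tuned models often expect queries to carry an instruction
# prefix; this exact string is an assumption for illustration.
query_prefix = "Represent this sentence for searching relevant passages: "
queries = [query_prefix + "how do text embedding models work?"]
documents = [
    "Text embedding models map sentences to dense vectors so that "
    "semantically similar texts are close together in vector space.",
    "The weather in Paris is mild in spring.",
]

# Encode both sides; with unit-normalized embeddings, the dot product
# equals cosine similarity, so higher scores mean better matches.
query_emb = model.encode(queries, normalize_embeddings=True)
doc_emb = model.encode(documents, normalize_embeddings=True)
scores = query_emb @ doc_emb.T
print(scores)  # the first document should score higher for this query
```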