---
license: mit
language:
- en
base_model:
- thenlper/gte-large
---

## News

12/11/2024: Release of Algolia/Algolia-large-en-generic-v2410, Algolia's English embedding model.

## Models

Algolia-large-en-generic-v2410 is the first addition to Algolia's suite of embedding models built for retrieval performance and efficiency in e-commerce search.

Algolia v2410 models are state of the art for their size and use cases and are now available under the MIT license.

Note that generic models are trained on public and synthetic e-commerce datasets only.

### Quality Benchmarks

|Model|MTEB EN rank|Public e-comm rank|Algolia private e-comm rank|
|------------|------------|------------|------------|
|Algolia-large-en-generic-v2410|11|2|10|

Note that our benchmarks cover retrieval tasks only. They include open-source models with approximately 500M parameters or fewer, as well as commercially available embedding models.

## Usage

### Using Sentence Transformers

```python
# Load model
import pandas as pd
from scipy.spatial.distance import cosine
from sentence_transformers import SentenceTransformer

modelname = "algolia/algolia-large-en-generic-v2410"
model = SentenceTransformer(modelname)


# Define embedding and compute_similarity
def get_embedding(text):
    # encode() takes a list of texts; return the single resulting vector
    embedding = model.encode([text])
    return embedding[0]


def compute_similarity(query, documents):
    query_emb = get_embedding(query)
    doc_embeddings = [get_embedding(doc) for doc in documents]
    # Calculate cosine similarity (scipy's cosine() is a distance, so subtract it from 1)
    similarities = [1 - cosine(query_emb, doc_emb) for doc_emb in doc_embeddings]
    ranked_docs = sorted(zip(documents, similarities), key=lambda x: x[1], reverse=True)
    # Format output
    return [{"document": doc, "similarity_score": round(sim, 4)} for doc, sim in ranked_docs]


# Define inputs
query = "query: " + "running shoes"
documents = [
    "adidas sneakers, great for outdoor running",
    "nike soccer boots indoor, it can be used on turf",
    "new balance light weight, good for jogging",
    "hiking boots, good for bushwalking",
]

# Output the results
result_df = pd.DataFrame(compute_similarity(query, documents))
print(query)
print(result_df.head())
```
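
The ranking step in the example rests on cosine similarity, which is what `1 - cosine(...)` evaluates to. As a minimal sketch of that calculation, the block below reimplements it with the standard library only. The helper names `cosine_similarity` and `rank_documents` and the toy 3-dimensional vectors are hypothetical stand-ins for illustration; they are not model outputs or part of any Algolia API.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_documents(query_vec, doc_vecs, labels):
    """Return (label, score) pairs sorted from most to least similar."""
    scored = [
        (label, round(cosine_similarity(query_vec, vec), 4))
        for label, vec in zip(labels, doc_vecs)
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors (assumed for illustration) in place of real embeddings
toy_query = [1.0, 0.0, 0.0]
toy_docs = [[0.0, 1.0, 0.0], [1.0, 1.0, 0.0], [2.0, 0.0, 0.0]]
toy_labels = ["doc_a", "doc_b", "doc_c"]

ranking = rank_documents(toy_query, toy_docs, toy_labels)
print(ranking)
```

Because cosine similarity ignores vector length, `doc_c` (same direction as the query, twice the magnitude) ranks first, and the orthogonal `doc_a` ranks last.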

## Contact

Feel free to open an issue or pull request if you have any questions or suggestions about this project.

You can also email Rasit Abay (rasit.abay@algolia.com).

## License

Algolia EN v2410 is licensed under the [MIT License](https://mit-license.org/). The released models can be used for commercial purposes free of charge.