metadata
license: mit
language:
- en
base_model:
- thenlper/gte-large
News
12/11/2024: Release of Algolia/Algolia-large-en-generic-v2410, Algolia's English embedding model.
Models
Algolia-large-en-generic-v2410 is the first addition to Algolia's suite of embedding models built for retrieval performance and efficiency in e-commerce search. Algolia v2410 models are state-of-the-art for their size and use cases and are now available under the MIT License.
Note that generic models are trained on public and synthetic e-commerce datasets only.
Usage
Using Sentence Transformers
# Load model
import pandas as pd
from scipy.spatial.distance import cosine
from sentence_transformers import SentenceTransformer

modelname = "algolia/algolia-large-en-generic-v2410"
model = SentenceTransformer(modelname)

# Define embedding and compute_similarity
def get_embedding(text):
    # encode() takes a list of texts and returns one embedding per text
    embedding = model.encode([text])
    return embedding[0]

def compute_similarity(query, documents):
    query_emb = get_embedding(query)
    doc_embeddings = [get_embedding(doc) for doc in documents]
    # Calculate cosine similarity (scipy's cosine() is a distance, so subtract it from 1)
    similarities = [1 - cosine(query_emb, doc_emb) for doc_emb in doc_embeddings]
    # Rank documents from most to least similar
    ranked_docs = sorted(zip(documents, similarities), key=lambda x: x[1], reverse=True)
    # Format output
    return [{"document": doc, "similarity_score": round(sim, 4)} for doc, sim in ranked_docs]

# Define inputs (the query carries a "query: " prefix; the documents do not)
query = "query: " + "running shoes"
documents = [
    "adidas sneakers, great for outdoor running",
    "nike soccer boots indoor, it can be used on turf",
    "new balance light weight, good for jogging",
    "hiking boots, good for bushwalking",
]

# Output the results
result_df = pd.DataFrame(compute_similarity(query, documents))
print(query)
print(result_df.head())
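To make the scoring step concrete, here is a minimal sketch of the cosine similarity and ranking that the snippet above delegates to SciPy. It uses only the standard library and does not load the model. The helper names `cosine_similarity` and `rank_documents`, and the toy 3-dimensional vectors, are illustrative assumptions and not part of the released model or its API.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_documents(query_emb, documents, doc_embs):
    """Return (document, score) pairs sorted from most to least similar."""
    scored = [
        (doc, cosine_similarity(query_emb, emb))
        for doc, emb in zip(documents, doc_embs)
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy vectors standing in for real embeddings (made up for illustration only)
toy_query = [1.0, 0.0, 0.0]
toy_docs = ["doc_a", "doc_b", "doc_c"]
toy_embs = [
    [0.0, 1.0, 0.0],  # orthogonal to the query
    [1.0, 0.0, 0.0],  # same direction as the query
    [1.0, 1.0, 0.0],  # 45 degrees from the query
]

ranking = rank_documents(toy_query, toy_docs, toy_embs)
for doc, score in ranking:
    print(doc, round(score, 4))
```

With real embeddings, the vectors would come from `model.encode(...)` instead of the hand-written lists; the ranking logic is unchanged.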
Contact
Feel free to open an issue or pull request if you have any questions or suggestions about this project. You can also email Rasit Abay (rasit.abay@algolia.com).
License
Algolia EN v2410 is licensed under the MIT License. The released models can be used for commercial purposes free of charge.