A quantized version of multilingual-e5-small. Quantization was performed per-layer under the same conditions as our ELSERv2 model, as described here.

Text Embeddings by Weakly-Supervised Contrastive Pre-training. Liang Wang, Nan Yang, Xiaolong Huang, Binxing Jiao, Linjun Yang, Daxin Jiang, Rangan Majumder, Furu Wei, arXiv 2022

Benchmarks

We ran a number of small benchmarks to assess the changes in both quality and inference latency relative to the original baseline model.

Quality

Measuring NDCG@10 on the dev split of the MIRACL datasets for select languages, we see only a marginal change in the quality of the quantized model for most languages. Yoruba (yo) is the exception, dropping from 0.56193 to 0.48934.

| model | de | yo | ru | ar | es | th |
| --- | --- | --- | --- | --- | --- | --- |
| multilingual-e5-small | 0.75862 | 0.56193 | 0.80309 | 0.82778 | 0.81672 | 0.85072 |
| multilingual-e5-small-optimized | 0.75992 | 0.48934 | 0.79668 | 0.82017 | 0.8135 | 0.84316 |
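For readers unfamiliar with the metric, the following is a minimal sketch of how NDCG@10 can be computed for a single query. It is not the evaluation code used for the numbers above. The function names `dcg_at_k` and `ndcg_at_k` are hypothetical, and the sketch assumes the linear-gain form of DCG; some evaluation tools use an exponential gain instead, although the two coincide for binary relevance labels.

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain of the first k relevance labels, in ranked order."""
    # Rank 0 is the top hit; its discount is log2(2) = 1.
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_relevances, all_relevances, k=10):
    """DCG of the returned ranking divided by the DCG of an ideal ranking.

    ranked_relevances: labels of the retrieved documents, in the order returned.
    all_relevances: labels of every judged document for the query.
    """
    ideal = dcg_at_k(sorted(all_relevances, reverse=True), k)
    return dcg_at_k(ranked_relevances, k) / ideal if ideal > 0 else 0.0


# Two relevant documents exist; the system returns them at ranks 1 and 3.
score = ndcg_at_k([1, 0, 1], [1, 1, 0])
print(f"{score:.4f}")
```

The reported figure for a dataset would then be the mean of this per-query score over all queries in the split.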

To test English out-of-domain performance, we used the test split of several datasets from the BEIR evaluation. Measuring NDCG@10, we see the largest absolute drop on SCIFACT (about 0.022) and smaller drops on the other datasets evaluated.

| model | FIQA | SCIFACT | nfcorpus |
| --- | --- | --- | --- |
| multilingual-e5-small | 0.33126 | 0.677 | 0.31004 |
| multilingual-e5-small-optimized | 0.31734 | 0.65484 | 0.30126 |
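To make the size of these differences easier to compare, the sketch below computes the absolute and relative change in NDCG@10 for each BEIR dataset from the values in the table. The helper name `quality_change` is hypothetical and introduced only for illustration.

```python
# NDCG@10 values copied from the BEIR table above.
baseline = {"FIQA": 0.33126, "SCIFACT": 0.677, "nfcorpus": 0.31004}
optimized = {"FIQA": 0.31734, "SCIFACT": 0.65484, "nfcorpus": 0.30126}


def quality_change(base, opt):
    """Return (absolute change, relative change in percent) of opt versus base."""
    absolute = opt - base
    return absolute, 100.0 * absolute / base


changes = {name: quality_change(baseline[name], optimized[name]) for name in baseline}

for name, (abs_delta, rel_delta) in changes.items():
    print(f"{name:10s} {abs_delta:+.5f} {rel_delta:+.2f}%")
```

SCIFACT shows the largest absolute drop, while the relative drops across the three datasets fall between roughly 2.8% and 4.2%.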

Performance

Using a PyTorch model traced for Linux and Intel CPUs, we benchmarked inference with inputs of various lengths. The optimized model is faster at every input length tested, with the improvement ranging from about 54% for the shortest inputs to about 22% for the longest.

| input length (characters) | multilingual-e5-small | multilingual-e5-small-optimized | speedup |
| --- | --- | --- | --- |
| 0 - 50 | 0.0181 | 0.00826 | 54.36% |
| 50 - 100 | 0.0275 | 0.0164 | 40.36% |
| 100 - 150 | 0.0366 | 0.0237 | 35.25% |
| 150 - 200 | 0.0435 | 0.0301 | 30.80% |
| 200 - 250 | 0.0514 | 0.0379 | 26.26% |
| 250 - 300 | 0.0569 | 0.043 | 24.43% |
| 300 - 350 | 0.0663 | 0.0513 | 22.62% |
| 350 - 400 | 0.0737 | 0.0576 | 21.85% |
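The speedup column appears to be the relative reduction in latency, (baseline - optimized) / baseline. This is an inference from the numbers rather than something stated explicitly, but the short sketch below reproduces the published percentages under that assumption. The name `speedup_pct` is hypothetical.

```python
# (input length bucket, baseline latency, optimized latency), copied from the table above.
latencies = [
    ("0 - 50", 0.0181, 0.00826),
    ("50 - 100", 0.0275, 0.0164),
    ("100 - 150", 0.0366, 0.0237),
    ("150 - 200", 0.0435, 0.0301),
    ("200 - 250", 0.0514, 0.0379),
    ("250 - 300", 0.0569, 0.043),
    ("300 - 350", 0.0663, 0.0513),
    ("350 - 400", 0.0737, 0.0576),
]


def speedup_pct(base, opt):
    """Relative latency reduction of the optimized model, in percent."""
    return 100.0 * (base - opt) / base


speedups = {bucket: speedup_pct(base, opt) for bucket, base, opt in latencies}

for bucket, pct in speedups.items():
    print(f"{bucket:>9s}  {pct:.2f}%")
```

Read this way, the gain shrinks steadily as inputs get longer, so the benefit of the optimized model is largest for short texts.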

Disclaimer

This e5 model, as defined, hosted, integrated and used in conjunction with our other Elastic Software, is covered by our standard warranty.
