elastic/multilingual-e5-small-optimized

A quantized version of multilingual-e5-small. Quantization was performed per-layer under the same conditions as our ELSERv2 model, as described here.
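As a rough illustration of what quantizing a layer involves, the sketch below applies a generic symmetric int8 scheme with a single scale per layer, in plain Python. This is an assumption for illustration only. The exact per-layer procedure used for ELSERv2 and for this model may differ, for example in which layers are quantized. The names `quantize_int8` and `dequantize` are hypothetical.

```python
# Generic symmetric int8 quantization (illustrative assumption,
# not necessarily the exact procedure used for this model).


def quantize_int8(weights):
    """Quantize one layer's weights to int8 using a single shared scale."""
    max_abs = max(abs(w) for w in weights)
    # One scale for the whole layer maps the largest magnitude to 127.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round each weight to the nearest integer step and clamp to the int8 range.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map int8 values back to approximate floating-point weights."""
    return [v * scale for v in q]


weights = [0.5, -1.27, 0.01, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
print(q)
print([round(w, 4) for w in restored])
```

Each restored weight differs from the original by at most half of `scale`. That rounding error is what quantization trades for smaller weights and cheaper integer arithmetic.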

Text Embeddings by Weakly-Supervised Contrastive Pre-training. Liang Wang, Nan Yang, Xiaolong Huang, Binxing Jiao, Linjun Yang, Daxin Jiang, Rangan Majumder, Furu Wei, arXiv 2022

Benchmarks

We ran a number of small benchmarks to assess how quantization affects both quality and inference latency relative to the original model.

Quality

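Measuring NDCG@10 on the dev split of the MIRACL datasets for selected languages, we see only a marginal change in quality for most languages (an absolute difference of less than 0.01). The exception is Yoruba (yo), where NDCG@10 drops by about 0.07.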

model	de	yo	ru	ar	es	th
multilingual-e5-small	0.75862	0.56193	0.80309	0.82778	0.81672	0.85072
multilingual-e5-small-optimized	0.75992	0.48934	0.79668	0.82017	0.8135	0.84316
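The per-language changes can be computed directly from the table. This sketch copies the NDCG@10 values above and reports absolute and relative differences. The 0.01 threshold used to flag a language is an arbitrary, hypothetical choice for illustration, and `ndcg_deltas` is a hypothetical helper.

```python
# NDCG@10 on the MIRACL dev split, copied from the table above.
baseline = {"de": 0.75862, "yo": 0.56193, "ru": 0.80309,
            "ar": 0.82778, "es": 0.81672, "th": 0.85072}
optimized = {"de": 0.75992, "yo": 0.48934, "ru": 0.79668,
             "ar": 0.82017, "es": 0.8135, "th": 0.84316}


def ndcg_deltas(base, opt):
    """Return {language: (absolute change, relative change in %)}."""
    return {
        lang: (opt[lang] - base[lang],
               100 * (opt[lang] - base[lang]) / base[lang])
        for lang in base
    }


# Hypothetical threshold for calling a change "notable".
THRESHOLD = 0.01
deltas = ndcg_deltas(baseline, optimized)
for lang, (abs_change, rel_change) in deltas.items():
    flag = "  <-- notable" if abs(abs_change) > THRESHOLD else ""
    print(f"{lang}: {abs_change:+.5f} ({rel_change:+.2f}%){flag}")
```

With these values, only `yo` crosses the threshold, with a drop of about 0.073 (roughly 12.9%); `de` improves slightly, and every other language drops by less than 0.01.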

To test English out-of-domain performance, we used the test split of several datasets from the BEIR benchmark. Measuring NDCG@10, we see the largest drop on SCIFACT (about 0.022), with smaller drops on FIQA (about 0.014) and nfcorpus (about 0.009).

model	FIQA	SCIFACT	nfcorpus
multilingual-e5-small	0.33126	0.677	0.31004
multilingual-e5-small-optimized	0.31734	0.65484	0.30126
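For reference, NDCG@10 compares the discounted gain of the top 10 retrieved documents with that of an ideal ordering of the judged documents. The sketch below uses the linear-gain formulation (each relevance grade divided by log2(rank + 1), with ranks starting at 1), as TREC-style tools commonly compute it. It is not the evaluation code used for these benchmarks, and the function names are hypothetical.

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the top-k ranked results (linear gain)."""
    # enumerate() starts at 0, so rank + 2 equals the 1-based rank + 1.
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_relevances, all_relevances, k=10):
    """NDCG@k: DCG of the system ranking divided by DCG of the ideal ranking.

    ranked_relevances: relevance grade of each retrieved document, in rank order.
    all_relevances: relevance grades of all judged documents for the query,
    used to build the ideal ranking.
    """
    ideal = dcg_at_k(sorted(all_relevances, reverse=True), k)
    return dcg_at_k(ranked_relevances, k) / ideal if ideal > 0 else 0.0


# Hypothetical query: the two relevant documents were retrieved at ranks 2 and 3.
score = ndcg_at_k([0, 1, 1], [1, 1])
print(f"NDCG@10 = {score:.4f}")
```

Placing both relevant documents at ranks 1 and 2 would give a score of 1.0; pushing them down to ranks 2 and 3 lowers it to about 0.69.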

Performance

Using a PyTorch model traced for Linux and Intel CPUs, we benchmarked inference latency across a range of input lengths. Overall, the optimized model reduces latency by roughly 22% to 54%, with the largest gains on the shortest inputs.

input length (characters)	multilingual-e5-small	multilingual-e5-small-optimized	latency reduction
0 - 50	0.0181	0.00826	54.36%
50 - 100	0.0275	0.0164	40.36%
100 - 150	0.0366	0.0237	35.25%
150 - 200	0.0435	0.0301	30.80%
200 - 250	0.0514	0.0379	26.26%
250 - 300	0.0569	0.043	24.43%
300 - 350	0.0663	0.0513	22.62%
350 - 400	0.0737	0.0576	21.85%
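The last column of the table is consistent with the relative reduction in latency, (baseline - optimized) / baseline. The sketch below recomputes it from the latency values above and also reports the equivalent speedup factor, baseline / optimized. The helper names are hypothetical.

```python
# Latency values copied from the table above, one entry per input-length bucket.
buckets = ["0 - 50", "50 - 100", "100 - 150", "150 - 200",
           "200 - 250", "250 - 300", "300 - 350", "350 - 400"]
baseline = [0.0181, 0.0275, 0.0366, 0.0435, 0.0514, 0.0569, 0.0663, 0.0737]
optimized = [0.00826, 0.0164, 0.0237, 0.0301, 0.0379, 0.043, 0.0513, 0.0576]


def latency_reduction(base, opt):
    """Percentage by which latency shrinks: (base - opt) / base * 100."""
    return (base - opt) / base * 100


def speedup_factor(base, opt):
    """How many times faster the optimized model is: base / opt."""
    return base / opt


for bucket, b, o in zip(buckets, baseline, optimized):
    print(f"{bucket:>9}: {latency_reduction(b, o):5.2f}% lower latency, "
          f"{speedup_factor(b, o):.2f}x faster")
```

By this measure the optimized model is roughly 2.19x faster on the shortest inputs and about 1.28x faster on the longest inputs measured.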

Disclaimer

This e5 model, as defined, hosted, integrated and used in conjunction with our other Elastic Software, is covered by our standard warranty.