metadata

language:
  - en
base_model:
  - google/gemma-2-9b-it

Gemma Embeddings v0.8

GemmaEmbed is a dense-vector embedding model trained specifically for retrieval. As of December 2, 2024, GemmaEmbed held the #1 overall position on the MTEB Retrieval leaderboard, with a score of 63.90.
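To illustrate what dense-vector retrieval means in practice, here is a minimal sketch of ranking documents against a query by vector similarity. It does not load or call GemmaEmbed: the three-dimensional vectors are hypothetical stand-ins for model outputs, the helper names `cosine_similarity` and `rank_documents` are made up for this example, and the choice of cosine similarity is an assumption, since this card does not state which similarity function the model expects.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_documents(query_vec, doc_vecs):
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy embeddings; a real model would produce far longer vectors.
query = [0.9, 0.1, 0.0]
docs = {
    "doc_a": [0.8, 0.2, 0.1],
    "doc_b": [0.0, 0.1, 0.9],
    "doc_c": [0.5, 0.5, 0.0],
}

for doc_id, score in rank_documents(query, docs):
    print(f"{doc_id}: {score:.3f}")
```

In a real pipeline the query and document vectors would both come from the embedding model, and the ranking step would typically be delegated to a vector index rather than a Python loop.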

Important Notes

This is not an official Google product.
This is a research project.

Results summary

Scores for Gemma-Embeddings-v0.8 compared to BGE-EN-ICL on five large retrieval datasets

Model	DBPedia	FEVER	HotPotQA	MSMARCO	NQ
BGE-EN-ICL	51.63	92.83	85.14	46.79	73.88
Gemma-Embeddings-v0.8	52.58	93.50	87.58	47.13	74.45
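The margins in the table above can be read off with simple arithmetic. The short sketch below copies the ten scores from the table and computes the per-dataset difference and the unweighted mean difference; the variable and function names are illustrative only, and the unweighted mean is our choice of summary, not a figure reported by the authors.

```python
# Scores copied from the table above.
DATASETS = ["DBPedia", "FEVER", "HotPotQA", "MSMARCO", "NQ"]
BGE_EN_ICL = [51.63, 92.83, 85.14, 46.79, 73.88]
GEMMA_EMBEDDINGS = [52.58, 93.50, 87.58, 47.13, 74.45]


def score_deltas(ours, baseline):
    """Per-dataset difference (ours minus baseline), rounded to 2 decimals."""
    return [round(o - b, 2) for o, b in zip(ours, baseline)]


deltas = score_deltas(GEMMA_EMBEDDINGS, BGE_EN_ICL)
mean_delta = round(sum(deltas) / len(deltas), 3)

for name, delta in zip(DATASETS, deltas):
    print(f"{name}: {delta:+.2f}")
print(f"Unweighted mean difference: {mean_delta:+.3f}")
```

Gemma-Embeddings-v0.8 scores higher on all five datasets, with the largest margin on HotPotQA (+2.44) and the smallest on MSMARCO (+0.34).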

Model & Data

Our base encoder model is Gemma 2 9B (google/gemma-2-9b-it).

We use the BGE-EN-ICL training data.

Research Team

Nicholas Monath
Michael Boratko
Seungyeon Kim
Andrew McCallum
Rob Fergus
Manzil Zaheer