metadata
language:
  - en
base_model:
  - google/gemma-2-9b-it

Gemma Embeddings v0.8

GemmaEmbed is a dense-vector embedding model trained specifically for retrieval. As of December 2, 2024, GemmaEmbed ranks #1 overall on the MTEB Retrieval leaderboard, with a score of 63.90.
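To illustrate what dense-vector retrieval means in practice, here is a minimal sketch of ranking documents against a query by embedding similarity. It makes several assumptions that this model card does not state: the toy 3-dimensional vectors stand in for real model outputs, cosine similarity is assumed as the scoring function, and the names `cosine`, `rank`, `toy_docs`, and `toy_query` are hypothetical. It does not load or call GemmaEmbed.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank(query_vec, doc_vecs):
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy embeddings (assumption: real embeddings would come from the model
# and have far more dimensions).
toy_docs = {
    "doc_a": [0.9, 0.1, 0.0],
    "doc_b": [0.0, 1.0, 0.0],
    "doc_c": [0.5, 0.5, 0.7],
}
toy_query = [1.0, 0.2, 0.0]

ranking = rank(toy_query, toy_docs)
for doc_id, score in ranking:
    print(f"{doc_id}: {score:.3f}")
```

In a real retrieval setup, document vectors are typically computed once and indexed, so only the query needs to be embedded at search time.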

Important Notes

  • This is not an official Google product.
  • This is a research project.

Results summary

Results compared to BGE-EN-ICL on several large datasets

Model                  DBPedia  FEVER  HotPotQA  MSMARCO  NQ
BGE-EN-ICL             51.63    92.83  85.14     46.79    73.88
Gemma-Embeddings-v0.8  52.58    93.50  87.58     47.13    74.45
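The gaps between the two models can be read off the table directly. As a small worked example, the sketch below computes the per-dataset difference and the mean difference across these five datasets. The scores are copied from the table; the variable names are hypothetical, and the mean covers only these five datasets, so it is not the overall MTEB Retrieval score.

```python
# Scores copied from the results table above.
DATASETS = ["DBPedia", "FEVER", "HotPotQA", "MSMARCO", "NQ"]
BGE_EN_ICL = [51.63, 92.83, 85.14, 46.79, 73.88]
GEMMA_EMBEDDINGS_V08 = [52.58, 93.50, 87.58, 47.13, 74.45]

# Per-dataset improvement of Gemma-Embeddings-v0.8 over BGE-EN-ICL.
deltas = {
    name: round(gemma - bge, 2)
    for name, bge, gemma in zip(DATASETS, BGE_EN_ICL, GEMMA_EMBEDDINGS_V08)
}

# Mean improvement over these five datasets only.
mean_delta = round(sum(deltas.values()) / len(deltas), 3)

for name, delta in deltas.items():
    print(f"{name}: +{delta:.2f}")
print(f"Mean over {len(deltas)} datasets: +{mean_delta:.3f}")
```

The largest gap is on HotPotQA (+2.44) and the smallest on MSMARCO (+0.34); the mean difference over the five datasets is +0.994.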

Model & Data

Our base encoder model is Gemma 2 9B (google/gemma-2-9b-it).

We use the BGE-EN-ICL training data.

Research Team

  • Nicholas Monath
  • Michael Boratko
  • Seungyeon Kim
  • Andrew McCallum
  • Rob Fergus
  • Manzil Zaheer