---
language:
- en
base_model:
- google/gemma-2-9b-it
---

# Gemma Embeddings v0.8

GemmaEmbed is a dense-vector embedding model trained specifically for retrieval. As of December 2, 2024, GemmaEmbed held the #1 position overall on the _MTEB Retrieval_ leaderboard, with a score of 63.90.
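
To illustrate what dense-vector retrieval looks like in practice, here is a minimal sketch that ranks documents by the similarity of their embedding vectors to a query vector. It makes several assumptions: the vectors are toy 3-dimensional values rather than real model outputs, cosine similarity is assumed as the scoring function (this card does not state which similarity the model was trained with), and the helpers `cosine_similarity` and `rank_documents` are hypothetical names, not part of any released API.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_documents(query_vec, doc_vecs):
    """Return (document indices sorted best-first, per-document scores)."""
    scores = [cosine_similarity(query_vec, d) for d in doc_vecs]
    order = sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)
    return order, scores

# Toy "embeddings" (assumed values, for illustration only).
query = [1.0, 0.0, 0.0]
docs = [
    [0.0, 1.0, 0.0],  # orthogonal to the query
    [2.0, 0.0, 0.0],  # same direction as the query
    [1.0, 1.0, 0.0],  # 45 degrees from the query
]

order, scores = rank_documents(query, docs)
print(order)  # [1, 2, 0]
```

In a real setup, the query and document vectors would come from the embedding model, and the ranking step would typically be delegated to a vector index rather than an exhaustive loop.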

# Important Notes

* This is not an official Google product.
* This is a research project.

# Results summary

Results compared to BGE-EN-ICL on several large datasets:

| Model | DBPedia | FEVER | HotPotQA | MSMARCO | NQ |
| ----- | ------- | ----- | -------- | ------- | --- |
| BGE-EN-ICL | 51.63 | 92.83 | 85.14 | 46.79 | 73.88 |
| Gemma-Embeddings-v0.8 | 52.58 | 93.50 | 87.58 | 47.13 | 74.45 |
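
The per-dataset margins implied by the table can be computed directly. The short sketch below copies the scores from the table and subtracts the BGE-EN-ICL row from the Gemma-Embeddings-v0.8 row; the `margins` helper is a hypothetical name introduced only for this illustration.

```python
# Scores copied from the results table above.
results = {
    "BGE-EN-ICL": {
        "DBPedia": 51.63, "FEVER": 92.83, "HotPotQA": 85.14,
        "MSMARCO": 46.79, "NQ": 73.88,
    },
    "Gemma-Embeddings-v0.8": {
        "DBPedia": 52.58, "FEVER": 93.50, "HotPotQA": 87.58,
        "MSMARCO": 47.13, "NQ": 74.45,
    },
}

def margins(baseline, candidate):
    """Per-dataset difference (candidate minus baseline), rounded to 2 decimals."""
    return {name: round(candidate[name] - baseline[name], 2) for name in baseline}

diff = margins(results["BGE-EN-ICL"], results["Gemma-Embeddings-v0.8"])
for name, delta in diff.items():
    print(f"{name}: +{delta:.2f}")
# DBPedia: +0.95
# FEVER: +0.67
# HotPotQA: +2.44
# MSMARCO: +0.34
# NQ: +0.57
```

The largest margin on these five datasets is on HotPotQA (+2.44 points), and the smallest is on MSMARCO (+0.34 points).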

# Model & Data

Our base encoder model is [Gemma2 9B](https://huggingface.co/google/gemma-2-9b).

We use the [BGE-EN-ICL training data](https://huggingface.co/datasets/cfli/bge-full-data).

# Research Team

* Nicholas Monath
* Michael Boratko
* Seungyeon Kim
* Andrew McCallum
* Rob Fergus
* Manzil Zaheer