Quantized EmbeddingGemma-300M model
Description
This repository provides quantized variants (Q4_K_M and Q5_K_M) of EmbeddingGemma, Google’s 300M parameter multilingual embedding model from the Gemma family (initialized with T5Gemma). EmbeddingGemma is designed for generating high-quality vector representations of text, making it well-suited for search, retrieval, classification, clustering, and semantic similarity tasks across 100+ spoken languages.
The Q4_K_M and Q5_K_M quantized versions significantly reduce memory requirements and improve inference efficiency, enabling deployment across a wider range of environments, from mobile devices and laptops to desktops and lightweight servers. Both quantized models maintain the core capabilities of the original EmbeddingGemma while offering different trade-offs between efficiency and accuracy: Q4_K_M is more compact, while Q5_K_M provides slightly higher fidelity.
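Because the quantized weights are distributed in GGUF format, they can also be served through llama.cpp-compatible tooling. Below is a minimal sketch using the llama-cpp-python bindings; the local file name embeddinggemma-300m-Q4_K_M.gguf is an assumption and should be replaced with the GGUF file actually downloaded from this repository.

from llama_cpp import Llama

# Load the quantized GGUF file in embedding mode.
# The file name below is an assumption; use the GGUF file from this repository.
llm = Llama(model_path="embeddinggemma-300m-Q4_K_M.gguf", embedding=True)

# Compute an embedding for a single piece of text.
embedding = llm.embed("Mars is often referred to as the Red Planet.")
print(len(embedding))

The Sentence Transformers workflow described under Usage remains the reference path; the llama.cpp route is an alternative for environments where PyTorch is not available.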
Inputs and outputs
Input:
- Text string, such as a question, a prompt, or a document to be embedded
- Maximum input context length of 2048 tokens
Output:
- Numerical vector representations of input text data
- Output embedding dimension size of 768, with smaller options available (512, 256, or 128) via Matryoshka Representation Learning (MRL). MRL allows users to truncate the output embedding of size 768 to their desired size and then re-normalize for efficient and accurate representation.
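As an illustration of the MRL truncation described above, the following sketch shortens a 768-dimensional embedding to 256 dimensions and re-normalizes it. The input vector here is random and merely stands in for a real model output.

import numpy as np

# Stand-in for a 768-dimensional embedding produced by the model.
full_embedding = np.random.rand(768).astype(np.float32)

# Matryoshka truncation: keep the leading 256 dimensions, then re-normalize.
truncated = full_embedding[:256]
truncated = truncated / np.linalg.norm(truncated)
print(truncated.shape)  # (256,)

Sentence Transformers can also apply this at load time via its truncate_dim argument, if you prefer not to truncate manually.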
Usage
These model weights are designed to be used with Sentence Transformers, using the Gemma 3 implementation from Hugging Face Transformers as the backbone.
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Load quantized EmbeddingGemma from the Hugging Face Hub
model = SentenceTransformer("SandlogicTechnologies/embeddinggemma-300m-Q4_K_M")

# Run inference with queries and documents
query = "Which planet is known as the Red Planet?"
documents = [
    "Venus is often called Earth's twin because of its similar size and proximity.",
    "Mars, known for its reddish appearance, is often referred to as the Red Planet.",
    "Jupiter, the largest planet in our solar system, has a prominent red spot.",
    "Saturn, famous for its rings, is sometimes mistaken for the Red Planet.",
]
# Encode query and documents
query_embeddings = model.encode_query(query)
document_embeddings = model.encode_document(documents)
print(query_embeddings.shape, document_embeddings.shape)
# (768,) (4, 768)
# Compute similarities to determine a ranking
similarities = model.similarity(query_embeddings, document_embeddings)
print(similarities)
# tensor([[0.3011, 0.6359, 0.4930, 0.4889]])
NOTE: EmbeddingGemma activations do not support float16. Please use float32 or bfloat16 as appropriate for your hardware.
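If you want to load the backbone in bfloat16 explicitly, one way is to pass a torch_dtype through Sentence Transformers' model_kwargs; a minimal sketch, reusing the repository id from the example above:

import torch
from sentence_transformers import SentenceTransformer

# Load with bfloat16 activations; float16 is not supported by EmbeddingGemma.
model = SentenceTransformer(
    "SandlogicTechnologies/embeddinggemma-300m-Q4_K_M",
    model_kwargs={"torch_dtype": torch.bfloat16},
)

embedding = model.encode_query("Which planet is known as the Red Planet?")
print(embedding.shape)  # (768,)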
Intended Usage
Open embedding models have a wide range of applications across various industries and domains. The following list of potential uses is not comprehensive; it provides contextual information about the possible use cases that the model creators considered as part of model training and development.
- Semantic Similarity: Embeddings optimized to assess text similarity, such as recommendation systems and duplicate detection
- Classification: Embeddings optimized to classify texts according to preset labels, such as sentiment analysis and spam detection
- Clustering: Embeddings optimized to cluster texts based on their similarities, such as document organization, market research, and anomaly detection (a minimal clustering sketch follows this list)
- Retrieval
  - Document: Embeddings optimized for document search, such as indexing articles, books, or web pages for search
  - Query: Embeddings optimized for general search queries, such as custom search
  - Code Query: Embeddings optimized for retrieval of code blocks based on natural language queries, such as code suggestions and search
- Question Answering: Embeddings for questions in a question-answering system, optimized for finding documents that answer the question, such as a chatbot.
- Fact Verification: Embeddings for statements that need to be verified, optimized for retrieving documents that contain evidence supporting or refuting the statement, such as automated fact-checking systems.
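As an example of the clustering use case listed above, the following sketch groups a small hypothetical corpus with scikit-learn's KMeans on top of the document embeddings:

from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

model = SentenceTransformer("SandlogicTechnologies/embeddinggemma-300m-Q4_K_M")

# Hypothetical corpus with two topics: account access and shipping.
texts = [
    "How do I reset my password?",
    "I forgot my login credentials.",
    "What is the shipping cost to Canada?",
    "How long does delivery take?",
]

embeddings = model.encode_document(texts)
labels = KMeans(n_clusters=2, random_state=0, n_init=10).fit_predict(embeddings)
print(labels)  # e.g. [0 0 1 1]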
Limitations
Training Data
- The quality and diversity of the training data significantly influence the model's capabilities. Biases or gaps in the training data can lead to limitations in the model's responses.
- The scope of the training dataset determines the subject areas the model can handle effectively.
Language Ambiguity and Nuance
- Natural language is inherently complex. Models might struggle to grasp subtle nuances, sarcasm, or figurative language.
Ethical Considerations and Risks
Risks identified and mitigations:
- Perpetuation of biases: Continuous monitoring (using evaluation metrics and human review) and the exploration of de-biasing techniques during model training, fine-tuning, and other use cases are encouraged.
- Misuse for malicious purposes: Technical limitations and developer and end-user education can help mitigate malicious applications of embeddings. Educational resources and reporting mechanisms for users to flag misuse are provided. Prohibited uses of Gemma models are outlined in the Gemma Prohibited Use Policy.
- Privacy violations: Models were trained on data filtered to remove certain personal information and other sensitive data. Developers are encouraged to adhere to privacy regulations by applying privacy-preserving techniques.
Acknowledgments
These quantized models are based on the original work by the Google development team.
Special thanks to:
- The Google team for developing and releasing the embeddinggemma-300m model.
- Georgi Gerganov and the entire llama.cpp open-source community for enabling efficient model quantization and inference via the GGUF format.
Contact
For any inquiries or support, please contact us at support@sandlogic.com or visit our Website.
Base model: google/embeddinggemma-300m