SSE

If you would like to know more details:

SSE Technical Article

SSE v2 Technical Article

SSE

(a) Retrieval performance (nDCG@10) across NanoBEIR English tasks. (b) Mean nDCG@10 vs. inference speed (QPS: queries per second) measured on TREC-COVID and Quora using an Intel® Core™ Ultra 7 265K (3.90 GHz) with batch size 32.

🩵 SSE: Stable Static Embedding for Retrieval MRL v2 🩵

A lightweight, fast, and powerful embedding model

Performance Snapshot
Our SSE model achieves NDCG@10 = 0.516 on NanoBEIR — slightly outperforming the popular static-retrieval-mrl-en-v1 (0.5032) while using half the dimensions (512 vs 1024)! 💫 Plus, we're ~2× faster in retrieval thanks to our compact 512D embeddings and Separable Dynamic Tanh.

| Model | NanoBEIR NDCG@10 | Dimensions | Parameters | Speed Advantage | License |
|---|---|---|---|---|---|
| SSE Retrieval MRL v2 | 0.5158 | 512 | ~16M 🪽 | ~2x faster retrieval (ultra-efficient!) | Apache 2.0 |
| SSE Retrieval MRL | 0.5124 | 512 | ~16M 🪽 | ~2x faster retrieval (ultra-efficient!) | Apache 2.0 |
| static-retrieval-mrl-en-v1 | 0.5032 | 1024 | ~33M | baseline | Apache 2.0 |

SSE v2 key points

By tuning the hyperparameters, we further improved the regularization of the representation space and achieved better performance.

Our model attains a NanoBEIR mean NDCG@10 of 0.503 using only 256D. This matches the score reported in prior work with 1024D embeddings.

As a result, our approach achieves accuracy comparable to previous studies while delivering approximately 4x speedup.

  • Matryoshka truncation (figure)

  • PCA analysis (figure)


🩵 Why Choose SSE Retrieval MRL? 🩵

Higher NDCG@10 than all comparable small models (<35M params)
Only ~16M parameters — 27% smaller than MiniLM-L6 (22M) and 52% smaller than BGE-small (33M)
512D native output: half the size of static-retrieval-mrl-en-v1 (1024D)
Matryoshka-ready: smoothly truncate to 256D/128D/64D/32D with graceful degradation
Apache 2.0 licensed — free for commercial & personal use
CPU-optimized — runs faster on edge devices & modest hardware
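
Matryoshka truncation amounts to keeping the leading components of an embedding and re-normalizing them before computing cosine similarity. The sketch below illustrates this on toy vectors in plain Python. The helper names (`truncate_and_normalize`, `cosine`) and the 8-D vectors are illustrative assumptions, not part of the SSE codebase.

```python
import math

def truncate_and_normalize(vec, dim):
    """Keep the first `dim` components and rescale them to unit length."""
    head = list(vec[:dim])
    norm = math.sqrt(sum(v * v for v in head))
    return [v / norm for v in head] if norm > 0 else head

def cosine(u, v):
    """Cosine similarity of two unit-length vectors (a plain dot product)."""
    return sum(a * b for a, b in zip(u, v))

# Toy 8-D "embeddings" (made-up numbers standing in for 512-D SSE outputs).
query = [0.9, 0.3, -0.2, 0.1, 0.05, -0.02, 0.01, 0.0]
doc = [0.8, 0.4, -0.1, 0.2, -0.03, 0.04, 0.0, 0.01]

for dim in (8, 4, 2):
    q = truncate_and_normalize(query, dim)
    d = truncate_and_normalize(doc, dim)
    print(f"dim={dim}: cosine={cosine(q, d):.4f}")
```

With sentence-transformers, the same effect is usually obtained by passing `truncate_dim=256` (or another size) when constructing `SentenceTransformer`. Whether that option interacts cleanly with this model's remote code has not been verified here.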


🩵 Model Details 🩵

| Property | Value |
|---|---|
| Model Type | Sentence Transformer (SSE architecture) |
| Max Sequence Length | ∞ tokens |
| Output Dimension | 512 (with Matryoshka truncation down to 32D!) |
| Similarity Function | Cosine Similarity |
| Language | English |
| License | Apache 2.0 |

SentenceTransformer(
  (0): SSE(
    (embedding): EmbeddingBag(30522, 512, mode='mean')
    (dyt): SeparableDyT()
  )
)
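
The ~16M parameter figure follows almost entirely from the embedding table in the module summary above. Here is a quick arithmetic check, assuming the `SeparableDyT` layer contributes only a negligible number of parameters (its exact size is not stated here) and that weights are stored in float32:

```python
# Sizes read off the module summary: EmbeddingBag(30522, 512, mode='mean').
vocab_size = 30522
embedding_dim = 512

# One 512-D row per vocabulary entry.
embedding_params = vocab_size * embedding_dim
print(f"{embedding_params:,} parameters (~{embedding_params / 1e6:.1f}M)")

# Assumed float32 storage: 4 bytes per parameter.
size_mib = embedding_params * 4 / 2**20
print(f"~{size_mib:.1f} MiB in float32")
```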

Architecture


🩵 Mathematical formulations 🩵

Dynamic Tanh Normalization (DyT) enables magnitude-adaptive gradient flow for static embeddings. For each input dimension $x_k$, DyT computes $y_k = c_k \tanh(a_k x_k + b_k)$ with learnable parameters $a_k$, $b_k$, $c_k$. The gradient with respect to $x_k$ is:

$$\frac{\partial y_k}{\partial x_k} = c_k a_k \, \mathrm{sech}^2(a_k x_k + b_k).$$

For saturated dimensions ($|x_i| > 1$), $|a_i x_i + b_i| \gg 1$ yields exponential decay $\mathrm{sech}^2(z) \sim 4e^{-2|z|}$, suppressing gradients as $\partial y_i / \partial x_i \to 0$. For non-saturated dimensions ($|x_j| \ll 1$), $\mathrm{sech}^2(z) \approx 1$ preserves near-constant gradients $\partial y_j / \partial x_j \approx c_j a_j$. This magnitude-dependent gating attenuates learning signals from noisy, large-magnitude dimensions while maintaining full gradient flow for stable, informative dimensions, providing implicit regularization that enhances generalization without explicit hyperparameters.
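
The gradient behaviour described above can be checked numerically. The sketch below is a scalar, plain-Python illustration of a per-dimension DyT, not the model's actual `SeparableDyT` layer. The function names and the values of `a`, `b`, `c` are assumptions chosen for illustration.

```python
import math

def dyt(x, a, b, c):
    """Per-dimension Dynamic Tanh: y = c * tanh(a * x + b)."""
    return c * math.tanh(a * x + b)

def dyt_grad(x, a, b, c):
    """Analytic gradient dy/dx = c * a * sech^2(a * x + b)."""
    z = a * x + b
    return c * a / math.cosh(z) ** 2

def numeric_grad(x, a, b, c, eps=1e-6):
    """Central finite difference, used only to sanity-check dyt_grad."""
    return (dyt(x + eps, a, b, c) - dyt(x - eps, a, b, c)) / (2 * eps)

a, b, c = 1.5, 0.1, 0.8  # illustrative parameter values

# Small |x| keeps the gradient near c * a; large |x| drives it toward zero.
for x in (0.0, 0.5, 2.0, 5.0):
    print(f"x={x:4.1f}  analytic={dyt_grad(x, a, b, c):.6f}  "
          f"numeric={numeric_grad(x, a, b, c):.6f}")

# In the saturated regime, sech^2(z) approaches 4 * exp(-2|z|).
z = a * 5.0 + b
print("sech^2(z) / (4 e^{-2|z|}) =", (1 / math.cosh(z) ** 2) / (4 * math.exp(-2 * abs(z))))
```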


🩵 Evaluation Results (NanoBEIR) 🩵

| Dataset | NDCG@10 | MRR@10 | MAP@100 |
|---|---|---|---|
| NanoBEIR Mean | 0.5158 ✨ | 0.5667 | 0.4321 |
| NanoClimateFEVER | 0.2941 | 0.3492 | 0.2265 |
| NanoDBPedia | 0.5503 | 0.7472 | 0.4221 |
| NanoFEVER | 0.6810 | 0.6291 | 0.6065 |
| NanoFiQA2018 | 0.3499 | 0.3943 | 0.2871 |
| NanoHotpotQA | 0.7105 | 0.8079 | 0.6389 |
| NanoMSMARCO | 0.4162 | 0.3520 | 0.3691 |
| NanoNFCorpus | 0.3145 | 0.5049 | 0.1229 |
| NanoNQ | 0.4790 | 0.4041 | 0.4099 |
| NanoQuoraRetrieval | 0.9171 ✨ | 0.9117 | 0.8887 |
| NanoSCIDOCS | 0.3548 | 0.5468 | 0.2769 |
| NanoArguAna | 0.4087 | 0.3037 | 0.3116 |
| NanoSciFact | 0.6493 | 0.6229 | 0.6154 |
| NanoTouche2020 | 0.5804 | 0.7929 | 0.4415 |

Top performance on community-based retrieval (Quora) and scientific fact verification!
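
For readers unfamiliar with the metrics in the table, the sketch below computes NDCG@k and MRR@k for a single query. It assumes the common linear-gain, log2-discount form of DCG. The evaluator used for the numbers above may differ in details, and the function names are illustrative.

```python
import math

def dcg(relevances):
    """Discounted cumulative gain with linear gain and log2 discount."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))

def ndcg_at_k(ranked_rels, all_rels, k=10):
    """NDCG@k: DCG of the returned ranking divided by the best achievable DCG.

    ranked_rels: relevance of each retrieved document, in ranked order.
    all_rels:    relevance of every relevant document for the query.
    """
    ideal = dcg(sorted(all_rels, reverse=True)[:k])
    return dcg(ranked_rels[:k]) / ideal if ideal > 0 else 0.0

def mrr_at_k(ranked_rels, k=10):
    """Reciprocal rank of the first relevant document within the top k."""
    for rank, rel in enumerate(ranked_rels[:k], start=1):
        if rel > 0:
            return 1.0 / rank
    return 0.0

# One query with two relevant documents; the system ranks them 2nd and 4th.
ranked = [0, 1, 0, 1, 0]
relevant = [1, 1]
print(f"NDCG@10 = {ndcg_at_k(ranked, relevant):.4f}")
print(f"MRR@10  = {mrr_at_k(ranked):.4f}")
```

The reported scores are averages of such per-query values over each dataset.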


🩵 How to use? 🩵

import torch
from sentence_transformers import SentenceTransformer

# load (remote code enabled)
model = SentenceTransformer(
    "RikkaBotan/stable-static-embedding-fast-retrieval-mrl-en-v2",
    trust_remote_code=True,
    device="cuda" if torch.cuda.is_available() else "cpu",
)

# inference
sentences = [
    "Stable Static embedding is interesting.",
    "SSE works without attention."
]

with torch.no_grad():
    embeddings = model.encode(
        sentences,
        convert_to_tensor=True,
        normalize_embeddings=True,
        batch_size=32
    )

# cosine similarity
# cosine_sim = embeddings[0] @ embeddings[1].T
cosine_sim = model.similarity(embeddings, embeddings)

print("embeddings shape:", embeddings.shape)
print("cosine similarity matrix:")
print(cosine_sim)

🩵 Retrieval usage 🩵

import torch
from sentence_transformers import SentenceTransformer

# load (remote code enabled)
model = SentenceTransformer(
    "RikkaBotan/stable-static-embedding-fast-retrieval-mrl-en-v2",
    trust_remote_code=True,
    device="cuda" if torch.cuda.is_available() else "cpu",
)

# inference
query = "What is Stable Static Embedding?"
sentences = [
    "SSE: Stable Static embedding works without attention.",
    "Stable Static Embedding is a fast embedding method designed for retrieval tasks.",
    "Static embeddings are often compared with transformer-based sentence encoders.",
    "I cooked pasta last night while listening to jazz music.",
    "Large language models are commonly trained using next-token prediction objectives.",
    "Instruction tuning improves the ability of LLMs to follow human-written prompts.",
]


with torch.no_grad():
    embeddings = model.encode(
        [query] + sentences,
        convert_to_tensor=True,
        normalize_embeddings=True,
        batch_size=32
    )

print("embeddings shape:", embeddings.shape)

# cosine similarity
similarities = model.similarity(embeddings[0], embeddings[1:])
for i, similarity in enumerate(similarities[0].tolist()):
    print(f"{similarity:.05f}: {sentences[i]}")

🩵 Training Hyperparameters 🩵

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 2048
  • gradient_accumulation_steps: 4
  • learning_rate: 0.1
  • adam_beta2: 0.9999
  • adam_epsilon: 1e-10
  • num_train_epochs: 1
  • lr_scheduler_type: cosine
  • warmup_ratio: 0.1
  • bf16: True
  • dataloader_num_workers: 8
  • batch_sampler: no_duplicates
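
Two derived quantities are worth spelling out: the effective batch size per device, and the shape of the learning-rate curve. The sketch below assumes the usual linear-warmup-then-cosine-decay schedule used by Hugging Face trainers. `total_steps` is a made-up value, since the actual number of optimizer steps is not given here, and `lr_at` is an illustrative helper.

```python
import math

per_device_train_batch_size = 2048
gradient_accumulation_steps = 4
learning_rate = 0.1
warmup_ratio = 0.1

# Samples contributing to each optimizer step, per device.
effective_batch = per_device_train_batch_size * gradient_accumulation_steps
print("effective batch size per device:", effective_batch)

def lr_at(step, total_steps, peak=learning_rate, warmup=warmup_ratio):
    """Linear warmup to `peak`, then cosine decay to zero."""
    warmup_steps = int(total_steps * warmup)
    if step < warmup_steps:
        return peak * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak * 0.5 * (1.0 + math.cos(math.pi * progress))

total_steps = 1000  # hypothetical; not taken from the training run
for step in (0, 50, 100, 550, 1000):
    print(f"step {step:4d}: lr = {lr_at(step, total_steps):.4f}")
```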

🩵 Training Datasets 🩵

The model was trained on the following datasets:

Dataset
squad
trivia_qa
allnli
pubmedqa
hotpotqa
miracl
mr_tydi
msmarco
msmarco_10m
msmarco_hard
mldr
s2orc
swim_ir
paq
nq
scidocs

All datasets were trained with MatryoshkaLoss.

🩵 Training results 🩵

  • Train loss (figure)

  • NanoBEIR mean NDCG@10 (figure)

🩵 About me 🩵

A Japanese independent researcher with a shy and pampered personality. Twin-tail hair is my charm point. Interested in NLP. I usually use Python and C.

X(Twitter): https://twitter.com/peony__snow

🩵 Acknowledgements 🩵

The author acknowledges the support of Saldra, Witness and Lumina Logic Minds for providing computational resources used in this work.

I thank the developers of sentence-transformers, Python and PyTorch.

I thank all the researchers for their efforts to date.

I thank Japan's high standard of education.

And most of all, thank you for your interest in this repository.

🩵 Citation 🩵

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}