---
base_model: aubmindlab/bert-base-arabertv02
datasets:
- akhooli/arabic-triplets-1m-curated-sims-len
language:
- ar
library_name: sentence-transformers
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- transformers.js
- transformers
- sentence-similarity
- feature-extraction
- dataset_size:75000
- loss:MatryoshkaLoss
- loss:MultipleNegativesRankingLoss
- mteb
model-index:
- name: Omartificial-Intelligence-Space/Arabert-matro-v4
  results:
  - dataset:
      config: ar-ar
      name: MTEB STS17 (ar-ar)
      revision: faeb762787bd10488a50c8b5be4a3b82e411949c
      split: test
      type: mteb/sts17-crosslingual-sts
    metrics:
    - type: cosine_pearson
      value: 84.66883392015258
    - type: cosine_spearman
      value: 85.30520907141938
    - type: euclidean_pearson
      value: 82.04306779342852
    - type: euclidean_spearman
      value: 84.58744201847996
    - type: main_score
      value: 85.30520907141938
    - type: manhattan_pearson
      value: 82.08829357724328
    - type: manhattan_spearman
      value: 84.49254541383544
    task:
      type: STS
license: apache-2.0
---

# Arabic-Triplet-Matryoshka-V2-Model

- This is a [sentence-transformers](https://www.SBERT.net) model fine-tuned from [aubmindlab/bert-base-arabertv02](https://huggingface.co/aubmindlab/bert-base-arabertv02).

- It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

- The model was trained on 1M samples from the [akhooli/arabic-triplets-1m-curated-sims-len](https://huggingface.co/datasets/akhooli/arabic-triplets-1m-curated-sims-len) dataset.

- It was trained for 3 epochs with MatryoshkaLoss, reaching a final training loss of 0.718.
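
Because the model was trained with MatryoshkaLoss, the leading dimensions of each 768-dimensional embedding are intended to remain useful on their own. This means a prefix of the vector can stand in for the full embedding when memory or speed matters. The sketch below illustrates cosine-similarity search over such embeddings, both at full size and after truncation. It is a minimal, dependency-free illustration and not this model's API. The 8-dimensional vectors are made-up placeholders standing in for real outputs, which you would obtain by loading the model with sentence-transformers' `SentenceTransformer` class and calling `encode`. The truncation sizes (8 and 4) are arbitrary choices, not the Matryoshka dimensions used in training, which this card does not state.

```python
import math
from typing import List, Sequence, Tuple


def normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 norm; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity: the dot product of the two L2-normalized vectors."""
    return dot(normalize(a), normalize(b))


def truncate(v: Sequence[float], dim: int) -> List[float]:
    """Matryoshka-style truncation: keep the first `dim` components, then re-normalize."""
    if not 0 < dim <= len(v):
        raise ValueError(f"dim must be between 1 and {len(v)}, got {dim}")
    return normalize(v[:dim])


def search(
    query: Sequence[float],
    corpus: List[Sequence[float]],
    dim: int,
    top_k: int = 2,
) -> List[Tuple[int, float]]:
    """Rank corpus entries by similarity to the query using only the first `dim` dims."""
    q = truncate(query, dim)
    # Both sides are unit-norm after truncate(), so the dot product equals the cosine.
    scored = [(i, dot(q, truncate(doc, dim))) for i, doc in enumerate(corpus)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical placeholder embeddings (NOT real model outputs; the model emits 768 dims).
query = [0.9, 0.1, 0.3, 0.0, 0.2, 0.1, 0.0, 0.05]
corpus = [
    [0.8, 0.2, 0.25, 0.1, 0.1, 0.0, 0.1, 0.0],  # doc 0: similar to the query
    [0.0, 0.9, 0.0, 0.4, 0.0, 0.3, 0.2, 0.1],   # doc 1: unrelated
    [0.7, 0.0, 0.4, 0.0, 0.3, 0.2, 0.0, 0.1],   # doc 2: similar to the query
]

for dim in (8, 4):
    ranked = search(query, corpus, dim)
    print(f"dim={dim}:", [(i, round(score, 3)) for i, score in ranked])
```

Re-normalizing after truncation is what allows `search` to score with a plain dot product. If you always compute full cosine similarity instead, that step is redundant, because cosine similarity is scale-invariant. Recent sentence-transformers releases can also truncate embeddings for you through a `truncate_dim` argument.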

```markdown
## Citation

If you use the Arabic Matryoshka Embeddings Model, please cite it as follows:

@misc{nacar2024enhancingsemanticsimilarityunderstanding,
      title={Enhancing Semantic Similarity Understanding in Arabic NLP with Nested Embedding Learning},
      author={Omer Nacar and Anis Koubaa},
      year={2024},
      eprint={2407.21139},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2407.21139},
}