omarelshehy committed
Commit f1a407c
Parent(s): ebeede7
Update README.md

README.md CHANGED
@@ -1,115 +1,123 @@
----
-base_model: FacebookAI/xlm-roberta-large
-library_name: sentence-transformers
-metrics:
-- pearson_cosine
-- spearman_cosine
-- pearson_manhattan
-- spearman_manhattan
-- pearson_euclidean
-- spearman_euclidean
-- pearson_dot
-- spearman_dot
-- pearson_max
-- spearman_max
-pipeline_tag: sentence-similarity
-tags:
-- sentence-transformers
-- sentence-similarity
-- feature-extraction
-- mteb
 
 # SentenceTransformer based on FacebookAI/xlm-roberta-large
 
-This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [FacebookAI/xlm-roberta-large](https://huggingface.co/FacebookAI/xlm-roberta-large). It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
 
 ## Model Details
 
@@ -221,4 +229,4 @@ You can finetune this model on your own dataset.
 archivePrefix={arXiv},
 primaryClass={cs.CL}
 }
-```
+---
+base_model: FacebookAI/xlm-roberta-large
+library_name: sentence-transformers
+metrics:
+- pearson_cosine
+- spearman_cosine
+- pearson_manhattan
+- spearman_manhattan
+- pearson_euclidean
+- spearman_euclidean
+- pearson_dot
+- spearman_dot
+- pearson_max
+- spearman_max
+pipeline_tag: sentence-similarity
+tags:
+- sentence-transformers
+- sentence-similarity
+- feature-extraction
+- mteb
+- bilingual
+model-index:
+- name: omarelshehy/Arabic-English-Matryoshka-STS
+  results:
+  - dataset:
+      config: en-ar
+      name: MTEB STS17 (en-ar)
+      revision: faeb762787bd10488a50c8b5be4a3b82e411949c
+      split: test
+      type: mteb/sts17-crosslingual-sts
+    metrics:
+    - type: cosine_pearson
+      value: 79.79480510851795
+    - type: cosine_spearman
+      value: 79.67609346073252
+    - type: euclidean_pearson
+      value: 81.64087935350051
+    - type: euclidean_spearman
+      value: 80.52588414802709
+    - type: main_score
+      value: 79.67609346073252
+    - type: manhattan_pearson
+      value: 81.57042957417305
+    - type: manhattan_spearman
+      value: 80.44331526051143
+    - type: pearson
+      value: 79.79480418294698
+    - type: spearman
+      value: 79.67609346073252
+    task:
+      type: STS
+  - dataset:
+      config: ar-ar
+      name: MTEB STS17 (ar-ar)
+      revision: faeb762787bd10488a50c8b5be4a3b82e411949c
+      split: test
+      type: mteb/sts17-crosslingual-sts
+    metrics:
+    - type: cosine_pearson
+      value: 82.22889478671283
+    - type: cosine_spearman
+      value: 83.0533648934447
+    - type: euclidean_pearson
+      value: 81.15891941165452
+    - type: euclidean_spearman
+      value: 82.14034597386936
+    - type: main_score
+      value: 83.0533648934447
+    - type: manhattan_pearson
+      value: 81.17463976232014
+    - type: manhattan_spearman
+      value: 82.09804987736345
+    - type: pearson
+      value: 82.22889389569819
+    - type: spearman
+      value: 83.0529662284269
+    task:
+      type: STS
+  - dataset:
+      config: en-en
+      name: MTEB STS17 (en-en)
+      revision: faeb762787bd10488a50c8b5be4a3b82e411949c
+      split: test
+      type: mteb/sts17-crosslingual-sts
+    metrics:
+    - type: cosine_pearson
+      value: 87.17053120821998
+    - type: cosine_spearman
+      value: 87.05959159411456
+    - type: euclidean_pearson
+      value: 87.63706739480517
+    - type: euclidean_spearman
+      value: 87.7675347222274
+    - type: main_score
+      value: 87.05959159411456
+    - type: manhattan_pearson
+      value: 87.7006832512623
+    - type: manhattan_spearman
+      value: 87.80128473941168
+    - type: pearson
+      value: 87.17053012311975
+    - type: spearman
+      value: 87.05959159411456
+    task:
+      type: STS
+language:
+- ar
+- en
+---
 
 # SentenceTransformer based on FacebookAI/xlm-roberta-large
 
+This is a multilingual (Arabic-English) [sentence-transformers](https://www.SBERT.net) model finetuned from [FacebookAI/xlm-roberta-large](https://huggingface.co/FacebookAI/xlm-roberta-large). It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+
+The model handles each language well on its own, and it also works cross-lingually (mixing Arabic and English), which opens up many flexible applications as well as opportunities for researchers who want to further develop Arabic models :)
+
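As a quick illustration of this cross-lingual behaviour, here is a minimal usage sketch; it assumes the repository id listed in the model-index above (`omarelshehy/Arabic-English-Matryoshka-STS`), the standard sentence-transformers API, and purely illustrative example sentences:

```python
from sentence_transformers import SentenceTransformer, util

# Repository id taken from the model-index above.
model = SentenceTransformer("omarelshehy/Arabic-English-Matryoshka-STS")

# An English sentence and its Arabic translation (illustrative examples).
sentences = [
    "The weather is nice today.",
    "الطقس جميل اليوم.",
]

# Encode both sentences into 1024-dimensional embeddings.
embeddings = model.encode(sentences)

# Cosine similarity between the English and Arabic embeddings
# should be high if the cross-lingual alignment works.
print(util.cos_sim(embeddings[0], embeddings[1]))
```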
+The MTEB metrics are solid, but don't rely on them alone: test the model on your own data first and see whether it works for your use case.
 
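The STS17 scores in the model-index above come from MTEB; below is a rough sketch of how one might re-run that evaluation with the `mteb` library. Exact task names, language selection, and result handling can vary between mteb versions, so treat this as an assumption rather than the exact command used:

```python
from mteb import MTEB
from sentence_transformers import SentenceTransformer

# Same (assumed) repository id as in the model-index above.
model = SentenceTransformer("omarelshehy/Arabic-English-Matryoshka-STS")

# STS17 includes the en-ar, ar-ar, and en-en configurations reported above.
evaluation = MTEB(tasks=["STS17"])
evaluation.run(model, output_folder="results/sts17")
```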
 ## Model Details
 
 archivePrefix={arXiv},
 primaryClass={cs.CL}
 }
+```