omarelshehy committed on
Commit f1a407c
1 Parent(s): ebeede7

Update README.md

Files changed (1)
  1. README.md +118 -110
README.md CHANGED
@@ -1,115 +1,123 @@
- ---
- base_model: FacebookAI/xlm-roberta-large
- library_name: sentence-transformers
- metrics:
- - pearson_cosine
- - spearman_cosine
- - pearson_manhattan
- - spearman_manhattan
- - pearson_euclidean
- - spearman_euclidean
- - pearson_dot
- - spearman_dot
- - pearson_max
- - spearman_max
- pipeline_tag: sentence-similarity
- tags:
- - sentence-transformers
- - sentence-similarity
- - feature-extraction
- - mteb
- model-index:
- - name: omarelshehy/Arabic-English-Matryoshka-STS
-   results:
-   - dataset:
-       config: en-ar
-       name: MTEB STS17 (en-ar)
-       revision: faeb762787bd10488a50c8b5be4a3b82e411949c
-       split: test
-       type: mteb/sts17-crosslingual-sts
-     metrics:
-     - type: cosine_pearson
-       value: 79.79480510851795
-     - type: cosine_spearman
-       value: 79.67609346073252
-     - type: euclidean_pearson
-       value: 81.64087935350051
-     - type: euclidean_spearman
-       value: 80.52588414802709
-     - type: main_score
-       value: 79.67609346073252
-     - type: manhattan_pearson
-       value: 81.57042957417305
-     - type: manhattan_spearman
-       value: 80.44331526051143
-     - type: pearson
-       value: 79.79480418294698
-     - type: spearman
-       value: 79.67609346073252
-     task:
-       type: STS
-   - dataset:
-       config: ar-ar
-       name: MTEB STS17 (ar-ar)
-       revision: faeb762787bd10488a50c8b5be4a3b82e411949c
-       split: test
-       type: mteb/sts17-crosslingual-sts
-     metrics:
-     - type: cosine_pearson
-       value: 82.22889478671283
-     - type: cosine_spearman
-       value: 83.0533648934447
-     - type: euclidean_pearson
-       value: 81.15891941165452
-     - type: euclidean_spearman
-       value: 82.14034597386936
-     - type: main_score
-       value: 83.0533648934447
-     - type: manhattan_pearson
-       value: 81.17463976232014
-     - type: manhattan_spearman
-       value: 82.09804987736345
-     - type: pearson
-       value: 82.22889389569819
-     - type: spearman
-       value: 83.0529662284269
-     task:
-       type: STS
-   - dataset:
-       config: en-en
-       name: MTEB STS17 (en-en)
-       revision: faeb762787bd10488a50c8b5be4a3b82e411949c
-       split: test
-       type: mteb/sts17-crosslingual-sts
-     metrics:
-     - type: cosine_pearson
-       value: 87.17053120821998
-     - type: cosine_spearman
-       value: 87.05959159411456
-     - type: euclidean_pearson
-       value: 87.63706739480517
-     - type: euclidean_spearman
-       value: 87.7675347222274
-     - type: main_score
-       value: 87.05959159411456
-     - type: manhattan_pearson
-       value: 87.7006832512623
-     - type: manhattan_spearman
-       value: 87.80128473941168
-     - type: pearson
-       value: 87.17053012311975
-     - type: spearman
-       value: 87.05959159411456
-     task:
-       type: STS
- Language:
- - ar
- - en
- ---

  # SentenceTransformer based on FacebookAI/xlm-roberta-large

- This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [FacebookAI/xlm-roberta-large](https://huggingface.co/FacebookAI/xlm-roberta-large). It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

  ## Model Details

@@ -221,4 +229,4 @@ You can finetune this model on your own dataset.
  archivePrefix={arXiv},
  primaryClass={cs.CL}
  }
- ```
 
+ ---
+ base_model: FacebookAI/xlm-roberta-large
+ library_name: sentence-transformers
+ metrics:
+ - pearson_cosine
+ - spearman_cosine
+ - pearson_manhattan
+ - spearman_manhattan
+ - pearson_euclidean
+ - spearman_euclidean
+ - pearson_dot
+ - spearman_dot
+ - pearson_max
+ - spearman_max
+ pipeline_tag: sentence-similarity
+ tags:
+ - sentence-transformers
+ - sentence-similarity
+ - feature-extraction
+ - mteb
+ - bilingual
+ model-index:
+ - name: omarelshehy/Arabic-English-Matryoshka-STS
+   results:
+   - dataset:
+       config: en-ar
+       name: MTEB STS17 (en-ar)
+       revision: faeb762787bd10488a50c8b5be4a3b82e411949c
+       split: test
+       type: mteb/sts17-crosslingual-sts
+     metrics:
+     - type: cosine_pearson
+       value: 79.79480510851795
+     - type: cosine_spearman
+       value: 79.67609346073252
+     - type: euclidean_pearson
+       value: 81.64087935350051
+     - type: euclidean_spearman
+       value: 80.52588414802709
+     - type: main_score
+       value: 79.67609346073252
+     - type: manhattan_pearson
+       value: 81.57042957417305
+     - type: manhattan_spearman
+       value: 80.44331526051143
+     - type: pearson
+       value: 79.79480418294698
+     - type: spearman
+       value: 79.67609346073252
+     task:
+       type: STS
+   - dataset:
+       config: ar-ar
+       name: MTEB STS17 (ar-ar)
+       revision: faeb762787bd10488a50c8b5be4a3b82e411949c
+       split: test
+       type: mteb/sts17-crosslingual-sts
+     metrics:
+     - type: cosine_pearson
+       value: 82.22889478671283
+     - type: cosine_spearman
+       value: 83.0533648934447
+     - type: euclidean_pearson
+       value: 81.15891941165452
+     - type: euclidean_spearman
+       value: 82.14034597386936
+     - type: main_score
+       value: 83.0533648934447
+     - type: manhattan_pearson
+       value: 81.17463976232014
+     - type: manhattan_spearman
+       value: 82.09804987736345
+     - type: pearson
+       value: 82.22889389569819
+     - type: spearman
+       value: 83.0529662284269
+     task:
+       type: STS
+   - dataset:
+       config: en-en
+       name: MTEB STS17 (en-en)
+       revision: faeb762787bd10488a50c8b5be4a3b82e411949c
+       split: test
+       type: mteb/sts17-crosslingual-sts
+     metrics:
+     - type: cosine_pearson
+       value: 87.17053120821998
+     - type: cosine_spearman
+       value: 87.05959159411456
+     - type: euclidean_pearson
+       value: 87.63706739480517
+     - type: euclidean_spearman
+       value: 87.7675347222274
+     - type: main_score
+       value: 87.05959159411456
+     - type: manhattan_pearson
+       value: 87.7006832512623
+     - type: manhattan_spearman
+       value: 87.80128473941168
+     - type: pearson
+       value: 87.17053012311975
+     - type: spearman
+       value: 87.05959159411456
+     task:
+       type: STS
+ Language:
+ - ar
+ - en
+ language:
+ - ar
+ - en
+ ---

  # SentenceTransformer based on FacebookAI/xlm-roberta-large

+ This is a multilingual (Arabic-English) [sentence-transformers](https://www.SBERT.net) model finetuned from [FacebookAI/xlm-roberta-large](https://huggingface.co/FacebookAI/xlm-roberta-large). It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+
+ The model handles each language well on its own and also works across them, so Arabic and English sentences can be compared with each other directly. This opens up many flexible applications and gives researchers a starting point for developing Arabic models further :)
+
+ The MTEB metrics are good, but don't rely on them alone: test the model on your own data to see whether it works for you.

  ## Model Details


  archivePrefix={arXiv},
  primaryClass={cs.CL}
  }
+ ```
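The updated README describes the model as mapping Arabic and English text into one shared 1024-dimensional space, which means cross-lingual semantic search comes down to ranking candidate embeddings by cosine similarity to a query embedding. The sketch below shows only that ranking step, in plain Python, and assumes you already have embeddings; with the sentence-transformers library they would typically come from `SentenceTransformer("omarelshehy/Arabic-English-Matryoshka-STS").encode(sentences)`. The optional `dim` argument assumes the model was trained with a Matryoshka-style loss (as its name suggests, though this README does not confirm it), so that keeping the leading dimensions and renormalizing still gives usable vectors; recent sentence-transformers releases expose a `truncate_dim` argument for the same purpose. The helper names and the toy vectors are hypothetical and are not real model outputs.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vector = Sequence[float]


def truncate_and_normalize(vec: Vector, dim: int) -> List[float]:
    """Keep the first `dim` components (Matryoshka-style), then rescale to unit length."""
    if not 0 < dim <= len(vec):
        raise ValueError(f"dim must be between 1 and {len(vec)}, got {dim}")
    head = [float(x) for x in vec[:dim]]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in head]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query: Vector, corpus: Sequence[Vector], dim: Optional[int] = None
) -> List[Tuple[int, float]]:
    """Return (corpus index, cosine score) pairs, most similar first.

    If `dim` is given, every vector is truncated to its first `dim`
    components and renormalized before scoring.
    """
    if dim is not None:
        query = truncate_and_normalize(query, dim)
        corpus = [truncate_and_normalize(doc, dim) for doc in corpus]
    scores = [(i, cosine(query, doc)) for i, doc in enumerate(corpus)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy 4-dimensional vectors standing in for real 1024-dimensional embeddings.
    query_en = [0.9, 0.1, 0.0, 0.2]    # hypothetical English query
    corpus = [
        [0.8, 0.2, 0.1, 0.0],          # hypothetical Arabic paraphrase
        [0.0, 0.1, 0.9, 0.3],          # hypothetical unrelated sentence
    ]
    print(rank_by_similarity(query_en, corpus))         # full length
    print(rank_by_similarity(query_en, corpus, dim=2))  # truncated
```

Because cosine similarity ignores vector length, the same ranking code works whether the query is Arabic or English; only the embeddings change. Renormalizing after truncation keeps scores on the same scale as the full-length case.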