SpirinEgor committed 011da57 (parent: ad33c33)
Update README.md
README.md
CHANGED
@@ -5,7 +5,24 @@ tags:
 - sentence-transformers
 - feature-extraction
 - sentence-similarity
-
+license: apache-2.0
+datasets:
+- Shitao/bge-m3-data
+- RussianNLP/russian_super_glue
+- reciTAL/mlsum
+- Helsinki-NLP/opus-100
+- Helsinki-NLP/bible_para
+- d0rj/rudetoxifier_data_detox
+- s-nlp/ru_paradetox
+- Milana/russian_keywords
+- IlyaGusev/gazeta
+- d0rj/gsm8k-ru
+- bragovo/dsum_ru
+- CarlBrendt/Summ_Dialog_News
+- deepvk/ru-HNP
+- deepvk/ru-HNP
+language:
+- ru
 ---
 
 # USER-base
@@ -136,7 +153,11 @@ During model development, we additionally collect 2 datasets:
 **Total positive pairs:** 3,352,653
 **Total negative pairs:** 792,644 (negative pairs from AllNLI, MIRACL, deepvk/ru-WANLI, deepvk/ru-HNP)
 
-For all labeled datasets, we use only their training sets for fine-tuning.
+For all labeled datasets, we use only their training sets for fine-tuning.
+For the Gazeta, Mlsum, and Xlsum datasets, the (title, text) and (title, summary) pairs are combined and used as asymmetric data.
+
+
+`AllNLI` is a Russian translation of the combined SNLI, MNLI, and ANLI datasets.
 
 ## Experiments
 
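The new front matter in this diff lists `deepvk/ru-HNP` twice in `datasets`. A small consistency check can catch repeated entries like this. The sketch below parses only the part of the front matter visible in the diff (from the `tags:` key shown in the hunk header onward) with a deliberately tiny hand-rolled parser, so it needs nothing beyond the standard library. It handles only flat `key: value` lines and `key:` followed by `- item` lists; for full model cards a real YAML parser (for example PyYAML's `yaml.safe_load`) is the safer choice. `parse_simple_front_matter` and `duplicate_entries` are hypothetical helper names, not part of any Hugging Face tooling.

```python
from collections import Counter
from typing import Optional, Union

# Portion of the model-card front matter visible in this diff (from the
# `tags:` key onward), copied verbatim, including the repeated entry.
FRONT_MATTER = """\
tags:
- sentence-transformers
- feature-extraction
- sentence-similarity
license: apache-2.0
datasets:
- Shitao/bge-m3-data
- RussianNLP/russian_super_glue
- reciTAL/mlsum
- Helsinki-NLP/opus-100
- Helsinki-NLP/bible_para
- d0rj/rudetoxifier_data_detox
- s-nlp/ru_paradetox
- Milana/russian_keywords
- IlyaGusev/gazeta
- d0rj/gsm8k-ru
- bragovo/dsum_ru
- CarlBrendt/Summ_Dialog_News
- deepvk/ru-HNP
- deepvk/ru-HNP
language:
- ru
"""


def parse_simple_front_matter(text: str) -> dict[str, Union[str, list[str]]]:
    """Parse flat `key: value` lines and `key:` blocks of `- item` lines.

    Hypothetical helper: no nesting, quoting, or comments are supported.
    """
    data: dict[str, Union[str, list[str]]] = {}
    current_list: Optional[list[str]] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("- "):
            if current_list is None:
                raise ValueError(f"list item outside a list: {line!r}")
            current_list.append(line[2:].strip())
            continue
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"unsupported line: {line!r}")
        value = value.strip()
        if value:
            # Scalar entry such as `license: apache-2.0`.
            data[key.strip()] = value
            current_list = None
        else:
            # `key:` with no value opens a list filled by the following lines.
            current_list = []
            data[key.strip()] = current_list
    return data


def duplicate_entries(items: list[str]) -> list[str]:
    """Return entries that occur more than once, in first-seen order."""
    return [item for item, n in Counter(items).items() if n > 1]


meta = parse_simple_front_matter(FRONT_MATTER)
print(meta["license"])                      # apache-2.0
print(len(meta["datasets"]))                # 14
print(duplicate_entries(meta["datasets"]))  # ['deepvk/ru-HNP']
```

The check only reports the repeat; which entry was intended in its place is not recoverable from the diff.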
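The added note on Gazeta, Mlsum, and Xlsum can be read as follows: each article contributes a (title, text) pair and a (title, summary) pair, and both kinds go into one pool of asymmetric pairs, with a short title on one side and a longer passage on the other. A minimal sketch, assuming records with `title`, `text`, and `summary` fields; `Pair` and `title_pairs` are hypothetical names, and any query/passage prefixes, loss, or sampling the authors used are not shown in the diff and are not modeled here.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Mapping, Optional


@dataclass(frozen=True)
class Pair:
    anchor: str    # short side: the article title
    positive: str  # long side: the article text or its summary


def title_pairs(records: Iterable[Mapping[str, Optional[str]]]) -> Iterator[Pair]:
    """Yield a (title, text) and a (title, summary) pair for every record.

    Assumes "title", "text", and "summary" fields; records without a title
    and empty text/summary fields are skipped.
    """
    for rec in records:
        title = (rec.get("title") or "").strip()
        if not title:
            continue
        for field in ("text", "summary"):
            passage = (rec.get(field) or "").strip()
            if passage:
                yield Pair(anchor=title, positive=passage)


# Made-up toy rows standing in for Gazeta / Mlsum / Xlsum records.
records = [
    {"title": "T1", "text": "Full article one.", "summary": "Short one."},
    {"title": "T2", "text": "Full article two.", "summary": ""},
]
pairs = list(title_pairs(records))
print(len(pairs))  # 3
```

Both pair types end up in a single list, which is one reading of "combined"; whether the pairs were shuffled, deduplicated, or weighted is not stated in the diff.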