SpirinEgor committed on
Commit 011da57
1 Parent(s): ad33c33

Update README.md

Files changed (1)
  1. README.md +23 -2
README.md CHANGED
@@ -5,7 +5,24 @@ tags:
  - sentence-transformers
  - feature-extraction
  - sentence-similarity
-
+ license: apache-2.0
+ datasets:
+ - Shitao/bge-m3-data
+ - RussianNLP/russian_super_glue
+ - reciTAL/mlsum
+ - Helsinki-NLP/opus-100
+ - Helsinki-NLP/bible_para
+ - d0rj/rudetoxifier_data_detox
+ - s-nlp/ru_paradetox
+ - Milana/russian_keywords
+ - IlyaGusev/gazeta
+ - d0rj/gsm8k-ru
+ - bragovo/dsum_ru
+ - CarlBrendt/Summ_Dialog_News
+ - deepvk/ru-HNP
+ - deepvk/ru-HNP
+ language:
+ - ru
  ---

  # USER-base
@@ -136,7 +153,11 @@ During model development, we additional collect 2 datasets:
  **Total positive pairs:** 3,352,653
  **Total negative pairs:** 792,644 (negative pairs from AIINLI, MIRACL, deepvk/ru-WANLI, deepvk/ru-HNP)

- For all labeled datasets, we only use its training set for fine-tuning. For datasets Gazeta, Mlsum, Xlsum: pairs (title/text) and (title/summary) are combined and used as asymmetric data. AllNLI is a combination of SNLI, MNLI and ANLI.
+ For all labeled datasets, we only use its training set for fine-tuning.
+ For datasets Gazeta, Mlsum, Xlsum: pairs (title/text) and (title/summary) are combined and used as asymmetric data.
+
+
+ `AllNLI` is an translated to Russian combination of SNLI, MNLI, and ANLI.

  ## Experiments
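The README lines in this commit state that for Gazeta, Mlsum and Xlsum the (title/text) and (title/summary) pairs are combined and used as asymmetric data. A minimal sketch of that combination step follows. It is an illustration, not the authors' pipeline: the record fields `title`, `text`, `summary` and the function `build_asymmetric_pairs` are hypothetical, and treating the title as the query side and the text or summary as the document side is an assumed reading of "asymmetric".

```python
from typing import Iterable, TypedDict


class Article(TypedDict):
    # Hypothetical row layout for a summarization dataset
    # (e.g. Gazeta, Mlsum, Xlsum); real column names may differ.
    title: str
    text: str
    summary: str


def build_asymmetric_pairs(articles: Iterable[Article]) -> list[tuple[str, str]]:
    """Combine (title, text) and (title, summary) pairs into one list.

    Assumption: each pair is (query, document), with the short title on the
    query side and the longer text or summary on the document side.
    """
    pairs: list[tuple[str, str]] = []
    for article in articles:
        # Pair 1: title with the full article text.
        pairs.append((article["title"], article["text"]))
        # Pair 2: title with the article summary.
        pairs.append((article["title"], article["summary"]))
    return pairs


if __name__ == "__main__":
    sample: list[Article] = [
        {"title": "Title", "text": "Full article text.", "summary": "Short summary."},
    ]
    for query, doc in build_asymmetric_pairs(sample):
        print(query, "->", doc)
```

Each input record yields two pairs that share the same query, so the combined set is simply the union of both pair types rather than a separate dataset per pair type.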