update: capital of name
- README.md +3 -3
- README_JA.md +2 -2
README.md

@@ -23,12 +23,12 @@ datasets:
 
 ---
 
-# Sarashina-
+# Sarashina-Embedding-v1-1B
 
 **[日本語のREADME/Japanese README](https://huggingface.co/sbintuitions/sarashina-embedding-v1-1b/blob/main/README_JA.md)**
 
 
-"Sarashina-
+"Sarashina-Embedding-v1-1B" is a Japanese text embedding model based on the 1.2B-parameter Japanese LLM "Sarashina".
 We trained this model with multi-stage contrastive learning. We achieved the state-of-the-art average score over the 16 datasets in [JMTEB](https://huggingface.co/datasets/sbintuitions/JMTEB) (Japanese Massive Text Embedding Benchmark).
 
 This model maps sentences & paragraphs to a 1792-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
@@ -95,7 +95,7 @@ print(similarities.shape)
 
 ## Training
 
-"Sarashina-
+"Sarashina-Embedding-v1-1B" is created through the following two-stage learning process:
 
 ### Stage 1: Weakly-supervised Learning
 
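The README text being diffed says the model maps sentences to a 1792-dimensional dense vector space used for semantic similarity. Such embeddings are typically compared with cosine similarity; below is a minimal sketch with random stand-in vectors (hypothetical values — real embeddings would come from the model's encode step, which this sketch does not perform):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity between two embedding vectors: dot product of the
    # vectors divided by the product of their norms, in [-1, 1].
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
# Stand-ins for two 1792-dimensional sentence embeddings.
emb_a = rng.normal(size=1792)
emb_b = rng.normal(size=1792)

sim = cosine_similarity(emb_a, emb_b)
```

For semantic search, the same score is computed between one query embedding and each document embedding, and documents are ranked by descending similarity.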
README_JA.md

@@ -20,7 +20,7 @@ datasets:
 - SkelterLabsInc/JaQuAD
 ---
 
-# Sarashina-
+# Sarashina-Embedding-v1-1B
 
 「Sarashina-embedding-v1-1b」は、1.2Bパラメータの日本語LLM「Sarashina」をベースにした日本語テキスト埋め込みモデルです。
 
@@ -89,7 +89,7 @@ print(similarities.shape)
 
 ## 学習
 
-"Sarashina-
+"Sarashina-Embedding-v1-1B"は、以下の2段階の学習ステージによって行われています。
 
 ### Stage 1: 弱教師あり学習
 
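Both READMEs in this commit describe multi-stage contrastive training. Contrastive embedding training of this kind is commonly built on an in-batch InfoNCE-style loss, where each query's paired positive is contrasted against the other positives in the batch. The sketch below is a generic illustration of that loss, not the authors' actual training code, and the batch size, temperature, and random inputs are all assumptions:

```python
import numpy as np

def info_nce_loss(queries: np.ndarray, positives: np.ndarray,
                  temperature: float = 0.05) -> float:
    # In-batch InfoNCE: row i of `positives` is the positive for row i of
    # `queries`; every other row serves as an in-batch negative.
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = q @ p.T / temperature               # (batch, batch) cosine / T
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    # Log-softmax over each row; the loss is the mean negative log-probability
    # assigned to the matching (diagonal) positive.
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))

rng = np.random.default_rng(0)
# Hypothetical batch of 8 (query, positive) embedding pairs, 1792-dim.
loss = info_nce_loss(rng.normal(size=(8, 1792)), rng.normal(size=(8, 1792)))
```

Minimizing this loss pulls each query toward its paired text and pushes it away from the rest of the batch, which is what shapes the embedding space the READMEs describe.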