dangvantuan committed
Commit d7b1cc5 • Parent(s): e433e63
Update README.md
README.md CHANGED
@@ -17,7 +17,7 @@ metrics:
 
 # [bilingual-embedding-base](https://huggingface.co/Lajavaness/bilingual-embedding-base)
 
-
+Bilingual-embedding is an embedding model for two languages, French and English. It is a specialized sentence-embedding model trained for this bilingual setting, leveraging the robust capabilities of [XLM-RoBERTa](https://huggingface.co/FacebookAI/xlm-roberta-base), a pre-trained multilingual language model. The model encodes English-French sentences into a 1024-dimensional vector space, facilitating a wide range of applications from semantic search to text clustering (see the sketch after this hunk). The embeddings capture the nuanced meanings of English-French sentences, reflecting both the lexical and contextual layers of the languages.
 
 
 ## Full Model Architecture
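The added paragraph pitches semantic search as one target application. As a minimal sketch (not part of this commit), assuming the checkpoint loads under the plain sentence-transformers API shown later in the README, a cross-lingual query could be ranked like this:

```python
from sentence_transformers import SentenceTransformer, util

# Model id taken from the README heading; trust_remote_code is an assumption,
# in case the repo ships custom modeling code.
model = SentenceTransformer("Lajavaness/bilingual-embedding-base", trust_remote_code=True)

# Cross-lingual semantic search: a French query against English documents.
query_embedding = model.encode("Quelle est la capitale de la France ?")
corpus_embeddings = model.encode([
    "Paris is the capital of France",
    "The zoo in Berlin is famous",
])

# Cosine similarity should rank the matching English sentence highest.
print(util.cos_sim(query_embedding, corpus_embeddings))
```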
@@ -37,7 +37,7 @@ SentenceTransformer(
 - Dataset: [STSB-fr and en]
 - Method: Fine-tuning specifically for the semantic textual similarity benchmark using Siamese BERT-Networks configured with the 'sentence-transformers' library.
 ### Stage 4: Advanced Augmentation Fine-tuning
-- Dataset: STSB
+- Dataset: STSB, augmented with [silver samples generated from the gold samples](https://www.sbert.net/examples/training/data_augmentation/README.html)
 - Method: Employed an advanced strategy using [Augmented SBERT](https://arxiv.org/abs/2010.08240) with Pair Sampling Strategies, integrating both Cross-Encoder and Bi-Encoder models. This stage further refined the embeddings by enriching the training data dynamically, enhancing the model's robustness and accuracy (see the sketch after this hunk).
 
 
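For the stage 4 step referenced above, here is a hedged sketch of silver-sample generation, assuming the standard sentence-transformers CrossEncoder API; the checkpoint, `gold_pairs`, and the random pair sampling below are illustrative stand-ins, not the exact pipeline behind this model:

```python
import random

from sentence_transformers import CrossEncoder, InputExample

# Gold STSB-style pairs: (sentence1, sentence2, similarity in [0, 1]).
gold_pairs = [
    ("A man is playing a guitar", "A person plays the guitar", 0.9),
    ("A dog runs in the park", "The stock market fell today", 0.05),
]

# A cross-encoder trained on the gold pairs; an off-the-shelf STSB
# checkpoint stands in here for the fine-tuned one.
cross_encoder = CrossEncoder("cross-encoder/stsb-roberta-base")

# Sample new, unlabeled pairs from the sentence pool (random sampling here;
# BM25 or semantic-search sampling are among the strategies the paper discusses).
pool = [s for s1, s2, _ in gold_pairs for s in (s1, s2)]
candidate_pairs = [tuple(random.sample(pool, 2)) for _ in range(4)]

# The cross-encoder labels the sampled pairs, yielding "silver" examples.
silver_scores = cross_encoder.predict(candidate_pairs)
silver_examples = [
    InputExample(texts=[a, b], label=float(score))
    for (a, b), score in zip(candidate_pairs, silver_scores)
]

# The gold and silver examples together would then fine-tune the bi-encoder,
# as in stage 3.
```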
@@ -53,7 +53,6 @@ Then you can use the model like this:
 
 ```python
 from sentence_transformers import SentenceTransformer
-from pyvi.ViTokenizer import tokenize
 
 sentences = ["Paris est une capitale de la France", "Paris is a capital of France"]
 
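The hunk cuts off right after the example sentences; a hedged completion of that usage snippet follows (the model id comes from the README heading, and trust_remote_code is an assumption):

```python
from sentence_transformers import SentenceTransformer

sentences = ["Paris est une capitale de la France", "Paris is a capital of France"]

# Model id from the README heading; trust_remote_code is an assumption, in case
# the repo ships custom modeling code.
model = SentenceTransformer("Lajavaness/bilingual-embedding-base", trust_remote_code=True)

# One dense vector per input sentence.
embeddings = model.encode(sentences)
print(embeddings.shape)  # (2, embedding_dim)
```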