innocent-charles committed on
Commit
5d842c7
1 Parent(s): 0837bf6

Upload folder using huggingface_hub

.gitattributes CHANGED
@@ -1,35 +1,10 @@
-*.7z filter=lfs diff=lfs merge=lfs -text
-*.arrow filter=lfs diff=lfs merge=lfs -text
+*.bin.* filter=lfs diff=lfs merge=lfs -text
+*.lfs.* filter=lfs diff=lfs merge=lfs -text
 *.bin filter=lfs diff=lfs merge=lfs -text
-*.bz2 filter=lfs diff=lfs merge=lfs -text
-*.ckpt filter=lfs diff=lfs merge=lfs -text
-*.ftz filter=lfs diff=lfs merge=lfs -text
-*.gz filter=lfs diff=lfs merge=lfs -text
 *.h5 filter=lfs diff=lfs merge=lfs -text
-*.joblib filter=lfs diff=lfs merge=lfs -text
-*.lfs.* filter=lfs diff=lfs merge=lfs -text
-*.mlmodel filter=lfs diff=lfs merge=lfs -text
-*.model filter=lfs diff=lfs merge=lfs -text
-*.msgpack filter=lfs diff=lfs merge=lfs -text
-*.npy filter=lfs diff=lfs merge=lfs -text
-*.npz filter=lfs diff=lfs merge=lfs -text
-*.onnx filter=lfs diff=lfs merge=lfs -text
-*.ot filter=lfs diff=lfs merge=lfs -text
-*.parquet filter=lfs diff=lfs merge=lfs -text
-*.pb filter=lfs diff=lfs merge=lfs -text
-*.pickle filter=lfs diff=lfs merge=lfs -text
-*.pkl filter=lfs diff=lfs merge=lfs -text
-*.pt filter=lfs diff=lfs merge=lfs -text
-*.pth filter=lfs diff=lfs merge=lfs -text
-*.rar filter=lfs diff=lfs merge=lfs -text
-*.safetensors filter=lfs diff=lfs merge=lfs -text
-saved_model/**/* filter=lfs diff=lfs merge=lfs -text
-*.tar.* filter=lfs diff=lfs merge=lfs -text
-*.tar filter=lfs diff=lfs merge=lfs -text
 *.tflite filter=lfs diff=lfs merge=lfs -text
-*.tgz filter=lfs diff=lfs merge=lfs -text
-*.wasm filter=lfs diff=lfs merge=lfs -text
-*.xz filter=lfs diff=lfs merge=lfs -text
-*.zip filter=lfs diff=lfs merge=lfs -text
-*.zst filter=lfs diff=lfs merge=lfs -text
-*tfevents* filter=lfs diff=lfs merge=lfs -text
+*.tar.gz filter=lfs diff=lfs merge=lfs -text
+*.ot filter=lfs diff=lfs merge=lfs -text
+*.onnx filter=lfs diff=lfs merge=lfs -text
+*.msgpack filter=lfs diff=lfs merge=lfs -text
+model.safetensors filter=lfs diff=lfs merge=lfs -text
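
As a rough check of what the trimmed .gitattributes still routes through Git LFS, the sketch below matches file names against the ten patterns that remain after this commit. It assumes that a slash-free pattern is compared against a file's base name, which only approximates Git's matching rules using Python's `fnmatch` module. The names `NEW_LFS_PATTERNS` and `is_lfs_tracked` are illustrative and not part of Git or huggingface_hub.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# The ten patterns left in .gitattributes after this commit.
NEW_LFS_PATTERNS = [
    "*.bin.*", "*.lfs.*", "*.bin", "*.h5", "*.tflite",
    "*.tar.gz", "*.ot", "*.onnx", "*.msgpack", "model.safetensors",
]


def is_lfs_tracked(path: str, patterns=NEW_LFS_PATTERNS) -> bool:
    """Approximate gitattributes matching for slash-free patterns.

    Assumption: a pattern without "/" is tested against the base name,
    so it can match a file at any directory depth.
    """
    name = PurePosixPath(path).name
    return any(fnmatchcase(name, pattern) for pattern in patterns)


for candidate in ["2_Dense/model.safetensors", "pytorch_model.bin",
                  "adapter.safetensors", "data.parquet"]:
    print(candidate, is_lfs_tracked(candidate))
```

Under this approximation every weight file changed in this commit is still matched, while a differently named `.safetensors` file or a `.parquet` file would no longer be.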
 
2_Dense/model.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:a548639f4e10e8b96be6a4846f0932ca2d011d491b37489a6b4751a3c096e49d
-size 132
+oid sha256:981a518204a50b1a68e27c5a2539aad511e6bf5cfaba409deb8cea8605f776eb
+size 2362560
2_Dense/pytorch_model.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:c4af843f3f778124777c11604c9e22c6afdca8c27764a44961099e981cf6355d
-size 132
+oid sha256:06fb85120e40adf0ab188c4f0cc7684f702cb2023532947d1b85f325b0a3645c
+size 2363431
README.md CHANGED
@@ -119,14 +119,9 @@ library_name: sentence-transformers
 license: apache-2.0
 ---
 
-# AviLaBSE
-This is a port of the [LaBSE](https://tfhub.dev/google/LaBSE/1) model to PyTorch. Language-agnostic BERT Sentence Encoder (LaBSE) is a BERT-based model trained for sentence embedding for 109 languages. It can be used to map 109 languages to a shared vector space. The pre-training process combines masked language modeling with translation language modeling. The model is useful for getting multilingual sentence embeddings and for bi-text retrieval.
+# LaBSE
+This is a port of the [LaBSE](https://tfhub.dev/google/LaBSE/1) model to PyTorch. It can be used to map 109 languages to a shared vector space.
 
-- Model: [HuggingFace's model hub](https://huggingface.co/sartifyllc/AviLaBSE).
-- Paper: [arXiv](https://arxiv.org/abs/2007.01852).
-- Original model: [TensorFlow Hub](https://tfhub.dev/google/LaBSE/2).
-- Blog post: [Google AI Blog](https://ai.googleblog.com/2020/08/language-agnostic-bert-sentence.html).
-- Conversion from TensorFlow to PyTorch: [GitHub](https://github.com/sartify).
 
 ## Usage (Sentence-Transformers)
 
@@ -142,74 +137,11 @@ Then you can use the model like this:
 from sentence_transformers import SentenceTransformer
 sentences = ["This is an example sentence", "Each sentence is converted"]
 
-model = SentenceTransformer('sartifyllc/AviLaBSE')
+model = SentenceTransformer('sentence-transformers/LaBSE')
 embeddings = model.encode(sentences)
 print(embeddings)
 ```
 
-```python
-import torch
-from transformers import BertModel, BertTokenizerFast
-
-tokenizer = BertTokenizerFast.from_pretrained("sartifyllc/AviLaBSE")
-model = BertModel.from_pretrained("sartifyllc/AviLaBSE")
-model = model.eval()
-
-english_sentences = [
-    "dog",
-    "Puppies are nice.",
-    "I enjoy taking long walks along the beach with my dog.",
-]
-english_inputs = tokenizer(english_sentences, return_tensors="pt", padding=True)
-
-with torch.no_grad():
-    english_outputs = model(**english_inputs)
-```
-
-To get the sentence embeddings, use the pooler output:
-
-```python
-english_embeddings = english_outputs.pooler_output
-```
-
-Output for other languages:
-
-```python
-italian_sentences = [
-    "cane",
-    "I cuccioli sono carini.",
-    "Mi piace fare lunghe passeggiate lungo la spiaggia con il mio cane.",
-]
-japanese_sentences = ["犬", "子犬はいいです", "私は犬と一緒にビーチを散歩するのが好きです"]
-italian_inputs = tokenizer(italian_sentences, return_tensors="pt", padding=True)
-japanese_inputs = tokenizer(japanese_sentences, return_tensors="pt", padding=True)
-
-with torch.no_grad():
-    italian_outputs = model(**italian_inputs)
-    japanese_outputs = model(**japanese_inputs)
-
-italian_embeddings = italian_outputs.pooler_output
-japanese_embeddings = japanese_outputs.pooler_output
-```
-
-For similarity between sentences, an L2-norm is recommended before calculating the similarity:
-
-```python
-import torch.nn.functional as F
-
-def similarity(embeddings_1, embeddings_2):
-    normalized_embeddings_1 = F.normalize(embeddings_1, p=2)
-    normalized_embeddings_2 = F.normalize(embeddings_2, p=2)
-    return torch.matmul(
-        normalized_embeddings_1, normalized_embeddings_2.transpose(0, 1)
-    )
-
-
-print(similarity(english_embeddings, italian_embeddings))
-print(similarity(english_embeddings, japanese_embeddings))
-print(similarity(italian_embeddings, japanese_embeddings))
-```
-
 
 
 ## Evaluation Results
@@ -232,4 +164,5 @@ SentenceTransformer(
 
 ## Citing & Authors
 
-Have a look at [LaBSE](https://tfhub.dev/google/LaBSE/2) for the respective publication that describes LaBSE.
+Have a look at [LaBSE](https://tfhub.dev/google/LaBSE/1) for the respective publication that describes LaBSE.
+
flax_model.msgpack CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:211fcbb6fed2aedfa31e7da2ecd7ac485ac8010de6e18afbd8c00f722b18c8cc
-size 135
+oid sha256:4cbe50771a6b147d2da0beb6da1d80908a706cec2e2e06a09873649ed183e884
+size 1883714625
model.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:6566b39f08255d6abd32052675a6534fc20f5c262ce06e2ab5862dbd01cf7b7d
-size 135
+oid sha256:77d8e1f2dbab6eb5d3c261ce9d3dbf1e3c69e02938c95f934f94f42c22dfa31f
+size 1883734344
pytorch_model.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:2e6ff92c0dc1e0e18a7abf27a78921b7cd1a8c51373b44970e718efc81aada64
-size 135
+oid sha256:c9e7daf739f87c2168a6d1baffdae5782eceb03eb6de61950284a925234c6865
+size 1883785969
tf_model.h5 CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:4e0ebd757dea4709bb5d69d66ed94e6e46f5275d67ac5f04d791b7854106a885
-size 135
+oid sha256:e971d0404bba02ce8ab4568bc9625a74e6fbe99c7cdc927d5f3095597a70c55d
+size 1883974632
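
Each weight file in this diff is stored as a Git LFS pointer with three fields: the spec version, a SHA-256 object id, and the object size in bytes. The sketch below parses such a pointer and checks a blob of bytes against it. `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names, and the example text is the new tf_model.h5 pointer from this commit, assuming each line ends with a single newline.

```python
import hashlib

# New tf_model.h5 pointer from this commit (newline-terminated lines assumed).
TF_MODEL_POINTER = (
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:e971d0404bba02ce8ab4568bc9625a74e6fbe99c7cdc927d5f3095597a70c55d\n"
    "size 1883974632\n"
)


def parse_lfs_pointer(text: str) -> dict:
    """Split a pointer file into its version, hash algorithm, oid, and size."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    hash_algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": hash_algo,
        "oid": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data: bytes, pointer: dict) -> bool:
    """Return True if the bytes have the size and SHA-256 digest the pointer records."""
    return (
        len(data) == pointer["size"]
        and hashlib.sha256(data).hexdigest() == pointer["oid"]
    )


pointer = parse_lfs_pointer(TF_MODEL_POINTER)
print(pointer["hash_algo"], pointer["size"])
print(len(TF_MODEL_POINTER.encode("utf-8")))  # length of the pointer text itself
```

The example pointer text is itself 135 bytes long, the same as the old recorded size of the four top-level weight files, and a pointer with a seven-digit size field is 132 bytes, matching the old 2_Dense entries. That is consistent with pointer files having been stored in place of the real weights before this commit, though the commit itself does not say so.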