imvladikon committed • Commit 77163e1 • 1 Parent(s): f28a8ac

Update README.md

README.md CHANGED
@@ -9,13 +9,13 @@ language:
 - he
 library_name: sentence-transformers
 ---
-# WIP!!!
 
-
+
+# imvladikon/sentence-transformers-alephbert[WIP]
 
 This is a [sentence-transformers](https://www.SBERT.net) model: It maps sentences & paragraphs to a 768 dimensional dense vector space and can be used for tasks like clustering or semantic search.
 
-
+
 
 ## Usage (Sentence-Transformers)
 
@@ -29,11 +29,16 @@ Then you can use the model like this:
 
 ```python
 from sentence_transformers import SentenceTransformer
-sentences = ["This is an example sentence", "Each sentence is converted"]
+from sentence_transformers.util import cos_sim
 
-model = SentenceTransformer('{MODEL_NAME}')
+sentences = ["הם היו שמחים לראות את האירוע שהתקיים.", "לראות את האירוע שהתקיים היה מאוד משמח להם."]
+
+model = SentenceTransformer('imvladikon/sentence-transformers-alephbert')
 embeddings = model.encode(sentences)
-print(embeddings)
+
+
+print(cos_sim(*tuple(embeddings)).item())
+# 0.883316159248352
 ```
 
 
@@ -42,8 +47,9 @@ print(embeddings)
 Without [sentence-transformers](https://www.SBERT.net), you can use the model like this: First, you pass your input through the transformer model, then you have to apply the right pooling-operation on-top of the contextualized word embeddings.
 
 ```python
-from transformers import AutoTokenizer, AutoModel
 import torch
+from torch import nn
+from transformers import AutoTokenizer, AutoModel
 
 
 #Mean Pooling - Take attention mask into account for correct averaging
@@ -54,11 +60,11 @@ def mean_pooling(model_output, attention_mask):
 
 
 # Sentences we want sentence embeddings for
-sentences = ['This is an example sentence', 'Each sentence is converted']
+sentences = ["הם היו שמחים לראות את האירוע שהתקיים.", "לראות את האירוע שהתקיים היה מאוד משמח להם."]
 
 # Load model from HuggingFace Hub
-tokenizer = AutoTokenizer.from_pretrained('{MODEL_NAME}')
-model = AutoModel.from_pretrained('{MODEL_NAME}')
+tokenizer = AutoTokenizer.from_pretrained('imvladikon/sentence-transformers-alephbert')
+model = AutoModel.from_pretrained('imvladikon/sentence-transformers-alephbert')
 
 # Tokenize sentences
 encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
@@ -70,16 +76,14 @@ with torch.no_grad():
 # Perform pooling. In this case, mean pooling.
 sentence_embeddings = mean_pooling(model_output, encoded_input['attention_mask'])
 
-print("Sentence embeddings:")
-print(sentence_embeddings)
+cos_sim = nn.CosineSimilarity(dim=0, eps=1e-6)
+print(cos_sim(sentence_embeddings[0], sentence_embeddings[1]).item())
 ```
 
 
 
 ## Evaluation Results
 
-<!--- Describe how your model was evaluated -->
-
 For an automated evaluation of this model, see the *Sentence Embeddings Benchmark*: [https://seb.sbert.net](https://seb.sbert.net?model_name={MODEL_NAME})
 
 
@@ -129,4 +133,3 @@ SentenceTransformer(
 
 ## Citing & Authors
 
-<!--- Describe where people can find more information -->
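Note on the new quick-start example: the Hebrew sentence pair added in this commit translates roughly to "They were happy to see the event that took place." and "Seeing the event that took place made them very happy.", i.e. a paraphrase pair, which is why the snippet prints a high cosine similarity (about 0.88). Since `model.encode` returns one 768-dimensional vector per sentence, `cos_sim(*tuple(embeddings))` unpacks the two rows into `sentence_transformers.util.cos_sim`, which returns a 1x1 tensor whose `.item()` is the similarity score.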
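Both snippets also rely on a `mean_pooling` helper whose definition sits outside the diff hunks; only its signature, `def mean_pooling(model_output, attention_mask):`, is visible in a hunk header. A minimal sketch, assuming the stock implementation from the sentence-transformers model-card template:

```python
import torch

# Assumed implementation (not shown in this diff): the standard mean-pooling
# helper from the sentence-transformers model-card template.
def mean_pooling(model_output, attention_mask):
    token_embeddings = model_output[0]  # first element: token embeddings, shape [batch, seq, 768]
    # Expand the attention mask so padding tokens are excluded from the average
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    # Sum the real-token embeddings and divide by the count of real tokens
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)
```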