imvladikon committed
Commit 77163e1 • 1 Parent(s): f28a8ac

Update README.md

Files changed (1)
  1. README.md +18 -15
README.md CHANGED
@@ -9,13 +9,13 @@ language:
 - he
 library_name: sentence-transformers
 ---
-# WIP!!!
 
-# {MODEL_NAME}
+
+# imvladikon/sentence-transformers-alephbert[WIP]
 
 This is a [sentence-transformers](https://www.SBERT.net) model: it maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for tasks like clustering or semantic search.
 
-<!--- Describe your model here -->
+
 
 ## Usage (Sentence-Transformers)
 
@@ -29,11 +29,16 @@ Then you can use the model like this:
 
 ```python
 from sentence_transformers import SentenceTransformer
-sentences = ["This is an example sentence", "Each sentence is converted"]
+from sentence_transformers.util import cos_sim
 
-model = SentenceTransformer('{MODEL_NAME}')
+sentences = ["הם היו שמחים לראות את האירוע שהתקיים.", "לראות את האירוע שהתקיים היה מאוד משמח להם."]
+
+model = SentenceTransformer('imvladikon/sentence-transformers-alephbert')
 embeddings = model.encode(sentences)
-print(embeddings)
+
+
+print(cos_sim(*tuple(embeddings)).item())
+# 0.883316159248352
 ```
 
 
@@ -42,8 +47,9 @@
 Without [sentence-transformers](https://www.SBERT.net), you can use the model like this: first you pass your input through the transformer model, then you have to apply the right pooling operation on top of the contextualized word embeddings.
 
 ```python
-from transformers import AutoTokenizer, AutoModel
 import torch
+from torch import nn
+from transformers import AutoTokenizer, AutoModel
 
 
 # Mean Pooling - take attention mask into account for correct averaging
@@ -54,11 +60,11 @@ def mean_pooling(model_output, attention_mask):
 
 
 # Sentences we want sentence embeddings for
-sentences = ['This is an example sentence', 'Each sentence is converted']
+sentences = ["הם היו שמחים לראות את האירוע שהתקיים.", "לראות את האירוע שהתקיים היה מאוד משמח להם."]
 
 # Load model from HuggingFace Hub
-tokenizer = AutoTokenizer.from_pretrained('{MODEL_NAME}')
-model = AutoModel.from_pretrained('{MODEL_NAME}')
+tokenizer = AutoTokenizer.from_pretrained('imvladikon/sentence-transformers-alephbert')
+model = AutoModel.from_pretrained('imvladikon/sentence-transformers-alephbert')
 
 # Tokenize sentences
 encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
@@ -70,16 +76,14 @@ with torch.no_grad():
 # Perform pooling. In this case, mean pooling.
 sentence_embeddings = mean_pooling(model_output, encoded_input['attention_mask'])
 
-print("Sentence embeddings:")
-print(sentence_embeddings)
+cos_sim = nn.CosineSimilarity(dim=0, eps=1e-6)
+print(cos_sim(sentence_embeddings[0], sentence_embeddings[1]).item())
 ```
 
 
 
 ## Evaluation Results
 
-<!--- Describe how your model was evaluated -->
-
 For an automated evaluation of this model, see the *Sentence Embeddings Benchmark*: [https://seb.sbert.net](https://seb.sbert.net?model_name={MODEL_NAME})
 
 
@@ -129,4 +133,3 @@ SentenceTransformer(
 
 ## Citing & Authors
 
-<!--- Describe where people can find more information -->
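The hunks above elide the body of `mean_pooling` and the forward pass, so the "Usage (HuggingFace Transformers)" snippet is not runnable exactly as shown. Below is a self-contained version; the pooling body follows the standard sentence-transformers README template and is assumed, not taken from this commit.

```python
# Self-contained version of the committed snippet. The mean_pooling body and
# the forward pass are not shown in the diff; they are filled in here with the
# standard sentence-transformers template (assumed, not part of the commit).
import torch
from torch import nn
from transformers import AutoTokenizer, AutoModel


def mean_pooling(model_output, attention_mask):
    token_embeddings = model_output[0]  # first element holds all token embeddings
    # Expand the attention mask so padding tokens are excluded from the average
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)


# Two Hebrew paraphrases of "they were happy to see the event that took place"
sentences = ["הם היו שמחים לראות את האירוע שהתקיים.", "לראות את האירוע שהתקיים היה מאוד משמח להם."]

tokenizer = AutoTokenizer.from_pretrained('imvladikon/sentence-transformers-alephbert')
model = AutoModel.from_pretrained('imvladikon/sentence-transformers-alephbert')

# Tokenize, then compute token embeddings without tracking gradients
encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
with torch.no_grad():
    model_output = model(**encoded_input)

# Mask-aware mean pooling, then cosine similarity between the two sentences
sentence_embeddings = mean_pooling(model_output, encoded_input['attention_mask'])
cos_sim = nn.CosineSimilarity(dim=0, eps=1e-6)
print(cos_sim(sentence_embeddings[0], sentence_embeddings[1]).item())
```

Masking before averaging keeps padding tokens from diluting the sentence vector, which is why the attention mask is threaded through the pooling step.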
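The card also advertises clustering and semantic search. As a rough sketch of the latter (not from the commit), sentence-transformers ships a `util.semantic_search` helper; the query string below is an invented example:

```python
# Minimal semantic-search sketch; the corpus reuses the card's example
# sentences, and the query is an invented Hebrew phrase ("a happy event").
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer('imvladikon/sentence-transformers-alephbert')

corpus = ["הם היו שמחים לראות את האירוע שהתקיים.",
          "לראות את האירוע שהתקיים היה מאוד משמח להם."]
corpus_embeddings = model.encode(corpus, convert_to_tensor=True)

query_embedding = model.encode("אירוע משמח", convert_to_tensor=True)

# Rank corpus entries by cosine similarity to the query
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)[0]
for hit in hits:
    print(corpus[hit['corpus_id']], hit['score'])
```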