Update README.md
README.md

---

# hku-nlp/instructor-xl

This is a general embedding model: it maps **any** piece of text (e.g., a title, a sentence, a document) to a fixed-length vector at test time, **without further training**. With instructions, the embeddings are **domain-specific** (e.g., specialized for science or finance) and **task-aware** (e.g., customized for classification or information retrieval).

The model is easy to use with the `sentence-transformers` library.

## Installation

```bash
git clone https://github.com/HKUNLP/instructor-embedding
cd sentence-transformers
pip install -e .
```
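
If the editable install worked, the library should import cleanly; a quick sanity check (not part of the original instructions):

```python
# verify the locally installed library is importable and print its version
import sentence_transformers
print(sentence_transformers.__version__)
```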

## Compute your customized embeddings

Then you can use the model like this to calculate domain-specific and task-aware embeddings:

```python
from sentence_transformers import SentenceTransformer

sentence = "3D ActionSLAM: wearable person tracking in multi-floor environments"
instruction = "Represent the Science title; Input:"
model = SentenceTransformer('hku-nlp/instructor-xl')
# each input is an [instruction, sentence, 0] triple
embeddings = model.encode([[instruction, sentence, 0]])
print(embeddings)
```
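
Assuming this fork keeps the standard `sentence-transformers` return convention, `encode` returns a 2-D NumPy array with one fixed-length row per input triple:

```python
# one row per input triple; the row length is the embedding dimension
print(embeddings.shape)
```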

## Calculate sentence similarities

You can further use the model to compute similarities between two groups of sentences, with **customized embeddings**.

```python
from sklearn.metrics.pairwise import cosine_similarity

# each group mixes domains; the instruction in each triple customizes its embedding
sentences_a = [['Represent the Science sentence; Input: ', 'Parton energy loss in QCD matter', 0],
               ['Represent the Financial statement; Input: ', 'The Federal Reserve on Wednesday raised its benchmark interest rate.', 0]]
sentences_b = [['Represent the Science sentence; Input: ', 'The Chiral Phase Transition in Dissipative Dynamics', 0],
               ['Represent the Financial statement; Input: ', 'The funds rose less than 0.5 per cent on Friday', 0]]
embeddings_a = model.encode(sentences_a)
embeddings_b = model.encode(sentences_b)
similarities = cosine_similarity(embeddings_a, embeddings_b)
print(similarities)
```
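
The same mechanism extends to instruction-customized retrieval: embed a query and candidate documents under task-specific instructions, then rank by cosine similarity. The sketch below reuses `model` from above; the instruction strings and documents are illustrative examples, not from this README:

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# hypothetical retrieval instructions, following the same [instruction, text, 0] format
query = [['Represent the Wikipedia question; Input: ',
          'where is the food stored in a yam plant', 0]]
documents = [['Represent the Wikipedia document; Input: ',
              'Yam tubers store starch and water for the plant.', 0],
             ['Represent the Wikipedia document; Input: ',
              'The Federal Reserve raised its benchmark interest rate.', 0]]

query_embedding = model.encode(query)
document_embeddings = model.encode(documents)

# rank documents by cosine similarity to the query and print the best match
scores = cosine_similarity(query_embedding, document_embeddings)[0]
print(documents[int(np.argmax(scores))][1])
```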