retrainai/instructor-xl

hku-nlp committed on Dec 19, 2022

Commit 20c37ba • 1 Parent(s): 0c8d9fc

Update README.md

Files changed (1)

README.md +23 -5

@@ -10,15 +10,19 @@ tags:
 ---
 # hku-nlp/instructor-xl
-This is a general embedding model: It maps sentences & paragraphs to a 768 dimensional dense vector space.
-The model was trained on diverse tasks.
-It takes customized instructions and text inputs, and generates task-specific embeddings for general purposes, e.g., information retrieval, classification, clustering, etc.
-```
 git clone https://github.com/HKUNLP/instructor-embedding
 cd sentence-transformers
 pip install -e .
 ```
-Then you can use the model like this:
 ```python
 from sentence_transformers import SentenceTransformer
 sentence = "3D ActionSLAM: wearable person tracking in multi-floor environments"
@@ -26,4 +30,18 @@ instruction = "Represent the Science title; Input:"
 model = SentenceTransformer('hku-nlp/instructor-xl')
 embeddings = model.encode([[instruction,sentence,0]])
 print(embeddings)
 ```

 ---
 # hku-nlp/instructor-xl
+This is a general embedding model: it maps **any** piece of text (e.g., a title, a sentence, or a document) to a fixed-length vector at test time **without further training**. With instructions, the embeddings are **domain-specific** (e.g., specialized for science or finance) and **task-aware** (e.g., customized for classification or information retrieval).
+The model is easy to use with the `sentence-transformers` library.
+## Installation
+```bash
 git clone https://github.com/HKUNLP/instructor-embedding
 cd sentence-transformers
 pip install -e .
 ```
+## Compute your customized embeddings
+Then you can use the model like this to calculate domain-specific and task-aware embeddings:
 ```python
 from sentence_transformers import SentenceTransformer
 sentence = "3D ActionSLAM: wearable person tracking in multi-floor environments"
 model = SentenceTransformer('hku-nlp/instructor-xl')
 embeddings = model.encode([[instruction,sentence,0]])
 print(embeddings)
+```
+## Calculate sentence similarities
+You can further use the model to compute similarities between two groups of sentences, with **customized embeddings**.
+```python
+from sklearn.metrics.pairwise import cosine_similarity
+sentences_a = [['Represent the Science sentence; Input: ','Parton energy loss in QCD matter',0],
+               ['Represent the Financial statement; Input: ','The Federal Reserve on Wednesday raised its benchmark interest rate.',0]]
+sentences_b = [['Represent the Science sentence; Input: ','The Chiral Phase Transition in Dissipative Dynamics',0],
+               ['Represent the Financial statement; Input: ','The funds rose less than 0.5 per cent on Friday',0]]
+embeddings_a = model.encode(sentences_a)
+embeddings_b = model.encode(sentences_b)
+similarities = cosine_similarity(embeddings_a,embeddings_b)
+print(similarities)
 ```
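
The similarity snippet added in this commit calls `cosine_similarity(embeddings_a, embeddings_b)`, which returns a matrix with one row per entry in the first group and one column per entry in the second, so `similarities[i][j]` compares `sentences_a[i]` with `sentences_b[j]`. The sketch below reproduces that computation with NumPy so the output shape can be checked without downloading the model. The function name `pairwise_cosine` and the toy vectors are illustrative assumptions, not part of the README or of the model's output.

```python
import numpy as np


def pairwise_cosine(a, b):
    """Cosine similarity between every row of `a` and every row of `b`.

    Hypothetical helper for illustration only; it mirrors what
    sklearn.metrics.pairwise.cosine_similarity computes for dense inputs.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Scale each row to unit length, then take all pairwise dot products.
    a_unit = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_unit = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_unit @ b_unit.T


# Toy stand-ins for embeddings_a and embeddings_b (made-up values, 3 dimensions).
toy_a = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 1.0]])
toy_b = np.array([[2.0, 0.0, 0.0],
                  [0.0, 3.0, 0.0]])

similarities = pairwise_cosine(toy_a, toy_b)
print(similarities.shape)  # two rows (group a) by two columns (group b)
print(similarities)
```

With real embeddings, the diagonal of this matrix holds the science-to-science and finance-to-finance comparisons from the README example, and the off-diagonal entries hold the cross-domain comparisons.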