knowledgator
/

gliner-multitask-large-v0.5

@@ -58,6 +58,85 @@ for entity in entities:
     print(entity["text"], "=>", entity["label"])
 ```
 **How to use for open information extraction:**
 ```python
@@ -128,21 +207,6 @@ for summary in summaries:
     print(summary["text"], "=>", summary["score"])
 ```
-**How to use for relation extraction:**
-```python
-text = """
-Microsoft was founded by Bill Gates and Paul Allen on April 4, 1975 to develop and sell BASIC interpreters for the Altair 8800. During his career at Microsoft, Gates held the positions of chairman, chief executive officer, president and chief software architect, while also being the largest individual shareholder until May 2014.
-"""
-labels = ["Microsoft <> founder", "Microsoft <> inception date", "Bill Gates <> held position"]
-entities = model.predict_entities(text, labels)
-for entity in entities:
-    print(entity["label"], "=>", entity["text"])
-```
 ### Benchmarks:
 Our multitask model performs comparably to dedicated NER models on various zero-shot benchmarks:

     print(entity["text"], "=>", entity["label"])
 ```
+**How to use for relation extraction:**
+```python
+text = """
+Microsoft was founded by Bill Gates and Paul Allen on April 4, 1975 to develop and sell BASIC interpreters for the Altair 8800. During his career at Microsoft, Gates held the positions of chairman, chief executive officer, president and chief software architect, while also being the largest individual shareholder until May 2014.
+"""
+labels = ["Microsoft <> founder", "Microsoft <> inception date", "Bill Gates <> held position"]
+entities = model.predict_entities(text, labels)
+for entity in entities:
+    print(entity["label"], "=>", entity["text"])
+```
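The labels above follow a `source entity <> relation` pattern, and each prediction pairs such a label with the extracted text. As a minimal sketch of how that convention could be wrapped, the helpers below build the labels and split predictions back into triples. The names `build_relation_labels` and `to_triples` are hypothetical and not part of GLiNER, and `sample_predictions` is an illustrative stand-in for model output that only assumes the `"label"` and `"text"` keys used above.

```python
def build_relation_labels(pairs):
    """Format (source entity, relation) pairs as "source <> relation" labels."""
    return [f"{source} <> {relation}" for source, relation in pairs]


def to_triples(predictions):
    """Split each predicted label back into (source, relation, target text)."""
    triples = []
    for prediction in predictions:
        source, relation = (
            part.strip() for part in prediction["label"].split("<>", 1)
        )
        triples.append((source, relation, prediction["text"]))
    return triples


labels = build_relation_labels([
    ("Microsoft", "founder"),
    ("Microsoft", "inception date"),
    ("Bill Gates", "held position"),
])

# Stand-in for the result of model.predict_entities(text, labels).
# These values are illustrative only, not real model output.
sample_predictions = [
    {"label": "Microsoft <> founder", "text": "Bill Gates"},
    {"label": "Microsoft <> inception date", "text": "April 4, 1975"},
]

for source, relation, target in to_triples(sample_predictions):
    print(source, "|", relation, "|", target)
```

With a loaded model, `sample_predictions` would be replaced by the list returned from `model.predict_entities(text, labels)`.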
+### Construct a relation extraction pipeline with [utca](https://github.com/Knowledgator/utca)
+First, import the necessary components of the library, initialize the predictor (the GLiNER model), and construct a pipeline that combines NER and relation extraction:
+```python
+from utca.core import RenameAttribute
+from utca.implementation.predictors import (
+    GLiNERPredictor,
+    GLiNERPredictorConfig
+)
+from utca.implementation.tasks import (
+    GLiNER,
+    GLiNERPreprocessor,
+    GLiNERRelationExtraction,
+    GLiNERRelationExtractionPreprocessor,
+)
+predictor = GLiNERPredictor( # Predictor manages the model that will be used by tasks
+    GLiNERPredictorConfig(
+        model_name = "knowledgator/gliner-multitask-large-v0.5", # Model to use
+        device = "cuda:0", # Device to use
+    )
+)
+pipe = (
+    GLiNER( # GLiNER task produces classified entities that will be at the "output" key.
+        predictor=predictor,
+        preprocess=GLiNERPreprocessor(threshold=0.7) # Entities threshold
+    )
+    | RenameAttribute("output", "entities") # Rename output entities from GLiNER task to use them as inputs in GLiNERRelationExtraction
+    | GLiNERRelationExtraction( # GLiNERRelationExtraction is used for relation extraction.
+        predictor=predictor,
+        preprocess=(
+            GLiNERPreprocessor(threshold=0.5) # Relations threshold
+            | GLiNERRelationExtractionPreprocessor()
+        )
+    )
+)
+```
+To run the pipeline, specify the entity types and the relations with their parameters:
+```python
+r = pipe.run({
+    "text": text, # Text to process
+    "labels": [ # Labels used by GLiNER for entity extraction
+        "scientist",
+        "university",
+        "city",
+        "research",
+        "journal",
+    ],
+    "relations": [{ # Relation parameters
+        "relation": "published at", # Relation label. Required parameter.
+        "pairs_filter": [("scientist", "journal")], # Optional parameter. It specifies possible members of relations by their entity labels.
+        # Here, "scientist" is the entity label of the source, and "journal" is the target's entity label.
+        # If provided, only specified pairs will be returned.
+    },{
+        "relation": "worked at",
+        "pairs_filter": [("scientist", "university"), ("scientist", "other")],
+        "distance_threshold": 100, # Optional parameter. It specifies the max distance between spans in the text (i.e., the end of the span that is closer to the start of the text and the start of the next one).
+    }]
+})
+print(r["output"])
+```
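To make the `distance_threshold` description concrete, here is a rough sketch of the distance it describes: the gap between the end of the earlier span and the start of the later one. This is not utca's implementation. The helpers `find_span` and `span_distance` are hypothetical, the sample sentence is illustrative, and measuring in characters and clamping overlapping spans to zero are both assumptions.

```python
def find_span(text, phrase):
    """Return the (start, end) character offsets of the first match of phrase."""
    start = text.index(phrase)
    return (start, start + len(phrase))


def span_distance(span_a, span_b):
    """Gap between two spans: start of the later one minus end of the earlier one.

    Overlapping spans are treated as distance 0 (an assumption of this sketch).
    """
    first, second = sorted([span_a, span_b])
    return max(0, second[0] - first[1])


# Illustrative sentence, not taken from the model card.
sample_text = "Marie Curie worked at the University of Paris."
scientist = find_span(sample_text, "Marie Curie")
university = find_span(sample_text, "University of Paris")

distance = span_distance(scientist, university)
print(distance, distance <= 100)  # pair is kept if within the threshold
```

Under these assumptions, a pair would be kept for the `"worked at"` relation only when its distance does not exceed the configured threshold of 100.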
 **How to use for open information extraction:**
 ```python
     print(summary["text"], "=>", summary["score"])
 ```
 ### Benchmarks:
 Our multitask model performs comparably to dedicated NER models on various zero-shot benchmarks: