Token Classification
GLiNER
PyTorch
urchade committed on
Commit
437994f
1 Parent(s): 4cb6c3b

Update README.md

  1. README.md +40 -1
README.md CHANGED
@@ -9,4 +9,43 @@ language:
  - it
  library_name: gliner
  pipeline_tag: token-classification
- ---
+ datasets:
+ - urchade/synthetic-pii-ner-mistral-v1
+ ---
+
+
+ # Model Card for GLiNER-multi
+
+ GLiNER is a Named Entity Recognition (NER) model capable of identifying any entity type using a bidirectional transformer encoder (BERT-like). It provides a practical alternative to traditional NER models, which are limited to predefined entities, and Large Language Models (LLMs) that, despite their flexibility, are costly and large for resource-constrained scenarios.
+
+ This version has been trained to recognize and classify **Personally Identifiable Information** (PII) within text. The training dataset was generated using `mistralai/Mistral-7B-Instruct-v0.2`.
+
+ ## Links
+
+ * Paper: https://arxiv.org/abs/2311.08526
+ * Repository: https://github.com/urchade/GLiNER
+
+ ```python
+ from gliner import GLiNER
+
+ model = GLiNER.from_pretrained("urchade/gliner_multi_pii-v1")
+
+ text = """
+ Harilala Rasoanaivo, un homme d'affaires local d'Antananarivo, a enregistré une nouvelle société nommée "Rasoanaivo Enterprises" au Lot II M 92 Antohomadinika. Son numéro est le +261 32 22 345 67, et son adresse électronique est harilala.rasoanaivo@telma.mg. Il a fourni son numéro de sécu 501-02-1234 pour l'enregistrement.
+ """
+
+ labels = ["work", "booking number", "personally identifiable information", "driver licence", "person", "book", "full address", "company", "actor", "character", "email", "passport number", "Social Security Number", "phone number"]
+ entities = model.predict_entities(text, labels)
+
+ for entity in entities:
+     print(entity["text"], "=>", entity["label"])
+ ```
+
+ ```
+ Harilala Rasoanaivo => person
+ Rasoanaivo Enterprises => company
+ Lot II M 92 Antohomadinika => full address
+ +261 32 22 345 67 => phone number
+ harilala.rasoanaivo@telma.mg => email
+ 501-02-1234 => Social Security Number
+ ```
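
The README example added in this commit prints each detected entity. A common next step is masking those spans in the original text. The following is a minimal sketch of that idea and is not part of the commit. It assumes each entity dict carries `start` and `end` character offsets in addition to the `text` and `label` keys shown in the README. `redact` and `make_span` are hypothetical helpers, not GLiNER functions, and the sample entities are hand-written rather than model output.

```python
def redact(text, entities):
    """Replace each entity span in `text` with its upper-cased label in brackets."""
    # Work from the last span to the first so earlier offsets stay valid
    # after each replacement changes the string length.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        placeholder = "[" + ent["label"].upper() + "]"
        text = text[:ent["start"]] + placeholder + text[ent["end"]:]
    return text


def make_span(text, needle, label):
    """Build a hand-written entity dict in the assumed output shape."""
    start = text.index(needle)
    return {"start": start, "end": start + len(needle), "text": needle, "label": label}


sample = "Contact Harilala at harilala.rasoanaivo@telma.mg."
entities = [
    make_span(sample, "Harilala", "person"),
    make_span(sample, "harilala.rasoanaivo@telma.mg", "email"),
]

redacted = redact(sample, entities)
print(redacted)
```

With real model output, the list returned by `model.predict_entities(text, labels)` would take the place of the hand-written `entities`, provided the offset keys are present as assumed. The sketch also assumes the spans do not overlap.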