Shaltiel committed on
Commit d4a3ab4 • 1 Parent(s): 202b668

Update README.md

  1. README.md +88 -0
README.md CHANGED
@@ -1,3 +1,91 @@
---
license: cc-by-4.0
language:
- he
---
# DictaBERT-Large: A State-of-the-Art BERT-Large Suite for Modern Hebrew

A state-of-the-art language model for Hebrew, released [here](https://arxiv.org/abs/2308.16687).

This is the BERT-large model fine-tuned for the named-entity-recognition (NER) task.

For the BERT-base models for other tasks, see [here](https://huggingface.co/collections/dicta-il/dictabert-6588e7cc08f83845fc42a18b).

For the BERT-large models for other tasks, see [to-be-added].

Sample usage:

```python
from tokenizers.decoders import WordPiece
from transformers import pipeline

oracle = pipeline('ner', model='dicta-il/dictabert-large-ner', aggregation_strategy='simple')

# With aggregation_strategy='simple', the tokenizer needs a decoder so that grouped
# wordpieces are joined back into words. Note that the last wordpiece of a group
# will still be emitted.
oracle.tokenizer.backend_tokenizer.decoder = WordPiece()

# Hebrew sample sentence, roughly: "The most dramatic there is: a goal by substitute
# Cedric gave Ziv Arie a second win in three games and a climb above the red line."
sentence = 'הכי דרמטי שיש: שער של סדריק המחליף העניק לזיו אריה ניצחון שני בשלושה משחקים ועלייה מעל הקו האדום.'
oracle(sentence)
```
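
As a rough illustration of why the decoder matters: with `aggregation_strategy='simple'`, each group of wordpieces is turned back into a string for the `word` field, and a WordPiece decoder glues continuation pieces onto the piece before them. The sketch below shows that joining step in plain Python. It assumes the conventional `##` continuation prefix, `join_wordpieces` is a hypothetical helper written for this example rather than anything from `tokenizers`, and it skips the punctuation cleanup that the library decoder can also perform.

```python
def join_wordpieces(pieces, prefix="##"):
    """Join WordPiece tokens into text (simplified, hypothetical helper)."""
    words = []
    for piece in pieces:
        if piece.startswith(prefix) and words:
            # Continuation piece: append to the previous word without a space.
            words[-1] += piece[len(prefix):]
        else:
            # Word-initial piece (or a stray continuation with nothing before it).
            words.append(piece)
    return " ".join(words)


print(join_wordpieces(["play", "##ing", "field"]))  # playing field
```

Without a decoder, the pieces would instead be joined with plain spaces, which is why the sample above sets one explicitly.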

Output of `oracle(sentence)`:
```json
[
  {
    "entity_group": "PER",
    "score": 0.9998621,
    "word": "סדריק",
    "start": 22,
    "end": 27
  },
  {
    "entity_group": "PER",
    "score": 0.9999503,
    "word": "לזי",
    "start": 41,
    "end": 44
  },
  {
    "entity_group": "PER",
    "score": 0.9998287,
    "word": "אריה",
    "start": 46,
    "end": 50
  }
]
```
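
The `start` and `end` fields in this output line up with character offsets into the input string; for example, characters 22 to 27 of the sample sentence are the first entity. A minimal sketch of using them follows, with the three entities hard-coded so that it runs without downloading the model. `surface_forms` and `widen_to_word` are hypothetical helper names, and widening a span to the enclosing whitespace-delimited word is only one possible way to handle a span that covers part of a word, as the second entity does.

```python
sentence = 'הכי דרמטי שיש: שער של סדריק המחליף העניק לזיו אריה ניצחון שני בשלושה משחקים ועלייה מעל הקו האדום.'

# Entities copied from the pipeline output for this sentence (scores omitted).
entities = [
    {"entity_group": "PER", "start": 22, "end": 27},
    {"entity_group": "PER", "start": 41, "end": 44},
    {"entity_group": "PER", "start": 46, "end": 50},
]


def surface_forms(text, ents):
    """Slice each entity's text out of the input using its offsets."""
    return [text[e["start"]:e["end"]] for e in ents]


def widen_to_word(text, start, end):
    """Grow (start, end) outward until whitespace or the string edge."""
    while start > 0 and not text[start - 1].isspace():
        start -= 1
    while end < len(text) and not text[end].isspace():
        end += 1
    return start, end


print(surface_forms(sentence, entities))
print([widen_to_word(sentence, e["start"], e["end"]) for e in entities])
```

Note that in Hebrew a whitespace-delimited word can include an attached prefix, so a widened span may cover more than the name itself.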

## Citation

If you use DictaBERT in your research, please cite ```DictaBERT: A State-of-the-Art BERT Suite for Modern Hebrew```

**BibTeX:**

```bibtex
@misc{shmidman2023dictabert,
      title={DictaBERT: A State-of-the-Art BERT Suite for Modern Hebrew},
      author={Shaltiel Shmidman and Avi Shmidman and Moshe Koppel},
      year={2023},
      eprint={2308.16687},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```

## License

Shield: [![CC BY 4.0][cc-by-shield]][cc-by]

This work is licensed under a
[Creative Commons Attribution 4.0 International License][cc-by].

[![CC BY 4.0][cc-by-image]][cc-by]

[cc-by]: http://creativecommons.org/licenses/by/4.0/
[cc-by-image]: https://i.creativecommons.org/l/by/4.0/88x31.png
[cc-by-shield]: https://img.shields.io/badge/License-CC%20BY%204.0-lightgrey.svg