---
license: cc-by-4.0
language:
- he
inference: false
---
# DictaBERT: A State-of-the-Art BERT Suite for Modern Hebrew
State-of-the-art language model for Hebrew, introduced in [this paper](https://arxiv.org/abs/2308.16687).
This is the model fine-tuned for syntactic dependency parsing.
For the BERT-base models fine-tuned for other tasks, see the [DictaBERT collection](https://huggingface.co/collections/dicta-il/dictabert-6588e7cc08f83845fc42a18b).
Sample usage:
```python
from transformers import AutoModel, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained('dicta-il/dictabert-syntax')
model = AutoModel.from_pretrained('dicta-il/dictabert-syntax', trust_remote_code=True)
model.eval()
sentence = 'בשנת 1948 השלים אפרים קישון את לימודיו בפיסול מתכת ובתולדות האמנות והחל לפרסם מאמרים הומוריסטיים'
print(model.predict([sentence], tokenizer))
```
Output:
```json
[
{
"tree": [
{
"word": "讘砖谞转",
"dep_head_idx": 2,
"dep_func": "obl",
"dep_head": "讛砖诇讬诐"
},
{
"word": "1948",
"dep_head_idx": 0,
"dep_func": "compound",
"dep_head": "讘砖谞转"
},
{
"word": "讛砖诇讬诐",
"dep_head_idx": -1,
"dep_func": "root",
"dep_head": "讛讜诪讜专讬住讟讬讬诐"
},
{
"word": "讗驻专讬诐",
"dep_head_idx": 2,
"dep_func": "nsubj",
"dep_head": "讛砖诇讬诐"
},
{
"word": "拽讬砖讜谉",
"dep_head_idx": 3,
"dep_func": "flat",
"dep_head": "讗驻专讬诐"
},
{
"word": "讗转",
"dep_head_idx": 6,
"dep_func": "case",
"dep_head": "诇讬诪讜讚讬讜"
},
{
"word": "诇讬诪讜讚讬讜",
"dep_head_idx": 2,
"dep_func": "obj",
"dep_head": "讛砖诇讬诐"
},
{
"word": "讘驻讬住讜诇",
"dep_head_idx": 6,
"dep_func": "nmod",
"dep_head": "诇讬诪讜讚讬讜"
},
{
"word": "诪转讻转",
"dep_head_idx": 7,
"dep_func": "compound",
"dep_head": "讘驻讬住讜诇"
},
{
"word": "讜讘转讜诇讚讜转",
"dep_head_idx": 7,
"dep_func": "conj",
"dep_head": "讘驻讬住讜诇"
},
{
"word": "讛讗诪谞讜转",
"dep_head_idx": 9,
"dep_func": "compound",
"dep_head": "讜讘转讜诇讚讜转"
},
{
"word": "讜讛讞诇",
"dep_head_idx": 2,
"dep_func": "conj",
"dep_head": "讛砖诇讬诐"
},
{
"word": "诇驻专住诐",
"dep_head_idx": 11,
"dep_func": "xcomp",
"dep_head": "讜讛讞诇"
},
{
"word": "诪讗诪专讬诐",
"dep_head_idx": 12,
"dep_func": "obj",
"dep_head": "诇驻专住诐"
},
{
"word": "讛讜诪讜专讬住讟讬讬诐",
"dep_head_idx": 13,
"dep_func": "amod",
"dep_head": "诪讗诪专讬诐"
}
],
"root_idx": 2
}
]
```
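The output is a list with one entry per input sentence, where each `tree` item points at its head through a zero-based `dep_head_idx` and the root carries `-1`. The sketch below shows one possible way to post-process that structure: grouping dependents under their heads and flattening the tree into CoNLL-U-style rows (1-based ids, head `0` for the root). The helper names `children_by_head` and `to_rows` and the ASCII placeholder words are assumptions made for illustration and are not part of the model's API. Because the root entry in the sample still has a non-empty `dep_head` string, this sketch relies on the indices only.

```python
from collections import defaultdict

# Hypothetical, shortened sample in the same shape as the output above
# (placeholder words, not real model output).
predictions = [
    {
        "tree": [
            {"word": "w0", "dep_head_idx": 2, "dep_func": "obl", "dep_head": "w2"},
            {"word": "w1", "dep_head_idx": 0, "dep_func": "compound", "dep_head": "w0"},
            {"word": "w2", "dep_head_idx": -1, "dep_func": "root", "dep_head": "w3"},
            {"word": "w3", "dep_head_idx": 2, "dep_func": "nsubj", "dep_head": "w2"},
        ],
        "root_idx": 2,
    }
]


def children_by_head(tree):
    """Map each head index to the indices of its direct dependents."""
    children = defaultdict(list)
    for idx, node in enumerate(tree):
        head = node["dep_head_idx"]
        if head >= 0:  # the root carries -1 and has no head
            children[head].append(idx)
    return dict(children)


def to_rows(tree):
    """Return (1-based id, word, 1-based head or 0 for the root, relation) tuples."""
    rows = []
    for idx, node in enumerate(tree):
        head = node["dep_head_idx"]
        head_id = head + 1 if head >= 0 else 0
        rows.append((idx + 1, node["word"], head_id, node["dep_func"]))
    return rows


for sentence in predictions:
    # Print one tab-separated row per word.
    for row in to_rows(sentence["tree"]):
        print("\t".join(str(field) for field in row))
```

With real predictions, `predictions` would simply be the return value of `model.predict([sentence], tokenizer)`.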
## Citation
If you use DictaBERT in your research, please cite ```DictaBERT: A State-of-the-Art BERT Suite for Modern Hebrew```
**BibTeX:**
```bibtex
@misc{shmidman2023dictabert,
title={DictaBERT: A State-of-the-Art BERT Suite for Modern Hebrew},
author={Shaltiel Shmidman and Avi Shmidman and Moshe Koppel},
year={2023},
eprint={2308.16687},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
```
## License
Shield: [![CC BY 4.0][cc-by-shield]][cc-by]
This work is licensed under a
[Creative Commons Attribution 4.0 International License][cc-by].
[![CC BY 4.0][cc-by-image]][cc-by]
[cc-by]: http://creativecommons.org/licenses/by/4.0/
[cc-by-image]: https://i.creativecommons.org/l/by/4.0/88x31.png
[cc-by-shield]: https://img.shields.io/badge/License-CC%20BY%204.0-lightgrey.svg