---
license: cc-by-4.0
language:
- he
inference: false
---
# DictaBERT: A State-of-the-Art BERT Suite for Modern Hebrew

DictaBERT is a state-of-the-art language model for Hebrew, described in [this paper](https://arxiv.org/abs/2308.16687).

This is the model fine-tuned for syntactic dependency parsing.

For the DictaBERT models for other tasks, see [this collection](https://huggingface.co/collections/dicta-il/dictabert-6588e7cc08f83845fc42a18b).

Sample usage:

```python
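from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('dicta-il/dictabert-syntax')
# trust_remote_code=True is needed because predict() comes from the custom model code in the repository
model = AutoModel.from_pretrained('dicta-il/dictabert-syntax', trust_remote_code=True)

# Switch to inference mode (disables dropout)
model.eval()

sentence = 'בשנת 1948 השלים אפרים קישון את לימודיו בפיסול מתכת ובתולדות האמנות והחל לפרסם מאמרים הומוריסטיים'
# predict() takes a list of sentences and returns one parse per sentence
print(model.predict([sentence], tokenizer))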
```

Output:
```json
[
  {
    "tree": [
      {
        "word": "בשנת",
        "dep_head_idx": 2,
        "dep_func": "obl",
        "dep_head": "השלים"
      },
      {
        "word": "1948",
        "dep_head_idx": 0,
        "dep_func": "compound",
        "dep_head": "בשנת"
      },
      {
        "word": "השלים",
        "dep_head_idx": -1,
        "dep_func": "root",
        "dep_head": "הומוריסטיים"
      },
      {
        "word": "אפרים",
        "dep_head_idx": 2,
        "dep_func": "nsubj",
        "dep_head": "השלים"
      },
      {
        "word": "קישון",
        "dep_head_idx": 3,
        "dep_func": "flat",
        "dep_head": "אפרים"
      },
      {
        "word": "את",
        "dep_head_idx": 6,
        "dep_func": "case",
        "dep_head": "לימודיו"
      },
      {
        "word": "לימודיו",
        "dep_head_idx": 2,
        "dep_func": "obj",
        "dep_head": "השלים"
      },
      {
        "word": "בפיסול",
        "dep_head_idx": 6,
        "dep_func": "nmod",
        "dep_head": "לימודיו"
      },
      {
        "word": "מתכת",
        "dep_head_idx": 7,
        "dep_func": "compound",
        "dep_head": "בפיסול"
      },
      {
        "word": "ובתולדות",
        "dep_head_idx": 7,
        "dep_func": "conj",
        "dep_head": "בפיסול"
      },
      {
        "word": "האמנות",
        "dep_head_idx": 9,
        "dep_func": "compound",
        "dep_head": "ובתולדות"
      },
      {
        "word": "והחל",
        "dep_head_idx": 2,
        "dep_func": "conj",
        "dep_head": "השלים"
      },
      {
        "word": "לפרסם",
        "dep_head_idx": 11,
        "dep_func": "xcomp",
        "dep_head": "והחל"
      },
      {
        "word": "מאמרים",
        "dep_head_idx": 12,
        "dep_func": "obj",
        "dep_head": "לפרסם"
      },
      {
        "word": "הומוריסטיים",
        "dep_head_idx": 13,
        "dep_func": "amod",
        "dep_head": "מאמרים"
      }
    ],
    "root_idx": 2
  }
]
```
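
To work with a parse downstream, it can help to turn the flat `tree` list into a head-to-children mapping. The following is a minimal sketch, assuming the output shape shown above: `dep_head_idx` is a 0-based index into `tree`, and the root is marked with `-1`. The helper names `children_by_head` and `render` are illustrative and not part of the model's API. In the sample above the root entry's `dep_head` shows the last word of the sentence, which looks consistent with `-1` being used as a list index, so the sketch relies on `dep_head_idx` and `root_idx` rather than on `dep_head`.

```python
from collections import defaultdict


def children_by_head(parse):
    """Group dependent indices under their head index; the root goes under None."""
    children = defaultdict(list)
    for idx, node in enumerate(parse["tree"]):
        head = node["dep_head_idx"]
        # Assumption: -1 marks the root, as in the sample output.
        children[None if head == -1 else head].append(idx)
    return dict(children)


def render(parse):
    """Return an indented text rendering of the tree, starting at root_idx."""
    tree = parse["tree"]
    children = children_by_head(parse)
    lines = []

    def visit(idx, depth):
        node = tree[idx]
        lines.append(f"{'  ' * depth}{node['word']} ({node['dep_func']})")
        for child in children.get(idx, []):
            visit(child, depth + 1)

    visit(parse["root_idx"], 0)
    return "\n".join(lines)


# Hand-built three-word excerpt in the same shape as the output above.
# Indices are renumbered and dep_head is omitted; this is not model output.
example = {
    "tree": [
        {"word": "השלים", "dep_head_idx": -1, "dep_func": "root"},
        {"word": "אפרים", "dep_head_idx": 0, "dep_func": "nsubj"},
        {"word": "קישון", "dep_head_idx": 1, "dep_func": "flat"},
    ],
    "root_idx": 0,
}

print(render(example))
```

With a real parse, the same call would be `render(model.predict([sentence], tokenizer)[0])`.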


## Citation

If you use DictaBERT in your research, please cite `DictaBERT: A State-of-the-Art BERT Suite for Modern Hebrew`.

**BibTeX:**

```bibtex
@misc{shmidman2023dictabert,
      title={DictaBERT: A State-of-the-Art BERT Suite for Modern Hebrew}, 
      author={Shaltiel Shmidman and Avi Shmidman and Moshe Koppel},
      year={2023},
      eprint={2308.16687},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```

## License

Shield: [![CC BY 4.0][cc-by-shield]][cc-by]

This work is licensed under a
[Creative Commons Attribution 4.0 International License][cc-by].

[![CC BY 4.0][cc-by-image]][cc-by]

[cc-by]: http://creativecommons.org/licenses/by/4.0/
[cc-by-image]: https://i.creativecommons.org/l/by/4.0/88x31.png
[cc-by-shield]: https://img.shields.io/badge/License-CC%20BY%204.0-lightgrey.svg