---
license: cc-by-4.0
language:
- he
inference: false
---

# DictaBERT: A State-of-the-Art BERT Suite for Modern Hebrew

State-of-the-art language model for Hebrew, released [here](https://arxiv.org/abs/2403.06970).

This is the fine-tuned model for the syntax dependency tree parsing task.

For the bert-base models for other tasks, see [here](https://huggingface.co/collections/dicta-il/dictabert-6588e7cc08f83845fc42a18b).

Sample usage:

```python
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('dicta-il/dictabert-syntax')
model = AutoModel.from_pretrained('dicta-il/dictabert-syntax', trust_remote_code=True)

model.eval()

sentence = 'בשנת 1948 השלים אפרים קישון את לימודיו בפיסול מתכת ובתולדות האמנות והחל לפרסם מאמרים הומוריסטיים'
print(model.predict([sentence], tokenizer))
```

Output:

```json
[
  {
    "tree": [
      {
        "word": "בשנת",
        "dep_head_idx": 2,
        "dep_func": "obl",
        "dep_head": "השלים"
      },
      {
        "word": "1948",
        "dep_head_idx": 0,
        "dep_func": "compound",
        "dep_head": "בשנת"
      },
      {
        "word": "השלים",
        "dep_head_idx": -1,
        "dep_func": "root",
        "dep_head": "הומוריסטיים"
      },
      {
        "word": "אפרים",
        "dep_head_idx": 2,
        "dep_func": "nsubj",
        "dep_head": "השלים"
      },
      {
        "word": "קישון",
        "dep_head_idx": 3,
        "dep_func": "flat",
        "dep_head": "אפרים"
      },
      {
        "word": "את",
        "dep_head_idx": 6,
        "dep_func": "case",
        "dep_head": "לימודיו"
      },
      {
        "word": "לימודיו",
        "dep_head_idx": 2,
        "dep_func": "obj",
        "dep_head": "השלים"
      },
      {
        "word": "בפיסול",
        "dep_head_idx": 6,
        "dep_func": "nmod",
        "dep_head": "לימודיו"
      },
      {
        "word": "מתכת",
        "dep_head_idx": 7,
        "dep_func": "compound",
        "dep_head": "בפיסול"
      },
      {
        "word": "ובתולדות",
        "dep_head_idx": 7,
        "dep_func": "conj",
        "dep_head": "בפיסול"
      },
      {
        "word": "האמנות",
        "dep_head_idx": 9,
        "dep_func": "compound",
        "dep_head": "ובתולדות"
      },
      {
        "word": "והחל",
        "dep_head_idx": 2,
        "dep_func": "conj",
        "dep_head": "השלים"
      },
      {
        "word": "לפרסם",
        "dep_head_idx": 11,
        "dep_func": "xcomp",
        "dep_head": "והחל"
      },
      {
        "word": "מאמרים",
        "dep_head_idx": 12,
        "dep_func": "obj",
        "dep_head": "לפרסם"
      },
      {
        "word": "הומוריסטיים",
        "dep_head_idx": 13,
        "dep_func": "amod",
        "dep_head": "מאמרים"
      }
    ],
    "root_idx": 2
  }
]
```

## Citation

If you use DictaBERT in your research, please cite ```DictaBERT: A State-of-the-Art BERT Suite for Modern Hebrew```

**BibTeX:**

```bibtex
@misc{shmidman2023dictabert,
      title={DictaBERT: A State-of-the-Art BERT Suite for Modern Hebrew},
      author={Shaltiel Shmidman and Avi Shmidman and Moshe Koppel},
      year={2023},
      eprint={2308.16687},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```

## Citation

If you use DictaBERT-syntax in your research, please cite ```MRL Parsing without Tears: The Case of Hebrew```

**BibTeX:**

```bibtex
@misc{shmidman2024mrl,
      title={MRL Parsing Without Tears: The Case of Hebrew},
      author={Shaltiel Shmidman and Avi Shmidman and Moshe Koppel and Reut Tsarfaty},
      year={2024},
      eprint={2403.06970},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```

## License

Shield: [![CC BY 4.0][cc-by-shield]][cc-by]

This work is licensed under a
[Creative Commons Attribution 4.0 International License][cc-by].

[![CC BY 4.0][cc-by-image]][cc-by]

[cc-by]: http://creativecommons.org/licenses/by/4.0/
[cc-by-image]: https://i.creativecommons.org/l/by/4.0/88x31.png
[cc-by-shield]: https://img.shields.io/badge/License-CC%20BY%204.0-lightgrey.svg