license: cc-by-4.0
language:
- he
inference: false
DictaLM-rab: A Large Generative Language Model for Rabbinic Hebrew
A large generative pretrained transformer (GPT) language model for Hebrew, released here.
This is an alpha version of the model, and there are many improvements to come.
We are actively working on improving the model, so stay tuned.
This is the base model, pretrained on general text completion. On its own it isn't very useful, but it can be fine-tuned for specific tasks (instruct, chat, QA, and more).
This model differs from the regular DictaLM in the training data used for pretraining. The regular DictaLM was pretrained on modern texts only, whereas this model (DictaLM-Rab) was pretrained on a mixture of 50% modern texts and 50% rabbinic/historical texts.
Sample usage (for text completion):
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
# Load the tokenizer and the model (the model runs custom code, hence trust_remote_code)
tokenizer = AutoTokenizer.from_pretrained('dicta-il/dictalm-rab-7b')
model = AutoModelForCausalLM.from_pretrained('dicta-il/dictalm-rab-7b', trust_remote_code=True).cuda()
model.eval()
with torch.inference_mode():
    # Rabbinic Hebrew prompt for the model to continue
    prompt = 'אמר רב יהודה אמר שמואל הכותב'
    kwargs = dict(
        inputs=tokenizer(prompt, return_tensors='pt').input_ids.to(model.device),
        do_sample=True,
        top_k=50,
        top_p=0.95,
        temperature=0.75,
        max_length=100,  # total length in tokens, prompt included
        min_new_tokens=5
    )
    # Generate and decode the completion (the output includes the prompt)
    print(tokenizer.batch_decode(model.generate(**kwargs), skip_special_tokens=True))
There are many other parameters you can pass in kwargs for different results (greedy decoding, beam search, different sampling configurations, longer/shorter responses, etc.).
The full list of parameters accepted by the generate function is given in the Hugging Face transformers documentation.
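As a rough illustration of the decoding strategies mentioned above, the sketch below builds a few named sets of keyword arguments for generate. The helper name generation_presets and the specific values (4 beams, 64 new tokens) are assumptions chosen for illustration, not settings recommended for this model. Only the sampling values are taken from the sample usage.

```python
def generation_presets(max_new_tokens=64):
    """Return illustrative keyword-argument presets for model.generate()."""
    common = {"max_new_tokens": max_new_tokens}
    return {
        # Greedy decoding: always pick the most likely next token
        "greedy": {**common, "do_sample": False},
        # Beam search: keep several candidate continuations in parallel
        "beam_search": {**common, "do_sample": False, "num_beams": 4, "early_stopping": True},
        # Sampling: same values as in the sample usage
        "sampling": {**common, "do_sample": True, "top_k": 50, "top_p": 0.95, "temperature": 0.75},
    }

presets = generation_presets()
for name, preset in presets.items():
    print(name, preset)

# With the tokenizer, model and prompt from the sample usage (not executed here):
# input_ids = tokenizer(prompt, return_tensors='pt').input_ids.to(model.device)
# output = model.generate(inputs=input_ids, **presets["beam_search"])
```

Note that max_new_tokens counts only generated tokens, whereas the max_length used in the sample usage also counts the prompt.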
Alternative ways to initialize the model:
If you have multiple smaller GPUs, and the package accelerate is installed, you can initialize the model split across the devices:
model = AutoModelForCausalLM.from_pretrained('dicta-il/dictalm-rab-7b', trust_remote_code=True, device_map='auto')
If you are running on Linux and have the bitsandbytes package installed, you can initialize the model in 4-bit or 8-bit inference mode:
model = AutoModelForCausalLM.from_pretrained('dicta-il/dictalm-rab-7b', trust_remote_code=True, load_in_8bit=True)
If you have FlashAttention installed in your environment, you can instruct the model to use the flash attention implementation (either V1 or V2, whichever is installed):
model = AutoModelForCausalLM.from_pretrained('dicta-il/dictalm-rab-7b', trust_remote_code=True, use_flash_attention=True)
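The three options above can be combined in one place. The following is a minimal sketch, not part of the model's own API: the helper pick_load_kwargs and its parameters are hypothetical, and it only assembles the keyword arguments after checking that the required package appears to be installed. The load_in_4bit flag is assumed by analogy with the load_in_8bit example above; it is a from_pretrained argument in transformers versions with bitsandbytes 4-bit support, and newer versions may prefer a quantization config object instead.

```python
import importlib.util
import platform

def _installed(package):
    """True if the package can be found in the current environment."""
    return importlib.util.find_spec(package) is not None

def pick_load_kwargs(split_across_gpus=False, quant_bits=None, flash_attention=False,
                     is_installed=_installed, system=None):
    """Build from_pretrained keyword arguments (hypothetical helper)."""
    system = system or platform.system()
    kwargs = {"trust_remote_code": True}
    if split_across_gpus:
        if not is_installed("accelerate"):
            raise RuntimeError("device_map='auto' requires the accelerate package")
        kwargs["device_map"] = "auto"
    if quant_bits is not None:
        if quant_bits not in (4, 8):
            raise ValueError("quant_bits must be 4, 8 or None")
        if system != "Linux" or not is_installed("bitsandbytes"):
            raise RuntimeError("quantized loading assumes Linux with bitsandbytes installed")
        kwargs[f"load_in_{quant_bits}bit"] = True
    if flash_attention:
        if not is_installed("flash_attn"):
            raise RuntimeError("use_flash_attention=True requires FlashAttention")
        kwargs["use_flash_attention"] = True
    return kwargs

print(pick_load_kwargs())  # default: only trust_remote_code is set

# Usage with transformers (not executed here):
# model = AutoModelForCausalLM.from_pretrained('dicta-il/dictalm-rab-7b', **pick_load_kwargs(quant_bits=8))
```

Raising early keeps a missing dependency from surfacing only after the model weights have started downloading.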
Citation
If you use DictaLM in your research, please cite "Introducing DictaLM -- A Large Generative Language Model for Modern Hebrew".
BibTeX:
@misc{shmidman2023introducing,
title={Introducing DictaLM -- A Large Generative Language Model for Modern Hebrew},
author={Shaltiel Shmidman and Avi Shmidman and Amir David Nissan Cohen and Moshe Koppel},
year={2023},
eprint={2309.14568},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
License
This work is licensed under a Creative Commons Attribution 4.0 International License.