Hebrew_Nemo: State-of-the-Art Hebrew Language Model
Hebrew_Nemo is a state-of-the-art (SOTA) large language model specifically optimized for Hebrew understanding and generation. Built upon the Mistral Nemo architecture, it represents a significant advancement in Hebrew NLP capabilities, combining the robust multilingual foundations of Mistral Nemo with extensive Hebrew-specific fine-tuning and optimization.
As part of SicariusSicariiStuff's efforts to truly democratize AI, Hebrew_Nemo is released under the permissive Apache 2.0 license. The model demonstrates competitive performance with Gemma3-27B, one of the world's leading open-source models in multilingual capabilities, despite Gemma3-27B being more than twice its size. This result highlights Hebrew_Nemo's efficiency and effectiveness, making SOTA Hebrew capabilities widely available to consumers as well as corporations.
Unfortunately, Gemma-3-27b-it does not benchmark well on this suite, but I still believe it is by far the best multilingual model:
| Model | Average | SNLI Acc | QA (HeQ) | Translation BLEU | Israeli Trivia | Params (B) |
|---|---|---|---|---|---|---|
| google/gemma-3-27b-pt | 69.5 | 85.24 | 78.27 | 36.45 | 70.43 | 27 |
| google/gemma-3-27b-it | 13.41 | 0 | 80.31 | 0.17 | 0 | 27 |
Hebrew_Nemo demonstrates SOTA performance for its size, with particularly strong results in Hebrew translation. At only 12B parameters, it achieves a BLEU score of 30.83, outperforming significantly larger models such as DeepSeek-R1-Distill-Qwen-14B and AI21 Jamba-1.5-Mini (52B), a model more than four times its size.
The model maintains high competence across reasoning and QA, with an SNLI accuracy of 79.76 and a HeQ score of 70.51, indicating solid sentence-level understanding and contextual reasoning in Hebrew. Its Israeli Trivia score (50.83) demonstrates exceptional knowledge for its size, coming very close to a model more than four times larger while clearly outperforming models of similar or slightly larger size.
| Model | Average | SNLI Acc | QA (HeQ) | Translation BLEU | Israeli Trivia | Params (B) |
|---|---|---|---|---|---|---|
| Hebrew_Nemo | 57.98 | 79.76 | 70.51 | 30.83 | 50.83 | 12 |
| ai21labs/AI21-Jamba-1.5-Mini | 54.68 | 69.52 | 69.38 | 22.00 | 57.81 | 52 |
| SicariusSicariiStuff/Zion_Alpha | 53.55 | 84.05 | 67.67 | 27.93 | 34.55 | 7 |
| Qwen/Qwen3-8B | 53.54 | 80.00 | 78.53 | 25.73 | 29.90 | 8 |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-14B | 53.19 | 85.48 | 71.38 | 22.99 | 32.89 | 14 |
| Mistral-Nemo-Base-2407 | 51.24 | 65.95 | 68.48 | 28.99 | 41.53 | 12 |
Hebrew_Nemo also vastly improves upon the original Mistral Nemo by adding massive amounts of new knowledge while refining existing capabilities:
| Metric | Hebrew_Nemo | Mistral-Nemo-Base | Improvement |
|---|---|---|---|
| Average | 57.98 | 51.24 | +13.2% |
| SNLI Accuracy | 79.76 | 65.95 | +20.9% |
| QA (HeQ) | 70.51 | 68.48 | +3.0% |
| Translation BLEU | 30.83 | 28.99 | +6.3% |
| Israeli Trivia | 50.83 | 41.53 | +22.4% |
Technical Overview
- Model Type: Causal Language Model (Decoder-only Transformer)
- Base Architecture: Mistral Nemo
- Language Focus: Hebrew (עברית) with maintained multilingual capabilities
- License: Apache 2.0
- Parameters: 12B
- Context Length: 128K tokens
- Layers: 40
- Dim: 5,120
- Head dim: 128
- Hidden dim: 14,336
- Activation Function: SwiGLU
- Number of heads: 32
- Number of kv-heads: 8 (GQA)
- Vocabulary size: 2**17 ~= 128k
- Rotary embeddings (theta = 1M)
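To make the specifications above concrete, here is a minimal sketch of how they map onto the Hugging Face `transformers` API via `MistralConfig`. The field names come from the `transformers` library (`head_dim` requires a fairly recent release); the values are copied from the list above and may not match the model's shipped `config.json` exactly.
```python
# Sketch only: architecture hyperparameters from the list above expressed as a
# transformers MistralConfig. Not the official config shipped with the model.
from transformers import MistralConfig

config = MistralConfig(
    hidden_size=5120,                # Dim
    intermediate_size=14336,         # Hidden dim (SwiGLU MLP)
    num_hidden_layers=40,            # Layers
    num_attention_heads=32,          # Number of heads
    num_key_value_heads=8,           # GQA kv-heads
    head_dim=128,                    # Head dim
    vocab_size=131072,               # 2**17
    max_position_embeddings=131072,  # advertised 128K context; the shipped config may differ
    rope_theta=1_000_000.0,          # Rotary embeddings theta = 1M
)
```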
Primary Use Cases
- Hebrew Text Generation: High-quality content creation in modern Hebrew
- Translation: Bidirectional translation between Hebrew and other languages (see the prompt sketch after this list)
- Question Answering: Advanced reasoning and comprehension in Hebrew contexts
- Dialogue Systems: Conversational AI applications for Hebrew speakers
- Text Classification: Sentiment analysis, topic modeling, and categorization of Hebrew content
- Named Entity Recognition: Extraction of entities from Hebrew text
- Summarization: Concise summaries of Hebrew documents and articles
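As a quick illustration of the translation use case listed above, the snippet below uses the generic `transformers` text-generation pipeline with chat-style input. The prompt wording and generation settings are illustrative only, not an official recipe, and chat-format pipeline input requires a recent `transformers` release.
```python
# Illustrative translation prompt via the transformers text-generation pipeline.
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="SicariusSicariiStuff/Hebrew_Nemo",
    torch_dtype="auto",
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Translate the following sentence into Hebrew: Language models are changing how people access information."},
]
result = pipe(messages, max_new_tokens=128, do_sample=False)
print(result[0]["generated_text"][-1]["content"])  # the assistant's reply
```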
Out-of-Scope Uses
- Critical or automated decision-making (medical, legal, financial) without human oversight
- Generation of content intended to deceive or manipulate, including disinformation or politically biased content
- Applications requiring 100% factual accuracy without verification
Training Data and Training Methodology
Hebrew_Nemo was trained on a diverse corpus including:
| Source Type | Description | Language Coverage |
|---|---|---|
| Hebrew Wikipedia | Encyclopedia-style text | 100% Hebrew |
| Hebrew Literature & Proverbs | Classic and modern | 100% Hebrew |
| Hebrew-English Code-Mix | Social media & dialogue | 70% Hebrew / 30% English |
| Synthetic Data | Instruction-following & reasoning | Mixed |
Data was filtered, normalized, and token-balanced to reduce bias and improve generalization across dialects.
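The exact filtering and normalization pipeline is not published. As a rough, generic illustration of the kind of normalization commonly applied to Hebrew web text (Unicode normalization, stripping niqqud and cantillation, whitespace cleanup), consider the sketch below; it is not the pipeline actually used for Hebrew_Nemo.
```python
import re
import unicodedata

# Generic Hebrew text normalization (illustrative only; not Hebrew_Nemo's actual pipeline).
# The approximate range U+0591-U+05C7 covers cantillation marks, niqqud, and related signs.
NIQQUD = re.compile(r"[\u0591-\u05C7]")

def normalize_hebrew(text: str) -> str:
    text = unicodedata.normalize("NFKC", text)   # canonicalize Unicode forms
    text = NIQQUD.sub("", text)                  # strip cantillation and niqqud
    text = re.sub(r"\s+", " ", text).strip()     # collapse whitespace
    return text

print(normalize_hebrew("שָׁלוֹם   עוֹלָם"))  # -> "שלום עולם"
```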
Additional training data included:
- Modern Hebrew web text and news articles
- Hebrew literature and academic publications
- Biblical and Rabbinic Hebrew texts for cultural depth
- Hebrew social media and conversational data
- Technical documentation in Hebrew
- Parallel corpora for translation capabilities
The training process involved:
- Continued pre-training on Hebrew-rich datasets (a generic sketch follows this list)
- Instruction fine-tuning on Hebrew task-specific data
- Alignment through RLHF/DPO for Hebrew linguistic preferences
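The precise training recipe is not published. The sketch below illustrates only the first stage, continued pre-training with the Hugging Face `Trainer`; the dataset path, sequence length, and hyperparameters are placeholders rather than the values used for Hebrew_Nemo.
```python
# Generic continued pre-training sketch. Dataset, sequence length, and
# hyperparameters are placeholders, not the actual Hebrew_Nemo setup.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base = "mistralai/Mistral-Nemo-Base-2407"
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token  # Mistral tokenizers ship without a pad token
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype="auto")

raw = load_dataset("text", data_files={"train": "hebrew_corpus.txt"})  # placeholder corpus

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=4096)

train = raw["train"].map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="hebrew_nemo_cpt", per_device_train_batch_size=1,
                           gradient_accumulation_steps=16, learning_rate=2e-5,
                           num_train_epochs=1, bf16=True),
    train_dataset=train,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),  # causal LM objective
)
trainer.train()
```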
Key Features
- Native Hebrew Understanding: Trained on millions of high-quality Hebrew documents spanning literature, news, Wikipedia, academic, and colloquial domains.
- Contextual Mastery: Handles complex anaphora, idiomatic expressions, and mixed Hebrew-English text with high fidelity.
- Instruction-Tuned: Aligned for chat, Q&A, summarization, and reasoning use cases.
- Cultural Awareness: Sensitive to Hebrew cultural, religious, and social nuances.
- Optimized Inference: Enhanced performance with Mistral's memory-efficient attention and dynamic context window.
Limitations
- May reflect training corpus biases (e.g., urban dialect prevalence, widespread opinions in Israeli social media)
- Limited performance on rare biblical or archaic Hebrew
- Occasionally mixes Hebrew and English when the context is ambiguous
- Does not include alignment for safety moderation out of the box
Model instruction template: ChatML
```
<|im_start|>system
You answer the questions in Hebrew.<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
```
Example Usage
Basic Inference
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "SicariusSicariiStuff/Hebrew_Nemo"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",   # use the checkpoint's native precision
    device_map="auto",    # spread layers across available devices
)

prompt = "מהי בינה מלאכותית?"  # "What is artificial intelligence?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, temperature=0.7, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
Chat Format
```python
messages = [
    {"role": "user", "content": "ספר לי על ההיסטוריה של ירושלים"}  # "Tell me about the history of Jerusalem"
]

formatted_prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,  # append the opening of the assistant turn
)

inputs = tokenizer(formatted_prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
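The system message from the instruction template above can be supplied through the same chat-template path; here is a short sketch (the question and generation settings are illustrative):
```python
# Passing the system message shown in the instruction template above.
messages = [
    {"role": "system", "content": "You answer the questions in Hebrew."},
    {"role": "user", "content": "מה ההבדל בין עברית מקראית לעברית מודרנית?"},  # "What is the difference between Biblical and Modern Hebrew?"
]
formatted_prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer(formatted_prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.7, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```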
Quantization (for lower VRAM)
```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16 while weights stay 4-bit
)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=quantization_config,
    device_map="auto",
)
```
Available quantizations:
- Original: FP16
- GGUF: Static Quants (see the llama.cpp sketch after this list)
- Specialized: FP8
- Mobile (ARM): Q4_0
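For the GGUF quantizations listed above, a common route is llama.cpp or its Python bindings. The sketch below uses `llama-cpp-python`; the GGUF file name is a hypothetical placeholder, so substitute the file you actually download.
```python
# Hedged sketch: running a GGUF quant with llama-cpp-python.
from llama_cpp import Llama

llm = Llama(
    model_path="Hebrew_Nemo.Q4_K_M.gguf",  # hypothetical local file name
    n_ctx=8192,          # context window to allocate (the model supports up to 128K)
    n_gpu_layers=-1,     # offload all layers to GPU if available
)

out = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You answer the questions in Hebrew."},
        {"role": "user", "content": "Tell me about the history of Jerusalem"},
    ],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```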
Citation
```bibtex
@misc{hebrew_nemo_2025,
  author    = {SicariusSicariiStuff},
  title     = {Hebrew_Nemo: State-of-the-Art Hebrew Language Model},
  year      = {2025},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/SicariusSicariiStuff/Hebrew_Nemo}
}
```
Acknowledgements
- Mistral for the base architecture
- NVIDIA NeMo framework inspiration
- Employee#11 for her unwavering support
Contact
For questions, issues, or collaboration opportunities:
- HuggingFace: @SicariusSicariiStuff
- Issues: Report technical issues on the model repository
Model Card Authors
- SicariusSicariiStuff