Hebrew_Nemo: State-of-the-Art Hebrew Language Model



Developed by: SicariusSicariiStuff

Hebrew_Nemo is a state-of-the-art (SOTA) Hebrew language large language model specifically optimized for Hebrew language understanding and generation. Built upon the Mistral Nemo architecture, this model represents a significant advancement in Hebrew NLP capabilities, combining the robust multilingual foundations of Mistral Nemo with extensive Hebrew-specific fine-tuning and optimization.

As part of SicariusSicariiStuff's efforts to truly democratize AI, Hebrew_Nemo is released under the permissive Apache 2.0 license. The model demonstrates competitive performance with Gemma3-27B, one of the world's leading open-source models in multilingual capabilities, despite Gemma3-27B being more than twice its size. This result highlights Hebrew_Nemo's efficiency and effectiveness, making SOTA capabilities widely available to consumers as well as corporations.

Unfortunately, Gemma-3-27b-it doesn't benchmark well, but I still believe it is by far the best multilingual model:

| Model | Average | SNLI Acc | QA (HeQ) | Translation BLEU | Israeli Trivia | Params (B) |
|---|---|---|---|---|---|---|
| google/gemma-3-27b-pt | 69.5 | 85.24 | 78.27 | 36.45 | 70.43 | 27 |
| google/gemma-3-27b-it | 13.41 | 0 | 80.31 | 0.17 | 0 | 27 |

Hebrew_Nemo demonstrates SOTA performance for its size, with particularly outstanding results in Hebrew translation. At only 12B parameters, it achieves a BLEU score of 30.83, outperforming significantly larger models such as DeepSeek-14B and AI21 Jamba-Mini (52B), a model more than four times its size.

The model maintains high competence across reasoning and QA, with an SNLI accuracy of 79.76 and a HeQ score of 70.51, indicating solid sentence-level understanding and contextual reasoning in Hebrew. Its Israeli Trivia score (50.83) demonstrates exceptional knowledge for its size, coming very close to a model more than four times its size while vastly outperforming models of similar or even slightly larger size.

| Model | Average | SNLI Acc | QA (HeQ) | Translation BLEU | Israeli Trivia | Params (B) |
|---|---|---|---|---|---|---|
| Hebrew_Nemo | 57.98 | 79.76 | 70.51 | 30.83 | 50.83 | 12 |
| ai21labs/AI21-Jamba-1.5-Mini | 54.68 | 69.52 | 69.38 | 22.00 | 57.81 | 52 |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-14B | 53.19 | 85.48 | 71.38 | 22.99 | 32.89 | 14 |
| SicariusSicariiStuff/Zion_Alpha | 53.55 | 84.05 | 67.67 | 27.93 | 34.55 | 7 |
| Qwen/Qwen3-8B | 53.54 | 80.00 | 78.53 | 25.73 | 29.90 | 8 |
| Mistral-Nemo-Base-2407 | 51.24 | 65.95 | 68.48 | 28.99 | 41.53 | 12 |

Hebrew_Nemo also vastly improves upon the original Mistral Nemo by adding massive amounts of new knowledge while refining existing capabilities:

| Metric | Hebrew_Nemo | Mistral-Nemo-Base | Improvement |
|---|---|---|---|
| Average | 57.98 | 51.24 | +13.2% |
| SNLI Accuracy | 79.76 | 65.95 | +20.9% |
| QA (HeQ) | 70.51 | 68.48 | +3.0% |
| Translation BLEU | 30.83 | 28.99 | +6.3% |
| Israeli Trivia | 50.83 | 41.53 | +22.4% |

Technical Overview

  • Model Type: Causal Language Model (Decoder-only Transformer)
  • Base Architecture: Mistral Nemo
  • Language Focus: Hebrew (עברית) with maintained multilingual capabilities
  • License: Apache 2.0
  • Parameters: 12B
  • Context Length: 128K tokens
  • Layers: 40
  • Dim: 5,120
  • Head dim: 128
  • Hidden dim: 14,336
  • Activation Function: SwiGLU
  • Number of heads: 32
  • Number of kv-heads: 8 (GQA)
  • Vocabulary size: 2**17 ~= 128k
  • Rotary embeddings (theta = 1M)
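
For reference, the specifications above map roughly onto a Hugging Face transformers Mistral configuration as sketched below. This is an illustration reconstructed from the list, not the repository's actual config.json; when in doubt, load the real configuration with AutoConfig.from_pretrained("SicariusSicariiStuff/Hebrew_Nemo").

from transformers import MistralConfig

# Approximate architecture, inferred from the Technical Overview list above.
config = MistralConfig(
    vocab_size=131072,               # 2**17 tokens (~128K)
    hidden_size=5120,                # model dim
    intermediate_size=14336,         # SwiGLU hidden dim
    num_hidden_layers=40,
    num_attention_heads=32,
    num_key_value_heads=8,           # grouped-query attention (GQA)
    head_dim=128,
    hidden_act="silu",               # SiLU gating used by SwiGLU
    max_position_embeddings=131072,  # 128K-token context
    rope_theta=1_000_000.0,          # rotary embeddings, theta = 1M
)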

Primary Use Cases

  • Hebrew Text Generation: High-quality content creation in modern Hebrew
  • Translation: Bidirectional translation between Hebrew and other languages
  • Question Answering: Advanced reasoning and comprehension in Hebrew contexts
  • Dialogue Systems: Conversational AI applications for Hebrew speakers
  • Text Classification: Sentiment analysis, topic modeling, and categorization of Hebrew content
  • Named Entity Recognition: Extraction of entities from Hebrew text
  • Summarization: Concise summaries of Hebrew documents and articles
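
As a concrete illustration of the translation use case, the model can be driven with a plain text-generation pipeline and a translation instruction in the prompt. The prompt wording below is only an example, not a required format.

from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="SicariusSicariiStuff/Hebrew_Nemo",
    torch_dtype="auto",
    device_map="auto",
)

# "Translate to English: The library opens tomorrow at nine in the morning."
prompt = "תרגם לאנגלית: הספרייה נפתחת מחר בשעה תשע בבוקר."
result = generator(prompt, max_new_tokens=64, do_sample=False)
print(result[0]["generated_text"])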

Out-of-Scope Uses

  • Real-time critical decision-making systems (medical, legal, financial) without human oversight
  • Generation of content intended to deceive or manipulate
  • Applications requiring 100% factual accuracy without verification

Training Data and Training Methodology

Hebrew_Nemo was trained on a diverse corpus including:

| Source Type | Description | Language Coverage |
|---|---|---|
| Hebrew Wikipedia | Encyclopedia-style text | 100% Hebrew |
| Hebrew Literature & Proverbs | Classic and modern | 100% Hebrew |
| Hebrew-English Code-Mix | Social media & dialogue | 70% Hebrew / 30% English |
| Synthetic Data | Instruction-following & reasoning | Mixed |

Data was filtered, normalized, and token-balanced to reduce bias and improve generalization across dialects.

Additional training data included:

  • Modern Hebrew web text and news articles
  • Hebrew literature and academic publications
  • Biblical and Rabbinic Hebrew texts for cultural depth
  • Hebrew social media and conversational data
  • Technical documentation in Hebrew
  • Parallel corpora for translation capabilities

The training process involved:

  1. Continued pre-training on Hebrew-rich datasets
  2. Instruction fine-tuning on Hebrew task-specific data
  3. Alignment through RLHF/DPO for Hebrew linguistic preferences
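
As a rough illustration of stage 2, instruction fine-tuning of this kind can be sketched with TRL's SFTTrainer. Everything below is hypothetical: the dataset file, column format, and hyperparameters are placeholders, not the actual Hebrew_Nemo training recipe.

from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Placeholder: a JSONL file of Hebrew instruction/response pairs with a "text" column.
train_ds = load_dataset("json", data_files="hebrew_instructions.jsonl", split="train")

trainer = SFTTrainer(
    model="mistralai/Mistral-Nemo-Base-2407",  # continue from the base checkpoint
    train_dataset=train_ds,
    args=SFTConfig(
        output_dir="hebrew_nemo_sft",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,
        num_train_epochs=1,
        bf16=True,
    ),
)
trainer.train()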

🚀 Key Features

  • Native Hebrew Understanding: Trained on millions of high-quality Hebrew documents spanning literature, news, Wikipedia, academic, and colloquial domains.
  • Contextual Mastery: Handles complex anaphora, idiomatic expressions, and mixed Hebrew-English text with high fidelity.
  • Instruction-Tuned: Aligned for chat, Q&A, summarization, and reasoning use cases.
  • Cultural Awareness: Sensitive to Hebrew cultural, religious, and social nuances.
  • Optimized Inference: Enhanced performance with Mistral's memory-efficient attention and dynamic context window.

Out-of-Scope Usage

  • Generating disinformation or biased political content
  • Automated decision-making without human oversight

โš™๏ธ Limitations

  • May reflect training corpus biases (e.g., urban dialect prevalence, widespread opinions in Israeli social media)
  • Limited performance on rare biblical or archaic Hebrew
  • Occasionally mixes Hebrew and English when the context is ambiguous
  • Does not include alignment for safety moderation out of the box

Model instruction template: ChatML

<|im_start|>system
You answer the questions in Hebrew.<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
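
The template can also be assembled by hand as a quick sanity check; in practice, tokenizer.apply_chat_template (see the Chat Format example below) is the more robust option. The system and user strings here are illustrative only.

# Building a ChatML prompt string manually (illustrative example).
system_msg = "You answer the questions in Hebrew."
user_msg = "מהי בינה מלאכותית?"  # "What is artificial intelligence?"

prompt = (
    f"<|im_start|>system\n{system_msg}<|im_end|>\n"
    f"<|im_start|>user\n{user_msg}<|im_end|>\n"
    f"<|im_start|>assistant\n"
)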

๐Ÿ—ฃ๏ธ Example Usage

Basic Inference

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "SicariusSicariiStuff/Hebrew_Nemo"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",   # load in the checkpoint's native precision (BF16)
    device_map="auto"     # place the model on the available GPU(s)
)

prompt = "מהי בינה מלאכותית?"  # "What is artificial intelligence?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Chat Format

messages = [
    # "Tell me about the history of Jerusalem"
    {"role": "user", "content": "ספר לי על ההיסטוריה של ירושלים"}
]

# Format the conversation with the tokenizer's built-in (ChatML) chat template.
formatted_prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

inputs = tokenizer(formatted_prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Quantization (for lower VRAM)

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit quantization via bitsandbytes; computation runs in bfloat16.
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16
)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=quantization_config,
    device_map="auto"
)
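
As a rough, unmeasured estimate, the 12B weights take about 6-7 GB in 4-bit NF4, so the quantized model should fit on a single consumer GPU with roughly 10-12 GB of VRAM; actual usage grows with context length and batch size.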

Available quantizations:


Citation

@misc{hebrew_nemo_2025,
  author = {SicariusSicariiStuff},
  title = {Hebrew_Nemo: State-of-the-Art Hebrew Language Model},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/SicariusSicariiStuff/Hebrew_Nemo}
}

🧰 Acknowledgements

  • Mistral for the base architecture
  • NVIDIA NeMo framework inspiration
  • Employee#11 for her unwavering support

Contact

For questions, issues, or collaboration opportunities:

Model Card Authors
