Hebrew_Nemo: State-of-the-Art Hebrew Language Model



Developed by: SicariusSicariiStuff

Hebrew_Nemo is a state-of-the-art (SOTA) Hebrew language large language model specifically optimized for Hebrew language understanding and generation. Built upon the Mistral Nemo architecture, this model represents a significant advancement in Hebrew NLP capabilities, combining the robust multilingual foundations of Mistral Nemo with extensive Hebrew-specific fine-tuning and optimization.

As part of SicariusSicariiStuff's efforts to truly democratize AI, Hebrew_Nemo is released under the permissive Apache 2.0 license. The model demonstrates competitive performance with Gemma3-27B, one of the world's leading open-source models in multilingual capabilities, despite Gemma3-27B being more than twice its size. This result highlights Hebrew_Nemo's efficiency and effectiveness, making SOTA capabilities widely available to consumers as well as corporations.

Unfortunately, Gemma-3-27b-it doesn't benchmark well, but I still believe it is by far the best multilingual model:

| Model | Average | SNLI Acc | QA (HeQ) | Translation BLEU | Israeli Trivia | Params (B) |
|---|---|---|---|---|---|---|
| google/gemma-3-27b-pt | 69.5 | 85.24 | 78.27 | 36.45 | 70.43 | 27 |
| google/gemma-3-27b-it | 13.41 | 0 | 80.31 | 0.17 | 0 | 27 |

Hebrew_Nemo demonstrates SOTA performance for its size, with particularly outstanding results in Hebrew translation. At only 12B parameters, it achieves a BLEU score of 30.83, outperforming significantly larger models such as DeepSeek-14B and AI21 Jamba-Mini (52B), a model more than four times its size.

The model maintains high competence across reasoning and QA, with an SNLI accuracy of 79.76 and a HeQ score of 70.51, indicating solid sentence-level understanding and contextual reasoning in Hebrew. Its Israeli Trivia score (50.83) demonstrates exceptional knowledge for its size, coming very close to a model more than four times its size while vastly outperforming models of similar or even slightly larger size.

| Model | Average | SNLI Acc | QA (HeQ) | Translation BLEU | Israeli Trivia | Params (B) |
|---|---|---|---|---|---|---|
| Hebrew_Nemo | 57.98 | 79.76 | 70.51 | 30.83 | 50.83 | 12 |
| ai21labs/AI21-Jamba-1.5-Mini | 54.68 | 69.52 | 69.38 | 22.00 | 57.81 | 52 |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-14B | 53.19 | 85.48 | 71.38 | 22.99 | 32.89 | 14 |
| SicariusSicariiStuff/Zion_Alpha | 53.55 | 84.05 | 67.67 | 27.93 | 34.55 | 7 |
| Qwen/Qwen3-8B | 53.54 | 80.00 | 78.53 | 25.73 | 29.90 | 8 |
| Mistral-Nemo-Base-2407 | 51.24 | 65.95 | 68.48 | 28.99 | 41.53 | 12 |

Hebrew_Nemo also vastly improves upon the original Mistral Nemo by adding massive amounts of new knowledge while refining existing capabilities:

| Metric | Hebrew_Nemo | Mistral-Nemo-Base | Improvement |
|---|---|---|---|
| Average | 57.98 | 51.24 | +13.2% |
| SNLI Accuracy | 79.76 | 65.95 | +20.9% |
| QA (HeQ) | 70.51 | 68.48 | +3.0% |
| Translation BLEU | 30.83 | 28.99 | +6.3% |
| Israeli Trivia | 50.83 | 41.53 | +22.4% |

Technical Overview

  • Model Type: Causal Language Model (Decoder-only Transformer)
  • Base Architecture: Mistral Nemo
  • Language Focus: Hebrew (עברית) with maintained multilingual capabilities
  • License: Apache 2.0
  • Parameters: 12B
  • Context Length: 128K tokens
  • Layers: 40
  • Dim: 5,120
  • Head dim: 128
  • Hidden dim: 14,336
  • Activation Function: SwiGLU
  • Number of heads: 32
  • Number of kv-heads: 8 (GQA)
  • Vocabulary size: 2**17 ~= 128k
  • Rotary embeddings (theta = 1M)
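
For reference, the specifications above map roughly onto a Hugging Face transformers Mistral configuration as sketched below. This is an illustration reconstructed from the list, not the repository's actual config.json; when in doubt, load the real configuration with AutoConfig.from_pretrained("SicariusSicariiStuff/Hebrew_Nemo").

from transformers import MistralConfig

# Approximate architecture, inferred from the Technical Overview list above.
config = MistralConfig(
    vocab_size=131072,               # 2**17 tokens (~128K)
    hidden_size=5120,                # model dim
    intermediate_size=14336,         # SwiGLU hidden dim
    num_hidden_layers=40,
    num_attention_heads=32,
    num_key_value_heads=8,           # grouped-query attention (GQA)
    head_dim=128,
    hidden_act="silu",               # SiLU gating used by SwiGLU
    max_position_embeddings=131072,  # 128K-token context
    rope_theta=1_000_000.0,          # rotary embeddings, theta = 1M
)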

Primary Use Cases

  • Hebrew Text Generation: High-quality content creation in modern Hebrew
  • Translation: Bidirectional translation between Hebrew and other languages
  • Question Answering: Advanced reasoning and comprehension in Hebrew contexts
  • Dialogue Systems: Conversational AI applications for Hebrew speakers
  • Text Classification: Sentiment analysis, topic modeling, and categorization of Hebrew content
  • Named Entity Recognition: Extraction of entities from Hebrew text
  • Summarization: Concise summaries of Hebrew documents and articles
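
As a concrete illustration of the translation use case, the model can be driven with a plain text-generation pipeline and a translation instruction in the prompt. The prompt wording below is only an example, not a required format.

from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="SicariusSicariiStuff/Hebrew_Nemo",
    torch_dtype="auto",
    device_map="auto",
)

# "Translate to English: The library opens tomorrow at nine in the morning."
prompt = "תרגם לאנגלית: הספרייה נפתחת מחר בשעה תשע בבוקר."
result = generator(prompt, max_new_tokens=64, do_sample=False)
print(result[0]["generated_text"])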

Out-of-Scope Uses

  • Real-time critical decision-making systems (medical, legal, financial) without human oversight
  • Generation of content intended to deceive or manipulate
  • Applications requiring 100% factual accuracy without verification

Training Data and Training Methodology

Hebrew_Nemo was trained on a diverse corpus including:

| Source Type | Description | Language Coverage |
|---|---|---|
| Hebrew Wikipedia | Encyclopedia-style text | 100% Hebrew |
| Hebrew Literature & Proverbs | Classic and modern | 100% Hebrew |
| Hebrew-English Code-Mix | Social media & dialogue | 70% Hebrew / 30% English |
| Synthetic Data | Instruction-following & reasoning | Mixed |

Data was filtered, normalized, and token-balanced to reduce bias and improve generalization across dialects.

Additional training data included:

  • Modern Hebrew web text and news articles
  • Hebrew literature and academic publications
  • Biblical and Rabbinic Hebrew texts for cultural depth
  • Hebrew social media and conversational data
  • Technical documentation in Hebrew
  • Parallel corpora for translation capabilities

The training process involved:

  1. Continued pre-training on Hebrew-rich datasets
  2. Instruction fine-tuning on Hebrew task-specific data
  3. Alignment through RLHF/DPO for Hebrew linguistic preferences
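
As a rough illustration of stage 2, instruction fine-tuning of this kind can be sketched with TRL's SFTTrainer. Everything below is hypothetical: the dataset file, column format, and hyperparameters are placeholders, not the actual Hebrew_Nemo training recipe.

from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Placeholder: a JSONL file of Hebrew instruction/response pairs with a "text" column.
train_ds = load_dataset("json", data_files="hebrew_instructions.jsonl", split="train")

trainer = SFTTrainer(
    model="mistralai/Mistral-Nemo-Base-2407",  # continue from the base checkpoint
    train_dataset=train_ds,
    args=SFTConfig(
        output_dir="hebrew_nemo_sft",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,
        num_train_epochs=1,
        bf16=True,
    ),
)
trainer.train()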

🚀 Key Features

  • Native Hebrew Understanding: Trained on millions of high-quality Hebrew documents spanning literature, news, Wikipedia, academic, and colloquial domains.
  • Contextual Mastery: Handles complex anaphora, idiomatic expressions, and mixed Hebrew-English text with high fidelity.
  • Instruction-Tuned: Aligned for chat, Q&A, summarization, and reasoning use cases.
  • Cultural Awareness: Sensitive to Hebrew cultural, religious, and social nuances.
  • Optimized Inference: Enhanced performance with Mistral's memory-efficient attention and dynamic context window.

Out-of-Scope Usage

  • Generating disinformation or biased political content
  • Automated decision-making without human oversight

โš™๏ธ Limitations

  • May reflect training corpus biases (e.g., urban dialect prevalence, widespread opinions in Israeli social media)
  • Limited performance on rare biblical or archaic Hebrew
  • Occasionally mixes Hebrew and English when the context is ambiguous
  • Does not include alignment for safety moderation out of the box

Model instruction template: ChatML

<|im_start|>system
You answer the questions in Hebrew.<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
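
The template can also be assembled by hand as a quick sanity check; in practice, tokenizer.apply_chat_template (see the Chat Format example below) is the more robust option. The system and user strings here are illustrative only.

# Building a ChatML prompt string manually (illustrative example).
system_msg = "You answer the questions in Hebrew."
user_msg = "מהי בינה מלאכותית?"  # "What is artificial intelligence?"

prompt = (
    f"<|im_start|>system\n{system_msg}<|im_end|>\n"
    f"<|im_start|>user\n{user_msg}<|im_end|>\n"
    f"<|im_start|>assistant\n"
)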

๐Ÿ—ฃ๏ธ Example Usage

Basic Inference

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "SicariusSicariiStuff/Hebrew_Nemo"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",   # load in the checkpoint's native precision (BF16)
    device_map="auto"     # place the model on the available GPU(s)
)

prompt = "מהי בינה מלאכותית?"  # "What is artificial intelligence?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Chat Format

messages = [
    # "Tell me about the history of Jerusalem"
    {"role": "user", "content": "ספר לי על ההיסטוריה של ירושלים"}
]

# Format the conversation with the tokenizer's built-in (ChatML) chat template.
formatted_prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

inputs = tokenizer(formatted_prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Quantization (for lower VRAM)

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit quantization via bitsandbytes; computation runs in bfloat16.
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16
)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=quantization_config,
    device_map="auto"
)
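
As a rough, unmeasured estimate, the 12B weights take about 6-7 GB in 4-bit NF4, so the quantized model should fit on a single consumer GPU with roughly 10-12 GB of VRAM; actual usage grows with context length and batch size.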

Available quantizations:


Citation

@misc{hebrew_nemo_2025,
  author = {SicariusSicariiStuff},
  title = {Hebrew_Nemo: State-of-the-Art Hebrew Language Model},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/SicariusSicariiStuff/Hebrew_Nemo}
}

🧰 Acknowledgements

  • Mistral for the base architecture
  • NVIDIA NeMo framework inspiration
  • Employee#11 for her unwavering support

Contact

For questions, issues, or collaboration opportunities:

Model Card Authors
