This is our instruction-tuned NorMistral-11B, trained on open datasets and released under the Apache 2.0 license. The model has undergone extensive fluency-preserving reinforcement learning, as described in our paper Fluent Alignment with Disfluent Judges: Post-training for Lower-resource Languages.
The model is freely available in our public chat interface: https://chat.llm.sigma2.no/
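For local use, the model should load with the standard Hugging Face transformers API. Below is a minimal sketch; the chat template behavior and generation settings are assumptions, not tested recommendations.

```python
# Minimal usage sketch with Hugging Face transformers.
# Generation settings here are illustrative, not recommended defaults.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "norallm/normistral-11b-thinking"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Hva er hovedstaden i Norge?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```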
License
We release the model under the Apache 2.0 license to indicate that we impose no additional constraints on the model weights. Note, however, that we do not own the data in the training collection.
Training and data
Generally speaking, the training follows our fluency-preserving post-training setup from Fluent Alignment with Disfluent Judges: Post-training for Lower-resource Languages.
The training data is published alongside the model at norallm/normistral-11b-thinking-training. Training code will be available at github.com/ltgoslo/normistral-post-training.
1. Supervised finetuning (SFT)
We start by "injecting" instruction-following and reasoning capabilities through SFT on English responses and reasoning traces from Kimi-K2-Thinking. The full SFT collection is published in train_sft.jsonl.
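The published SFT data can be streamed directly with the `datasets` library; a short sketch, where the `"messages"` field name is an assumption about the JSONL schema rather than a documented fact:

```python
# Sketch: loading the published SFT collection from the Hub.
from datasets import load_dataset

sft = load_dataset(
    "norallm/normistral-11b-thinking-training",
    data_files="train_sft.jsonl",
    split="train",
)
# Assumed schema: a list of {"role": ..., "content": ...} turns per example.
print(sft[0]["messages"])
```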
2. Reinforcement learning (d-RLAIF)
The short SFT stage is followed by on-policy training on a large collection of Norwegian (Bokmål and Nynorsk) prompts, also available at norallm/normistral-11b-thinking-training. The specific setup of d-RLAIF (direct reinforcement learning from AI feedback) and its motivation are described extensively in our paper. The "AI" reward model used here is Mistral-Large-Instruct-2411.
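In outline, one d-RLAIF step samples responses on-policy, scores them directly with the judge LLM, and applies a policy-gradient update. The sketch below is a heavily simplified illustration of that loop; the placeholder names and the REINFORCE-style update are our own simplification, not the paper's exact algorithm.

```python
# Highly simplified d-RLAIF loop (illustration only; see the paper for the
# actual objective, judge prompting, and fluency-preserving constraints).
# `policy`, `judge_score`, and `optimizer` are hypothetical placeholders.

def d_rlaif_step(policy, judge_score, prompts, optimizer):
    for prompt in prompts:
        # 1. On-policy sampling: the current policy generates the response.
        response, logprob = policy.sample(prompt)
        # 2. Direct AI feedback: the judge LLM (here, Mistral-Large-Instruct-2411)
        #    scores the response; no separate reward model is trained.
        reward = judge_score(prompt, response)
        # 3. REINFORCE-style update: raise the likelihood of responses
        #    in proportion to the judge's reward.
        loss = -reward * logprob
        loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```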
Evaluation
We compared NorMistral against state-of-the-art instruction-tuned models of similar size. What follows is a preliminary evaluation on a generative version of NorEval, which is still a work in progress. The responses from all evaluated models are available for closer inspection at norallm/normistral-11b-thinking-evaluation.
Results
All classification scores are reported as accuracy; NoReC sentiment analysis is performed at the sentence level. The generative scores (NorRewrite and NorSummarize) are reported as average win rates against Llama-3.1-8B, evaluated in an LLM-as-a-judge setup with Llama-3.3-70B (see NorEval for more information). An asterisk (*) denotes "thinking" models.
| Model | NoReC_binary | NoReC_ternary | NorIdiom_NB | NorIdiom_NN | NorCSQA_NB | NorCSQA_NN |
|---|---|---|---|---|---|---|
| NorMistral-11B* | 86.3 | 65.2 | 55.7 | 27.7 | 70.7 | 64.2 |
| Llama-3.1-8B | 79.8 | 52.9 | 12.7 | 6.7 | 64.0 | 57.9 |
| Mistral-Nemo-12B | 67.9 | 49.1 | 12.9 | 8.5 | 61.6 | 49.5 |
| Qwen3-15B* | 83.5 | 69.6 | 22.1 | 13.2 | 83.8 | 71.6 |
| Gemma3-12B | 85.2 | 67.1 | 43.7 | 23.7 | 81.9 | 80.0 |
| OLMo3-7B* | 72.0 | 63.3 | 5.0 | 2.2 | 50.8 | 17.9 |
| OLMo2-13B | 32.8 | 13.2 | 3.5 | 2.2 | 48.0 | 45.3 |
| Apertus-8B | 78.4 | 58.8 | 34.3 | 15.7 | 69.2 | 63.2 |
| Model | NorOBQA_NB | NorOBQA_NN | NRK_NB | NRK_NN | NorRewrite | NorSummarize |
|---|---|---|---|---|---|---|
| NorMistral-11B* | 83.0 | 84.4 | 58.8 | 62.3 | 51.9 | 54.3 |
| Llama-3.1-8B | 78.5 | 71.1 | 49.8 | 46.2 | 50.0 | 50.0 |
| Mistral-Nemo-12B | 75.3 | 67.8 | 47.3 | 45.0 | 42.5 | 39.2 |
| Qwen3-15B* | 94.4 | 88.9 | 63.3 | 55.9 | 77.6 | 83.1 |
| Gemma3-12B | 91.5 | 88.9 | 59.8 | 58.4 | 86.8 | 77.8 |
| OLMo3-7B* | 70.5 | 54.4 | 43.3 | 35.9 | 7.8 | 14.2 |
| OLMo2-13B | 55.3 | 56.7 | 45.3 | 39.4 | 48.3 | 53.7 |
| Apertus-8B | 76.1 | 74.4 | 50.2 | 48.3 | 39.6 | 42.1 |
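For the generative tasks, the win rates follow a standard pairwise LLM-as-a-judge pattern. A hedged sketch of that computation is below; `judge_prefers` is a hypothetical wrapper around the Llama-3.3-70B judge, and the actual prompt and tie handling are defined by NorEval, not reproduced here.

```python
# Sketch of a pairwise win-rate computation against a baseline model.
# `judge_prefers(prompt, a, b)` is a hypothetical judge call returning
# True when output `a` is preferred over output `b`.

def win_rate(prompts, candidate_outputs, baseline_outputs, judge_prefers):
    wins = 0.0
    for prompt, cand, base in zip(prompts, candidate_outputs, baseline_outputs):
        # Query the judge in both orders to cancel out position bias.
        first = judge_prefers(prompt, cand, base)
        second = not judge_prefers(prompt, base, cand)
        wins += (first + second) / 2  # 1.0 win, 0.5 split, 0.0 loss
    return 100 * wins / len(prompts)
```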
Citation
@misc{samuel2025fluentalignmentdisfluentjudges,
  title = {Fluent Alignment with Disfluent Judges: Post-training for Lower-resource Languages},
  author = {David Samuel and Lilja Øvrelid and Erik Velldal and Andrey Kutuzov},
  year = {2025},
  eprint = {2512.08777},
  archivePrefix = {arXiv},
  primaryClass = {cs.CL},
  url = {https://arxiv.org/abs/2512.08777},
}
@inproceedings{samuel-etal-2025-small,
title = "Small Languages, Big Models: {A} Study of Continual Training on Languages of {Norway}",
author = "Samuel, David and
Mikhailov, Vladislav and
Velldal, Erik and
{\O}vrelid, Lilja and
Charpentier, Lucas Georges Gabriel and
Kutuzov, Andrey and
Oepen, Stephan",
editor = "Johansson, Richard and
Stymne, Sara",
booktitle = "Proceedings of the Joint 25th Nordic Conference on Computational Linguistics and 11th Baltic Conference on Human Language Technologies (NoDaLiDa/Baltic-HLT 2025)",
month = mar,
year = "2025",
address = "Tallinn, Estonia",
publisher = "University of Tartu Library",
url = "https://aclanthology.org/2025.nodalida-1.61/",
pages = "573--608",
ISBN = "978-9908-53-109-0",
}
Contact
Please open a discussion in the community tab or contact David Samuel (davisamu@ifi.uio.no) if you have any questions about this model.
Base model: mistralai/Mistral-Nemo-Base-2407