metadata

license: apache-2.0
tags:
  - nvidia

Mistral-NeMo-12B-Base

Model Overview:

Mistral-NeMo-12B-Base is a Large Language Model (LLM) composed of 12B parameters, trained jointly by NVIDIA and Mistral AI. It significantly outperforms existing models smaller or similar in size.

Key features

Released under the Apache 2 License
Pre-trained and instructed versions
Trained with a 128k context window
Trained on a large proportion of multilingual and code data

Intended use

Mistral-NeMo-12B-Base is a completion model intended for use in more than 80 programming languages and designed for global, multilingual applications. It is fast, trained on function calling, has a large context window, and is particularly strong in English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, and Hindi. It is compatible with the NVIDIA NeMo Framework. For best performance on a given task, users are encouraged to customize the model with the NeMo Framework suite of customization tools, including Parameter-Efficient Fine-Tuning (P-tuning, Adapters, LoRA, and more) and Model Alignment (SFT, SteerLM, RLHF, and more) using NeMo-Aligner. Refer to the documentation for examples.

Model Developer: NVIDIA and Mistral AI

Model Dates: Mistral-NeMo-12B-Base was trained between 2023 and July 2024.

Model Architecture:

Mistral-NeMo-12B-Base is a transformer model, with the following architecture choices:

Layers: 40
Dim: 5,120
Head dim: 128
Hidden dim: 14,336
Activation Function: SwiGLU
Number of heads: 32
Number of kv-heads: 8 (GQA)
Rotary embeddings (theta = 1M)
Vocabulary size: 2**17 = 131,072 (~128k)
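
As a sanity check on the "12B" figure, the architecture values listed above can be turned into a rough weight count. The sketch below is an estimate under stated assumptions, not an official accounting: it assumes bias-free linear layers, two norm weight vectors per layer plus a final norm, and untied input and output embeddings. It uses a feed-forward hidden dimension of 14,336 (14 × 1,024); if the figure you are working from differs, pass it as `hidden_dim`. The function name `count_parameters` is illustrative only.

```python
def count_parameters(layers, dim, head_dim, hidden_dim, n_heads, n_kv_heads,
                     vocab_size, tied_embeddings=False):
    """Estimate the weight count of a decoder-only transformer with GQA and SwiGLU."""
    q_width = n_heads * head_dim       # query projection width (4,096 here, which differs from dim)
    kv_width = n_kv_heads * head_dim   # key/value projection width under grouped-query attention
    # Wq, Wk, Wv, Wo (assumed bias-free)
    attention = dim * q_width + 2 * dim * kv_width + q_width * dim
    # SwiGLU uses three matrices: gate, up, and down projections
    mlp = 3 * dim * hidden_dim
    # Assumed: one norm weight vector before attention and one before the MLP
    norms = 2 * dim
    per_layer = attention + mlp + norms
    # Assumed: separate input embedding and output head unless tied_embeddings=True
    embeddings = vocab_size * dim * (1 if tied_embeddings else 2)
    final_norm = dim
    return layers * per_layer + embeddings + final_norm


total = count_parameters(layers=40, dim=5120, head_dim=128, hidden_dim=14_336,
                         n_heads=32, n_kv_heads=8, vocab_size=2**17)
print(f"{total:,} parameters (~{total / 1e9:.1f}B)")
# prints: 12,247,782,400 parameters (~12.2B)
```

Under these assumptions the feed-forward blocks account for most of the weights (about 8.8B of the 12.2B), while the two vocabulary-sized matrices contribute about 1.3B.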

Architecture Type: Transformer Decoder (auto-regressive language model)
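
Because the model decodes auto-regressively, inference typically keeps a key/value cache whose size grows with context length, and the 8 kv-heads (GQA) keep that cache smaller than full multi-head attention would. The following is a back-of-the-envelope sketch, assuming 16-bit cache entries, a batch size of 1, and "128k" read as 131,072 tokens; the helper `kv_cache_bytes` is hypothetical and ignores framework overhead, paging, and cache quantization.

```python
def kv_cache_bytes(seq_len, layers, n_kv_heads, head_dim, bytes_per_value=2):
    """Approximate KV-cache size for one sequence.

    The leading factor of 2 covers keys and values; bytes_per_value=2
    assumes a 16-bit cache (an assumption, not stated in the model card).
    """
    return 2 * layers * n_kv_heads * head_dim * bytes_per_value * seq_len


GIB = 1024 ** 3
context = 128 * 1024  # assumed interpretation of the 128k context window

gqa = kv_cache_bytes(context, layers=40, n_kv_heads=8, head_dim=128)
mha = kv_cache_bytes(context, layers=40, n_kv_heads=32, head_dim=128)  # hypothetical: no GQA

print(f"GQA (8 kv-heads):  {gqa / GIB:.0f} GiB")
print(f"MHA (32 kv-heads): {mha / GIB:.0f} GiB")
# prints: GQA (8 kv-heads):  20 GiB
# prints: MHA (32 kv-heads): 80 GiB
```

Under these assumptions, a full-length context needs roughly 20 GiB of cache on top of the weights, a quarter of what the same model would need with 32 key/value heads.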

Evaluation Results

Main Benchmarks

HellaSwag (0-shot): 83.5%
Winogrande (0-shot): 76.8%
OpenBookQA (0-shot): 60.6%
CommonSenseQA (0-shot): 70.4%
TruthfulQA (0-shot): 50.3%
MMLU (5-shot): 68.0%
TriviaQA (5-shot): 73.8%
NaturalQuestions (5-shot): 31.2%

Multilingual benchmarks

Multilingual MMLU in 5-shot setting:

French: 62.3%
German: 62.7%
Spanish: 64.6%
Italian: 61.3%
Portuguese: 63.3%
Russian: 59.2%
Chinese: 59.0%
Japanese: 59.0%
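
To compare the multilingual scores with the English MMLU result, the figures above can be summarized with a few lines of Python. This is only an illustrative sketch: the numbers are copied from this card, while the unweighted mean and the per-language gap are ad hoc summaries chosen here, not metrics reported by the model developers.

```python
# 5-shot MMLU scores copied from the lists above (percent)
english_mmlu = 68.0
multilingual_mmlu = {
    "French": 62.3,
    "German": 62.7,
    "Spanish": 64.6,
    "Italian": 61.3,
    "Portuguese": 63.3,
    "Russian": 59.2,
    "Chinese": 59.0,
    "Japanese": 59.0,
}

# Unweighted mean across the eight listed languages
mean_score = sum(multilingual_mmlu.values()) / len(multilingual_mmlu)
print(f"Mean multilingual MMLU: {mean_score:.1f}%")
print(f"Gap to English MMLU:    {english_mmlu - mean_score:.1f} points")

# Per-language gap to English, smallest gap first
for language, score in sorted(multilingual_mmlu.items(), key=lambda kv: -kv[1]):
    print(f"{language:<11}{score:5.1f}%  ({english_mmlu - score:.1f} points below English)")
```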