---
license: apache-2.0
tags:
- nvidia
---
|
|
## Mistral-NeMo-12B-Base
|
|
|
[![Model architecture](https://img.shields.io/badge/Model%20Arch-Transformer%20Decoder-green)](#model-architecture)[![Model size](https://img.shields.io/badge/Params-12B-green)](#model-architecture)[![Language](https://img.shields.io/badge/Language-Multilingual-green)](#datasets)
|
|
|
### Model Overview:
|
|
|
Mistral-NeMo-12B-Base is a Large Language Model (LLM) with 12B parameters, trained jointly by NVIDIA and Mistral AI. It significantly outperforms existing models of smaller or similar size.
|
|
|
**Key features**

- Released under the Apache 2 License
- Pre-trained and instructed versions
- Trained with a 128k context window
- Trained on a large proportion of multilingual and code data
|
|
|
### Intended use
|
|
|
Mistral-NeMo-12B-Base is a completion model intended for use across more than 80 programming languages and designed for global, multilingual applications. It is fast, trained on function calling, has a large context window, and is particularly strong in English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, and Hindi. It is compatible with the [NVIDIA NeMo Framework](https://docs.nvidia.com/nemo-framework/index.html). For best performance on a given task, users are encouraged to customize the model using the NeMo Framework suite of customization tools, including Parameter-Efficient Fine-Tuning (P-tuning, Adapters, LoRA, and more) and Model Alignment (SFT, SteerLM, RLHF, and more) with [NeMo-Aligner](https://github.com/NVIDIA/NeMo-Aligner). Refer to the [documentation](https://docs.nvidia.com/nemo-framework/user-guide/latest/llms/nemotron/index.html) for examples.
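
For quick orientation, the sketch below shows plain text completion with Hugging Face Transformers. It is a minimal example under stated assumptions: the repository id `nvidia/Mistral-NeMo-12B-Base` and the availability of a Transformers-format checkpoint are assumptions, and a recent `transformers` release with Mistral support is required. For NeMo-native workflows, follow the NeMo Framework documentation linked above.

```python
# Minimal completion sketch with Hugging Face Transformers.
# Assumptions: the checkpoint is published in Transformers format under the
# repo id below (adjust to the actual repository), and a recent transformers
# release with Mistral support is installed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nvidia/Mistral-NeMo-12B-Base"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # 12B parameters in bf16 is roughly 24 GB of weights
    device_map="auto",
)

# Base (completion) model: no chat template, simply continue the prompt.
prompt = "The three main advantages of grouped-query attention are"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```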
|
|
|
**Model Developers:** [NVIDIA](https://www.nvidia.com/en-us/) and [Mistral AI](https://mistral.ai/)
|
|
|
**Model Dates:** Mistral-NeMo-12B-Base was trained between 2023 and July 2024.
|
|
|
### Model Architecture:
|
|
|
Mistral-NeMo-12B-Base is a transformer model with the following architecture choices:
|
|
|
- Layers: 40
- Dim: 5,120
- Head dim: 128
- Hidden dim: 14,336
- Activation Function: SwiGLU
- Number of heads: 32
- Number of kv-heads: 8 (GQA; see the KV-cache sketch below)
- Rotary embeddings (theta = 1M)
- Vocabulary size: 2^17 ≈ 128k (131,072)
|
|
|
**Architecture Type:** Transformer Decoder (auto-regressive language model)
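
The grouped-query attention figures above also determine how much KV-cache memory the 128k context window needs. The arithmetic below is an illustrative sketch, not part of the model card, and assumes 16-bit (2-byte) cache entries:

```python
# Back-of-the-envelope KV-cache sizing from the architecture figures above.
# Illustrative only; assumes 16-bit (2-byte) keys and values.
layers = 40
head_dim = 128
n_heads = 32            # query heads
n_kv_heads = 8          # GQA: keys/values shared across groups of 4 query heads
bytes_per_value = 2     # fp16/bf16 assumption

def kv_cache_bytes_per_token(kv_heads: int) -> int:
    # 2x for keys and values, accumulated over every layer
    return 2 * kv_heads * head_dim * bytes_per_value * layers

gqa = kv_cache_bytes_per_token(n_kv_heads)
mha = kv_cache_bytes_per_token(n_heads)
context = 128 * 1024    # 128k-token context window

print(f"KV cache per token (GQA, 8 kv-heads): {gqa / 1024:.0f} KiB")     # 160 KiB
print(f"KV cache per token (full MHA, 32 heads): {mha / 1024:.0f} KiB")  # 640 KiB
print(f"Full 128k context (GQA): {gqa * context / 2**30:.1f} GiB")       # 20.0 GiB
```

Under these assumptions, GQA with 8 kv-heads cuts the cache to a quarter of what full multi-head attention would need, roughly 20 GiB instead of 80 GiB at the full 128k context.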
|
|
|
### Evaluation Results
|
|
|
**Main Benchmarks**

- HellaSwag (0-shot): 83.5%
- Winogrande (0-shot): 76.8%
- OpenBookQA (0-shot): 60.6%
- CommonSenseQA (0-shot): 70.4%
- TruthfulQA (0-shot): 50.3%
- MMLU (5-shot): 68.0%
- TriviaQA (5-shot): 73.8%
- NaturalQuestions (5-shot): 31.2%
|
|
|
**Multilingual Benchmarks**

Multilingual MMLU (5-shot):
|
- French: 62.3%
- German: 62.7%
- Spanish: 64.6%
- Italian: 61.3%
- Portuguese: 63.3%
- Russian: 59.2%
- Chinese: 59.0%
- Japanese: 59.0%
|