Model Overview

NavinLLM is a bilingual (English/French) Mamba2-Hybrid model that integrates Mamba2 (SSM) and attention layers in a hybrid architecture, designed for a sequence length of 4K tokens. The training methodology is based on the techniques outlined in the paper “An Empirical Study of Mamba-based Language Models”. Each NavinLLM version is trained on a different amount of data, ranging from 10 billion tokens for the smallest model (200M parameters) to 800 billion tokens for the largest (7B parameters). All models are provided as base models without fine-tuning, except for the instruct version.
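
As a rough illustration of what a hybrid stack means in practice, the sketch below builds a hypothetical layer layout that interleaves Mamba2 (SSM) blocks with a smaller number of attention blocks. The layer count and interleaving ratio are illustrative assumptions, not the published NavinLLM configuration.

```python
# Illustrative only: the real NavinLLM layer composition is not documented here.
# This toy helper shows the general idea of a Mamba2-hybrid stack, in which
# Mamba2 (SSM) blocks are interleaved with a smaller number of attention blocks.
def build_layer_pattern(n_layers: int = 24, attention_every: int = 6) -> list[str]:
    """Return a hypothetical layer layout mixing Mamba2 and attention blocks."""
    return [
        "attention" if (i + 1) % attention_every == 0 else "mamba2"
        for i in range(n_layers)
    ]

if __name__ == "__main__":
    print(build_layer_pattern())
    # ['mamba2', 'mamba2', 'mamba2', 'mamba2', 'mamba2', 'attention', ...]
```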

Versions

NavinLLM-200M: 200M-parameter hybrid model trained on 10B tokens (bilingual).
NavinLLM-400M: 400M-parameter hybrid model trained on 20B tokens (bilingual).
NavinLLM-2B: 2B-parameter pure SSM model trained on 200B tokens (French).
NavinLLM-7B: 7B-parameter hybrid model trained on 800B tokens (bilingual).
NavinLLM-7B-Instruct: NavinLLM-7B fine-tuned on several tasks (summarization, QA, translation, ...).
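
A minimal loading sketch follows. It assumes the checkpoints expose a Transformers-compatible interface, which this card does not confirm (the model may require a custom loading path), and it uses the NavinLLM-400M repository id from this collection as the example.

```python
# Minimal sketch, assuming the checkpoint ships a Transformers-compatible
# configuration; the supporting library is not confirmed on this card, so the
# exact loading path may differ (e.g. a custom class via trust_remote_code).
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "NavinspireIA/NavinLLM-400M"  # one of the checkpoints listed above

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)

prompt = "La tour Eiffel se trouve à"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```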

Tokenizer

NavinLLM was trained using a custom SentencePiece tokenizer, available in two versions: a 32k-token vocabulary for a more efficient representation, and a 52k-token vocabulary designed to accommodate a broader range of tokens and greater linguistic variability.
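
Because the tokenizer is a standard SentencePiece model, it can be inspected directly with the sentencepiece library. The sketch below assumes the model file is named tokenizer.model, which is an assumption rather than a documented filename.

```python
# Minimal sketch, assuming the SentencePiece model file is shipped with the
# checkpoint as "tokenizer.model" (the actual filename is an assumption).
import sentencepiece as spm

sp = spm.SentencePieceProcessor(model_file="tokenizer.model")

for text in ["The quick brown fox.", "Le renard brun et rapide."]:
    pieces = sp.encode(text, out_type=str)  # subword pieces
    ids = sp.encode(text, out_type=int)     # token ids
    print(pieces)
    print(sp.decode(ids))                   # round-trips back to the text
```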

Datasets

NavinLLM was trained on proprietary datasets, consisting of both publicly available data and synthetically generated content.
