Model Overview

NavinLLM is a bilingual (English/French) Mamba2-Hybrid model that integrates Mamba2 (SSM) and attention layers in a hybrid architecture, designed for a sequence length of 4K tokens. The training methodology is based on the techniques outlined in the paper “An Empirical Study of Mamba-based Language Models”. Each NavinLLM version is trained on a different amount of data, ranging from 10 billion tokens for the smallest model (200M parameters) to 800 billion tokens for the largest (7B parameters). All models are provided as base models without fine-tuning, except for the instruct version.
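
As a rough illustration of what a hybrid stack means in practice, the sketch below builds a hypothetical layer layout that interleaves Mamba2 (SSM) blocks with a smaller number of attention blocks. The layer count and interleaving ratio are illustrative assumptions, not the published NavinLLM configuration.

```python
# Illustrative only: the real NavinLLM layer composition is not documented here.
# This toy helper shows the general idea of a Mamba2-hybrid stack, in which
# Mamba2 (SSM) blocks are interleaved with a smaller number of attention blocks.
def build_layer_pattern(n_layers: int = 24, attention_every: int = 6) -> list[str]:
    """Return a hypothetical layer layout mixing Mamba2 and attention blocks."""
    return [
        "attention" if (i + 1) % attention_every == 0 else "mamba2"
        for i in range(n_layers)
    ]

if __name__ == "__main__":
    print(build_layer_pattern())
    # ['mamba2', 'mamba2', 'mamba2', 'mamba2', 'mamba2', 'attention', ...]
```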

Versions

NavinLLM-200M: 200M-parameter hybrid model trained on 10B tokens (bilingual).
NavinLLM-400M: 400M-parameter hybrid model trained on 20B tokens (bilingual).
NavinLLM-2B: 2B-parameter pure SSM model trained on 200B tokens (French).
NavinLLM-7B: 7B-parameter hybrid model trained on 800B tokens (bilingual).
NavinLLM-7B-Instruct: NavinLLM-7B fine-tuned on several tasks (summarization, QA, translation, ...).
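
A minimal loading sketch follows. It assumes the checkpoints expose a Transformers-compatible interface, which this card does not confirm (the model may require a custom loading path), and it uses the NavinLLM-400M repository id from this collection as the example.

```python
# Minimal sketch, assuming the checkpoint ships a Transformers-compatible
# configuration; the supporting library is not confirmed on this card, so the
# exact loading path may differ (e.g. a custom class via trust_remote_code).
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "NavinspireIA/NavinLLM-400M"  # one of the checkpoints listed above

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)

prompt = "La tour Eiffel se trouve à"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```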

Tokenizer

NavinLLM was trained using a custom SentencePiece tokenizer, available in two versions: a 32k-token vocabulary for a more efficient representation, and a 52k-token vocabulary designed to accommodate a broader range of tokens and greater linguistic variability.
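
Because the tokenizer is a standard SentencePiece model, it can be inspected directly with the sentencepiece library. The sketch below assumes the model file is named tokenizer.model, which is an assumption rather than a documented filename.

```python
# Minimal sketch, assuming the SentencePiece model file is shipped with the
# checkpoint as "tokenizer.model" (the actual filename is an assumption).
import sentencepiece as spm

sp = spm.SentencePieceProcessor(model_file="tokenizer.model")

for text in ["The quick brown fox.", "Le renard brun et rapide."]:
    pieces = sp.encode(text, out_type=str)  # subword pieces
    ids = sp.encode(text, out_type=int)     # token ids
    print(pieces)
    print(sp.decode(ids))                   # round-trips back to the text
```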

Datasets

NavinLLM was trained on proprietary datasets, consisting of both publicly available data and synthetically generated content.
