
This repo contains a SHARDED version of: https://huggingface.co/NousResearch/Yarn-Mistral-7b-128k

Huge thanks to the publishers for their amazing work; all credit goes to them: https://huggingface.co/NousResearch

Model Card: Nous-Yarn-Mistral-7b-128k

Preprint (arXiv)
GitHub yarn

Model Description

Nous-Yarn-Mistral-7b-128k is a state-of-the-art language model for long context, further pretrained on long-context data for 1500 steps using the YaRN extension method. It extends Mistral-7B-v0.1 and supports a 128k-token context window.

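To use the model, pass trust_remote_code=True when loading it, for example:

import torch
from transformers import AutoModelForCausalLM

# trust_remote_code=True lets transformers run the custom modeling code shipped with the repo
model = AutoModelForCausalLM.from_pretrained("NousResearch/Yarn-Mistral-7b-128k",
  use_flash_attention_2=True,
  torch_dtype=torch.bfloat16,
  device_map="auto",
  trust_remote_code=True)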
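
As a rough sketch of how the loading call might be wrapped for text generation, the helpers below add a tokenizer and a generate step. The function names (`loading_kwargs`, `load`, `generate`) and the `max_new_tokens` default are hypothetical and not part of the original card. The sketch also assumes `accelerate` is installed for `device_map="auto"` and that flash attention is optional on your hardware.

```python
def loading_kwargs(use_flash_attention: bool = True) -> dict:
    """Build the keyword arguments from the card's loading example (hypothetical helper)."""
    kwargs = {"device_map": "auto", "trust_remote_code": True}
    if use_flash_attention:
        # Flag used in the card; newer transformers releases prefer
        # attn_implementation="flash_attention_2" instead.
        kwargs["use_flash_attention_2"] = True
    return kwargs


def load(repo_id: str, use_flash_attention: bool = True):
    """Load tokenizer and model; imports are deferred so the helpers above stay lightweight."""
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(
        repo_id,
        torch_dtype=torch.bfloat16,
        **loading_kwargs(use_flash_attention),
    )
    return tokenizer, model


def generate(tokenizer, model, prompt: str, max_new_tokens: int = 64) -> str:
    """Tokenize a prompt, generate a continuation, and decode it back to text."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Example (downloads the weights, so it is left commented out):
# tokenizer, model = load("NousResearch/Yarn-Mistral-7b-128k")
# print(generate(tokenizer, model, "The YaRN method extends context by"))
```

Passing `use_flash_attention=False` drops the flash attention flag, which may help on machines without the `flash-attn` package.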

In addition, you will need the latest version of transformers, installed from source, until 4.35 is released:

pip install git+https://github.com/huggingface/transformers
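
To decide whether the source install is still needed, one option is to compare the installed transformers version against 4.35. The sketch below is a simplified comparison that only looks at the leading numeric components; the helper names `version_tuple` and `meets_minimum` are hypothetical, and pre-release or dev builds of 4.35 are treated as meeting the minimum, which is an assumption.

```python
import re


def version_tuple(version: str) -> tuple:
    """Return the leading numeric components, e.g. '4.35.0.dev0' -> (4, 35, 0)."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component such as 'dev0'
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str = "4.35.0") -> bool:
    """True if `installed` is at least `minimum`, ignoring pre-release suffixes."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    # Pad the shorter tuple with zeros so '4.35' compares equal to '4.35.0'.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


# Example, assuming transformers is installed:
# import transformers
# print(meets_minimum(transformers.__version__))
```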

Benchmarks

Long context benchmarks:

| Model | Context Window | 8k PPL | 16k PPL | 32k PPL | 64k PPL | 128k PPL |
|---|---|---|---|---|---|---|
| Mistral-7B-v0.1 | 8k | 2.96 | - | - | - | - |
| Yarn-Mistral-7b-64k | 64k | 3.04 | 2.65 | 2.44 | 2.20 | - |
| Yarn-Mistral-7b-128k | 128k | 3.08 | 2.68 | 2.47 | 2.24 | 2.19 |
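
For readers unfamiliar with the PPL columns, perplexity is conventionally the exponential of the mean per-token negative log-likelihood. The snippet below is a minimal sketch of that standard definition only; the card does not describe the evaluation data or protocol behind the numbers above, and the function name `perplexity` is illustrative.

```python
import math
from typing import Sequence


def perplexity(token_nlls: Sequence[float]) -> float:
    """Perplexity from per-token negative log-likelihoods (natural log)."""
    if not token_nlls:
        raise ValueError("need at least one token")
    mean_nll = sum(token_nlls) / len(token_nlls)
    return math.exp(mean_nll)


# A model that assigns probability 0.5 to every token has perplexity 2.
uniform_coin = [math.log(2)] * 4
print(perplexity(uniform_coin))
```

Lower is better: a lower perplexity means the model assigns higher probability to the actual next tokens.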

Short context benchmarks showing that quality degradation is minimal:

| Model | Context Window | ARC-c | Hellaswag | MMLU | TruthfulQA |
|---|---|---|---|---|---|
| Mistral-7B-v0.1 | 8k | 59.98 | 83.31 | 64.16 | 42.15 |
| Yarn-Mistral-7b-64k | 64k | 59.38 | 81.21 | 61.32 | 42.50 |
| Yarn-Mistral-7b-128k | 128k | 58.87 | 80.58 | 60.64 | 42.46 |
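
To put a number on "minimal", the short-context scores can be compared against the Mistral-7B-v0.1 baseline. The sketch below copies the scores from the table above and computes the point differences for the 128k model; the helper name `score_deltas` is illustrative.

```python
# Scores copied from the short-context benchmark table.
BASELINE = {"ARC-c": 59.98, "Hellaswag": 83.31, "MMLU": 64.16, "TruthfulQA": 42.15}
YARN_128K = {"ARC-c": 58.87, "Hellaswag": 80.58, "MMLU": 60.64, "TruthfulQA": 42.46}


def score_deltas(baseline: dict, candidate: dict) -> dict:
    """Point difference (candidate minus baseline) per benchmark, rounded to 2 decimals."""
    return {name: round(candidate[name] - baseline[name], 2) for name in baseline}


deltas = score_deltas(BASELINE, YARN_128K)
for name, delta in deltas.items():
    print(f"{name}: {delta:+.2f}")
```

By this measure the largest drop is on MMLU (3.52 points), while TruthfulQA is slightly higher than the baseline (0.31 points).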

Collaborators

The authors would like to thank LAION AI for providing compute for this model. It was trained on the JUWELS supercomputer.

Model size: 7.24B params (Safetensors); tensor type: F32