Quantization made by Richard Erkhov.

[Github](https://github.com/RichardErkhov) | [Discord](https://discord.gg/pvy7H8DZMG) | [Request more models](https://github.com/RichardErkhov/quant_request)

# Llama-3.1-Storm-8B - GGUF

- Model creator: https://huggingface.co/unsloth/
- Original model: https://huggingface.co/unsloth/Llama-3.1-Storm-8B/

| Name | Quant method | Size |
| ---- | ---- | ---- |
| [Llama-3.1-Storm-8B.Q2_K.gguf](https://huggingface.co/RichardErkhov/unsloth_-_Llama-3.1-Storm-8B-gguf/blob/main/Llama-3.1-Storm-8B.Q2_K.gguf) | Q2_K | 2.96GB |
| [Llama-3.1-Storm-8B.IQ3_XS.gguf](https://huggingface.co/RichardErkhov/unsloth_-_Llama-3.1-Storm-8B-gguf/blob/main/Llama-3.1-Storm-8B.IQ3_XS.gguf) | IQ3_XS | 3.28GB |
| [Llama-3.1-Storm-8B.IQ3_S.gguf](https://huggingface.co/RichardErkhov/unsloth_-_Llama-3.1-Storm-8B-gguf/blob/main/Llama-3.1-Storm-8B.IQ3_S.gguf) | IQ3_S | 3.43GB |
| [Llama-3.1-Storm-8B.Q3_K_S.gguf](https://huggingface.co/RichardErkhov/unsloth_-_Llama-3.1-Storm-8B-gguf/blob/main/Llama-3.1-Storm-8B.Q3_K_S.gguf) | Q3_K_S | 3.41GB |
| [Llama-3.1-Storm-8B.IQ3_M.gguf](https://huggingface.co/RichardErkhov/unsloth_-_Llama-3.1-Storm-8B-gguf/blob/main/Llama-3.1-Storm-8B.IQ3_M.gguf) | IQ3_M | 3.52GB |
| [Llama-3.1-Storm-8B.Q3_K.gguf](https://huggingface.co/RichardErkhov/unsloth_-_Llama-3.1-Storm-8B-gguf/blob/main/Llama-3.1-Storm-8B.Q3_K.gguf) | Q3_K | 3.74GB |
| [Llama-3.1-Storm-8B.Q3_K_M.gguf](https://huggingface.co/RichardErkhov/unsloth_-_Llama-3.1-Storm-8B-gguf/blob/main/Llama-3.1-Storm-8B.Q3_K_M.gguf) | Q3_K_M | 3.74GB |
| [Llama-3.1-Storm-8B.Q3_K_L.gguf](https://huggingface.co/RichardErkhov/unsloth_-_Llama-3.1-Storm-8B-gguf/blob/main/Llama-3.1-Storm-8B.Q3_K_L.gguf) | Q3_K_L | 4.03GB |
| [Llama-3.1-Storm-8B.IQ4_XS.gguf](https://huggingface.co/RichardErkhov/unsloth_-_Llama-3.1-Storm-8B-gguf/blob/main/Llama-3.1-Storm-8B.IQ4_XS.gguf) | IQ4_XS | 4.18GB |
| [Llama-3.1-Storm-8B.Q4_0.gguf](https://huggingface.co/RichardErkhov/unsloth_-_Llama-3.1-Storm-8B-gguf/blob/main/Llama-3.1-Storm-8B.Q4_0.gguf) | Q4_0 | 4.34GB |
| [Llama-3.1-Storm-8B.IQ4_NL.gguf](https://huggingface.co/RichardErkhov/unsloth_-_Llama-3.1-Storm-8B-gguf/blob/main/Llama-3.1-Storm-8B.IQ4_NL.gguf) | IQ4_NL | 4.38GB |
| [Llama-3.1-Storm-8B.Q4_K_S.gguf](https://huggingface.co/RichardErkhov/unsloth_-_Llama-3.1-Storm-8B-gguf/blob/main/Llama-3.1-Storm-8B.Q4_K_S.gguf) | Q4_K_S | 4.37GB |
| [Llama-3.1-Storm-8B.Q4_K.gguf](https://huggingface.co/RichardErkhov/unsloth_-_Llama-3.1-Storm-8B-gguf/blob/main/Llama-3.1-Storm-8B.Q4_K.gguf) | Q4_K | 4.58GB |
| [Llama-3.1-Storm-8B.Q4_K_M.gguf](https://huggingface.co/RichardErkhov/unsloth_-_Llama-3.1-Storm-8B-gguf/blob/main/Llama-3.1-Storm-8B.Q4_K_M.gguf) | Q4_K_M | 4.58GB |
| [Llama-3.1-Storm-8B.Q4_1.gguf](https://huggingface.co/RichardErkhov/unsloth_-_Llama-3.1-Storm-8B-gguf/blob/main/Llama-3.1-Storm-8B.Q4_1.gguf) | Q4_1 | 4.78GB |
| [Llama-3.1-Storm-8B.Q5_0.gguf](https://huggingface.co/RichardErkhov/unsloth_-_Llama-3.1-Storm-8B-gguf/blob/main/Llama-3.1-Storm-8B.Q5_0.gguf) | Q5_0 | 5.21GB |
| [Llama-3.1-Storm-8B.Q5_K_S.gguf](https://huggingface.co/RichardErkhov/unsloth_-_Llama-3.1-Storm-8B-gguf/blob/main/Llama-3.1-Storm-8B.Q5_K_S.gguf) | Q5_K_S | 5.21GB |
| [Llama-3.1-Storm-8B.Q5_K.gguf](https://huggingface.co/RichardErkhov/unsloth_-_Llama-3.1-Storm-8B-gguf/blob/main/Llama-3.1-Storm-8B.Q5_K.gguf) | Q5_K | 5.34GB |
| [Llama-3.1-Storm-8B.Q5_K_M.gguf](https://huggingface.co/RichardErkhov/unsloth_-_Llama-3.1-Storm-8B-gguf/blob/main/Llama-3.1-Storm-8B.Q5_K_M.gguf) | Q5_K_M | 5.34GB |
| [Llama-3.1-Storm-8B.Q5_1.gguf](https://huggingface.co/RichardErkhov/unsloth_-_Llama-3.1-Storm-8B-gguf/blob/main/Llama-3.1-Storm-8B.Q5_1.gguf) | Q5_1 | 5.65GB |
| [Llama-3.1-Storm-8B.Q6_K.gguf](https://huggingface.co/RichardErkhov/unsloth_-_Llama-3.1-Storm-8B-gguf/blob/main/Llama-3.1-Storm-8B.Q6_K.gguf) | Q6_K | 6.14GB |
| [Llama-3.1-Storm-8B.Q8_0.gguf](https://huggingface.co/RichardErkhov/unsloth_-_Llama-3.1-Storm-8B-gguf/blob/main/Llama-3.1-Storm-8B.Q8_0.gguf) | Q8_0 | 7.95GB |

Original model description:

---
base_model: meta-llama/Meta-Llama-3.1-8B
language:
- en
library_name: transformers
license: llama3.1
tags:
- llama-3
- llama
- meta
- facebook
- unsloth
- transformers
---

# Finetune Llama 3.1, Gemma 2, Mistral 2-5x faster with 70% less memory via Unsloth!

We have a free Google Colab Tesla T4 notebook for Llama 3.1 (8B) here: https://colab.research.google.com/drive/1Ys44kVvmeZtnICzWz0xgpRnrIOjZAuxp?usp=sharing

## ✨ Finetune for Free

All notebooks are **beginner friendly**! Add your dataset, click "Run All", and you'll get a 2x faster finetuned model which can be exported to GGUF, vLLM or uploaded to Hugging Face.

| Unsloth supports | Free Notebooks | Performance | Memory use |
|-----------------|----------------|-------------|------------|
| **Llama-3.1 8b** | [▶️ Start on Colab](https://colab.research.google.com/drive/1Ys44kVvmeZtnICzWz0xgpRnrIOjZAuxp?usp=sharing) | 2.4x faster | 58% less |
| **Phi-3.5 (mini)** | [▶️ Start on Colab](https://colab.research.google.com/drive/1lN6hPQveB_mHSnTOYifygFcrO8C1bxq4?usp=sharing) | 2x faster | 50% less |
| **Gemma-2 9b** | [▶️ Start on Colab](https://colab.research.google.com/drive/1vIrqH5uYDQwsJ4-OO3DErvuv4pBgVwk4?usp=sharing) | 2.4x faster | 58% less |

## Llama 3.1 Storm

![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/64c75c1237333ccfef30a602/tmOlbERGKP7JSODa6T06J.jpeg)

Authors: [Ashvini Kumar Jindal](https://www.linkedin.com/in/ashvini-jindal-26653262/), [Pawan Kumar Rajpoot](https://www.linkedin.com/in/pawanrajpoot/), [Ankur Parikh](https://www.linkedin.com/in/ankurnlpexpert/), [Akshita Sukhlecha](https://www.linkedin.com/in/akshita-sukhlecha/)

**🤗 Hugging Face Announcement Blog**:
https://huggingface.co/blog/akjindal53244/llama31-storm8b

**🚀 Ollama:** `ollama run ajindal/llama3.1-storm:8b`

## TL;DR

![image/png](https://cdn-uploads.huggingface.co/production/uploads/64c75c1237333ccfef30a602/mDtDeiHwnBupw1k_n99Lf.png)

We present [**Llama-3.1-Storm-8B**](https://huggingface.co/akjindal53244/Llama-3.1-Storm-8B), a model that significantly outperforms Meta AI's [Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct) and [Hermes-3-Llama-3.1-8B](https://huggingface.co/NousResearch/Hermes-3-Llama-3.1-8B) across diverse benchmarks, as shown in the performance comparison plot in the next section. Our approach consists of three key steps:

1. **Self-Curation**: We applied two self-curation methods to select approximately 1 million high-quality examples from a pool of ~2.8 million open-source examples. **Our curation criteria focused on educational value and difficulty level, using the same SLM for annotation instead of larger models (e.g., 70B, 405B).**
2. **Targeted fine-tuning**: We performed [Spectrum](https://arxiv.org/abs/2406.06623)-based targeted fine-tuning over the Llama-3.1-8B-Instruct model. The Spectrum method accelerates training by selectively training the layer modules with the highest signal-to-noise ratio (SNR) and freezing the remaining modules. In our work, 50% of layers were frozen.
3. **Model Merging**: We merged our fine-tuned model with the [Llama-Spark](https://huggingface.co/arcee-ai/Llama-Spark) model using the [SLERP](https://huggingface.co/blog/mlabonne/merge-models#1-slerp) method. The merge produces a blended model whose characteristics are smoothly interpolated from both parents, so the resulting model captures the essence of both.

[Llama-3.1-Storm-8B](https://huggingface.co/akjindal53244/Llama-3.1-Storm-8B) improves on Llama-3.1-8B-Instruct across 10 diverse benchmarks.
These benchmarks cover instruction following, knowledge-driven QA, reasoning, truthful answer generation, and function calling.

## 🏆 Introducing Llama-3.1-Storm-8B

[**Llama-3.1-Storm-8B**](https://huggingface.co/akjindal53244/Llama-3.1-Storm-8B) builds on the foundation of Llama-3.1-8B-Instruct, aiming to enhance both conversational and function-calling capabilities within the 8B parameter model class.

As shown in the left subplot of the above figure, [**Llama-3.1-Storm-8B**](https://huggingface.co/akjindal53244/Llama-3.1-Storm-8B) improves on Meta-Llama-3.1-8B-Instruct across various benchmarks: instruction following ([IFEval](https://arxiv.org/abs/2311.07911)), knowledge-driven QA ([GPQA](https://arxiv.org/abs/2311.12022), [MMLU-Pro](https://arxiv.org/pdf/2406.01574)), reasoning ([ARC-C](https://arxiv.org/abs/1803.05457), [MuSR](https://arxiv.org/abs/2310.16049), [BBH](https://arxiv.org/pdf/2210.09261)), reduced hallucinations ([TruthfulQA](https://arxiv.org/abs/2109.07958)), and function calling ([BFCL](https://huggingface.co/datasets/gorilla-llm/Berkeley-Function-Calling-Leaderboard)). This improvement is particularly significant for AI developers and enthusiasts who work with limited computational resources.

We also benchmarked our model against the recently published [Hermes-3-Llama-3.1-8B](https://huggingface.co/NousResearch/Hermes-3-Llama-3.1-8B), which is likewise built on top of the Llama-3.1-8B-Instruct model. As shown in the right subplot of the above figure, **Llama-3.1-Storm-8B outperforms Hermes-3-Llama-3.1-8B on 7 out of 9 benchmarks**; Hermes-3-Llama-3.1-8B surpasses Llama-3.1-Storm-8B on the MuSR benchmark, and the two models perform comparably on the BBH benchmark.

## Llama-3.1-Storm-8B Model Strengths

Llama-3.1-Storm-8B is a powerful generalist model useful for diverse applications.
We invite the AI community to explore [Llama-3.1-Storm-8B](https://huggingface.co/collections/akjindal53244/storm-66ba6c96b7e24ecb592787a9) and look forward to seeing how it will be utilized in various projects and applications.
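For intuition about step 2 of the TL;DR, here is a toy sketch of SNR-based module selection. Note the hedges: Spectrum derives its SNR from random-matrix theory over a model's actual weight tensors, whereas `snr` below is a deliberately simplified |mean|/std stand-in, and the module names and weight values are made up for illustration:

```python
import statistics

def snr(weights):
    # Toy signal-to-noise proxy: |mean| / std of a module's weights.
    # (Spectrum's real SNR comes from random-matrix theory; this ratio
    # is only a stand-in to show the selection mechanic.)
    sigma = statistics.pstdev(weights)
    return abs(statistics.fmean(weights)) / sigma if sigma else float("inf")

def select_trainable(modules, fraction=0.5):
    # Rank modules by SNR and keep the top `fraction` trainable,
    # freezing the rest -- mirroring the "50% of layers frozen" setup.
    ranked = sorted(modules, key=lambda name: snr(modules[name]), reverse=True)
    return set(ranked[: int(len(ranked) * fraction)])

# Hypothetical per-module weight samples (names chosen for illustration).
modules = {
    "layers.0.self_attn.q_proj": [0.9, 1.0, 1.1, 1.0],
    "layers.0.mlp.gate_proj": [-1.0, 1.0, -1.0, 1.0],
    "layers.1.self_attn.q_proj": [0.5, 0.6, 0.4, 0.5],
    "layers.1.mlp.gate_proj": [2.0, -2.0, 2.0, -2.0],
}
trainable = select_trainable(modules, fraction=0.5)
```

With `fraction=0.5`, only the two highest-SNR modules end up trainable; gradients for the rest would simply be switched off (e.g. `requires_grad = False` in PyTorch).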
| Model Strength | Relevant Benchmarks |
| ---- | ---- |
| 🎯 Improved Instruction Following | IFEval Strict (+3.93%) |
| 🌐 Enhanced Knowledge-Driven Question Answering | GPQA (+7.21%), MMLU-Pro (+0.55%), AGIEval (+3.77%) |
| 🧠 Better Reasoning | ARC-C (+3.92%), MuSR (+2.77%), BBH (+1.67%), AGIEval (+3.77%) |
| 🤖 Superior Agentic Capabilities | BFCL: Overall Acc (+7.92%), BFCL: AST Summary (+12.32%) |
| 🚫 Reduced Hallucinations | TruthfulQA (+9%) |
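The SLERP merge used in step 3 of the TL;DR interpolates along the great circle between two parents' weights rather than along a straight line, which preserves the magnitude of the blended parameters. A minimal, self-contained sketch (merge toolkits such as mergekit apply this per weight tensor; plain Python lists stand in for tensors here):

```python
import math

def slerp(t, v0, v1, eps=1e-8):
    # Spherical linear interpolation between weight vectors v0 and v1.
    # t = 0 returns v0, t = 1 returns v1.
    n0 = math.sqrt(sum(x * x for x in v0))
    n1 = math.sqrt(sum(x * x for x in v1))
    # Angle between the two vectors, with the dot product clamped
    # to [-1, 1] to guard against floating-point drift.
    dot = max(-1.0, min(1.0, sum(a * b for a, b in zip(v0, v1)) / (n0 * n1)))
    omega = math.acos(dot)
    if omega < eps:
        # Nearly parallel vectors: plain linear interpolation is stable.
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    s0 = math.sin((1 - t) * omega) / math.sin(omega)
    s1 = math.sin(t * omega) / math.sin(omega)
    return [s0 * a + s1 * b for a, b in zip(v0, v1)]

# Midpoint of two orthogonal unit vectors stays on the unit sphere
# (norm 1.0), whereas plain averaging would shrink it to norm ~0.71.
merged = slerp(0.5, [1.0, 0.0], [0.0, 1.0])  # ~[0.7071, 0.7071]
```

This norm-preserving property is one reason SLERP is a popular choice for merging fine-tuned checkpoints that share a common base model.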