Quantization made by Richard Erkhov.
st-llama-1-5.5b-ppl - GGUF
- Model creator: https://huggingface.co/nota-ai/
- Original model: https://huggingface.co/nota-ai/st-llama-1-5.5b-ppl/
Name | Quant method | Size |
---|---|---|
st-llama-1-5.5b-ppl.Q2_K.gguf | Q2_K | 1.94GB |
st-llama-1-5.5b-ppl.IQ3_XS.gguf | IQ3_XS | 2.15GB |
st-llama-1-5.5b-ppl.IQ3_S.gguf | IQ3_S | 2.26GB |
st-llama-1-5.5b-ppl.Q3_K_S.gguf | Q3_K_S | 2.26GB |
st-llama-1-5.5b-ppl.IQ3_M.gguf | IQ3_M | 2.38GB |
st-llama-1-5.5b-ppl.Q3_K.gguf | Q3_K | 2.52GB |
st-llama-1-5.5b-ppl.Q3_K_M.gguf | Q3_K_M | 2.52GB |
st-llama-1-5.5b-ppl.Q3_K_L.gguf | Q3_K_L | 2.75GB |
st-llama-1-5.5b-ppl.IQ4_XS.gguf | IQ4_XS | 2.79GB |
st-llama-1-5.5b-ppl.Q4_0.gguf | Q4_0 | 2.93GB |
st-llama-1-5.5b-ppl.IQ4_NL.gguf | IQ4_NL | 2.94GB |
st-llama-1-5.5b-ppl.Q4_K_S.gguf | Q4_K_S | 2.95GB |
st-llama-1-5.5b-ppl.Q4_K.gguf | Q4_K | 3.12GB |
st-llama-1-5.5b-ppl.Q4_K_M.gguf | Q4_K_M | 3.12GB |
st-llama-1-5.5b-ppl.Q4_1.gguf | Q4_1 | 3.24GB |
st-llama-1-5.5b-ppl.Q5_0.gguf | Q5_0 | 3.55GB |
st-llama-1-5.5b-ppl.Q5_K_S.gguf | Q5_K_S | 3.55GB |
st-llama-1-5.5b-ppl.Q5_K.gguf | Q5_K | 3.65GB |
st-llama-1-5.5b-ppl.Q5_K_M.gguf | Q5_K_M | 3.65GB |
st-llama-1-5.5b-ppl.Q5_1.gguf | Q5_1 | 3.87GB |
st-llama-1-5.5b-ppl.Q6_K.gguf | Q6_K | 4.22GB |
st-llama-1-5.5b-ppl.Q8_0.gguf | Q8_0 | 5.47GB |
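As a rough aid for choosing among the files above, the sketch below picks the largest quantization whose file size fits a given budget. The sizes are copied from the table; the helper names `largest_quant_within` and `gguf_filename` are hypothetical, introduced only for illustration. File size is only a lower bound on memory use, since inference also needs room for the context cache and runtime buffers.

```python
# File sizes in GB, copied from the table above.
QUANT_SIZES_GB = {
    "Q2_K": 1.94, "IQ3_XS": 2.15, "IQ3_S": 2.26, "Q3_K_S": 2.26,
    "IQ3_M": 2.38, "Q3_K": 2.52, "Q3_K_M": 2.52, "Q3_K_L": 2.75,
    "IQ4_XS": 2.79, "Q4_0": 2.93, "IQ4_NL": 2.94, "Q4_K_S": 2.95,
    "Q4_K": 3.12, "Q4_K_M": 3.12, "Q4_1": 3.24, "Q5_0": 3.55,
    "Q5_K_S": 3.55, "Q5_K": 3.65, "Q5_K_M": 3.65, "Q5_1": 3.87,
    "Q6_K": 4.22, "Q8_0": 5.47,
}


def largest_quant_within(budget_gb, sizes=QUANT_SIZES_GB):
    """Return the name of the largest file that fits the budget, or None.

    Hypothetical helper: when two files have the same size, the one
    listed first in the table is returned.
    """
    fitting = {name: size for name, size in sizes.items() if size <= budget_gb}
    if not fitting:
        return None
    return max(fitting, key=fitting.get)


def gguf_filename(quant_name):
    """Build the file name using the naming pattern shown in the table."""
    return f"st-llama-1-5.5b-ppl.{quant_name}.gguf"


choice = largest_quant_within(3.0)
if choice is not None:
    print(gguf_filename(choice))
```

A budget smaller than the Q2_K file returns `None`, which a caller would need to handle.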
Original model description:
Shortened LLaMA Model Card
Shortened LLaMA is a depth-pruned version of LLaMA models & variants for efficient text generation.
- Developed by: Nota AI
- License: Non-commercial license
- Repository: https://github.com/Nota-NetsPresso/shortened-llm
- Paper: https://arxiv.org/abs/2402.02834
Compression Method
After identifying unimportant Transformer blocks, we perform one-shot pruning and light LoRA-based retraining.
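The selection and one-shot removal steps can be illustrated with a minimal sketch. It is not the authors' implementation: the importance scores are made-up numbers, the blocks are stand-in strings rather than real Transformer layers, the function names are hypothetical, and the LoRA-based retraining step is omitted entirely. It assumes only that each block has a scalar importance score where lower means less important.

```python
def select_blocks_to_prune(importance, num_to_prune):
    """Return the indices of the least important blocks, in ascending order.

    `importance` holds one score per block (lower = less important).
    """
    if not 0 <= num_to_prune <= len(importance):
        raise ValueError("num_to_prune must be between 0 and the block count")
    ranked = sorted(range(len(importance)), key=lambda i: importance[i])
    return sorted(ranked[:num_to_prune])


def prune_blocks(blocks, pruned_indices):
    """Drop the selected blocks in one shot, keeping the original order."""
    drop = set(pruned_indices)
    return [block for i, block in enumerate(blocks) if i not in drop]


# Hypothetical scores for an 8-block model (illustrative values only).
scores = [9.1, 0.4, 0.7, 5.2, 0.2, 3.3, 8.8, 1.5]
blocks = [f"block_{i}" for i in range(len(scores))]

pruned = select_blocks_to_prune(scores, num_to_prune=2)
kept = prune_blocks(blocks, pruned)
print(pruned)
print(kept)
```

In a real model the kept blocks would then be fine-tuned briefly, which is the role the card assigns to LoRA-based retraining.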
Model Links
Source Model | Pruning Ratio | Pruning Criterion | HF Models Link |
---|---|---|---|
LLaMA-1-7B | 20% | PPL | nota-ai/st-llama-1-5.5b-ppl |
LLaMA-1-7B | 20% | Taylor+ | nota-ai/st-llama-1-5.5b-taylor |
Vicuna-v1.3-7B | 20% | PPL | nota-ai/st-vicuna-v1.3-5.5b-ppl |
Vicuna-v1.3-7B | 20% | Taylor+ | nota-ai/st-vicuna-v1.3-5.5b-taylor |
Vicuna-v1.3-13B | 21% | PPL | nota-ai/st-vicuna-v1.3-10.5b-ppl |
Vicuna-v1.3-13B | 21% | Taylor+ | nota-ai/st-vicuna-v1.3-10.5b-taylor |
Zero-shot Performance & Efficiency Results
- EleutherAI/lm-evaluation-harness version 3326c54
License
- All rights related to this repository and the compressed models are reserved by Nota Inc.
- The intended use is strictly limited to research and non-commercial projects.
Acknowledgments
- LLM-Pruner, which utilizes LM Evaluation Harness, PEFT, and Alpaca-LoRA. Thanks for the pioneering work on structured pruning of LLMs!
- Meta AI's LLaMA and LMSYS Org's Vicuna. Thanks for the open-source LLMs!
Citation
@article{kim2024shortened,
title={Shortened LLaMA: A Simple Depth Pruning for Large Language Models},
author={Kim, Bo-Kyeong and Kim, Geonmin and Kim, Tae-Ho and Castells, Thibault and Choi, Shinkook and Shin, Junho and Song, Hyoung-Kyu},
journal={arXiv preprint arXiv:2402.02834},
year={2024},
url={https://arxiv.org/abs/2402.02834}
}
@article{kim2024mefomo,
title={Shortened LLaMA: A Simple Depth Pruning for Large Language Models},
author={Kim, Bo-Kyeong and Kim, Geonmin and Kim, Tae-Ho and Castells, Thibault and Choi, Shinkook and Shin, Junho and Song, Hyoung-Kyu},
journal={ICLR Workshop on Mathematical and Empirical Understanding of Foundation Models (ME-FoMo)},
year={2024},
url={https://openreview.net/forum?id=18VGxuOdpu}
}