wang7776/Llama-2-7b-chat-hf-20-attention-sparsity

Text Generation

text-generation-inference


wang7776 committed on Feb 5, 2024

Commit 24543df · verified · 1 Parent(s): d095cb8

Update README.md

Files changed (1)

README.md +3 -0


@@ -21,6 +21,9 @@ tags:
 - llama-2
 license: other
 ---
+# Overview
+The attention layers of this model have been pruned to 20% sparsity using the [Wanda pruning method](https://arxiv.org/abs/2306.11695). Wanda requires no retraining or weight updates and still achieves competitive performance. The base model is [meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf).
 # **Llama 2**
 Llama 2 is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. This is the repository for the 7B fine-tuned model, optimized for dialogue use cases and converted for the Hugging Face Transformers format. Links to other models can be found in the index at the bottom.
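
The Overview added in this commit names Wanda pruning but does not show what it does. The following is a minimal sketch of the Wanda scoring rule as described in the linked paper: each weight is scored by its magnitude times the L2 norm of the matching input activation, and the lowest-scoring weights in each output row are set to zero. The function name `wanda_prune`, the plain-list inputs, and the toy numbers are assumptions for illustration only. The commit does not state which attention projections were pruned or what calibration data was used, so this is not the author's pipeline.

```python
import math


def wanda_prune(weight, activations, sparsity=0.2):
    """Zero the lowest-scoring weights in each output row (hypothetical helper).

    weight:      list of rows, shape (out_features, in_features)
    activations: calibration inputs, shape (n_tokens, in_features)
    sparsity:    fraction of weights to remove per row
    """
    in_features = len(weight[0])
    # L2 norm of each input feature across the calibration tokens.
    act_norm = [
        math.sqrt(sum(x[j] ** 2 for x in activations))
        for j in range(in_features)
    ]
    # Number of weights to drop in every output row.
    n_prune = int(in_features * sparsity)
    pruned = []
    for row in weight:
        # Wanda importance score: |W_ij| * ||X_j||_2.
        scores = [abs(w) * act_norm[j] for j, w in enumerate(row)]
        # Column indices of the n_prune lowest-scoring weights in this row.
        order = sorted(range(in_features), key=scores.__getitem__)
        drop = set(order[:n_prune])
        pruned.append([0.0 if j in drop else w for j, w in enumerate(row)])
    return pruned


# Toy data (made up): input feature 1 has much larger activations.
weight = [
    [4.0, 0.5, -3.0, 2.0, 1.0],
    [0.1, -2.0, 3.0, -0.2, 5.0],
]
activations = [
    [1.0, 10.0, 1.0, 1.0, 1.0],
    [1.0, 10.0, 1.0, 1.0, 1.0],
]
pruned = wanda_prune(weight, activations, sparsity=0.2)
```

In the first row, plain magnitude pruning would remove the weight 0.5, but because its input feature carries large activations, the Wanda score keeps it and removes the weight 1.0 instead. No weight is updated other than being zeroed, which matches the "no retraining or weight updates" claim in the Overview.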