---
tags:
- vllm
---

# SparseLlama-3-8B-pruned_50.2of4-FP8

This repo contains model files for a 2:4 (N:M) sparse [Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) model, pruned in one shot with [SparseGPT](https://arxiv.org/abs/2301.00774) and then retrained with [SquareHead](https://arxiv.org/abs/2310.06927) knowledge distillation while maintaining the 2:4 sparsity mask.

It was then quantized using [AutoFP8](https://github.com/neuralmagic/AutoFP8) to FP8 weights and activations with per-tensor scales, calibrated on UltraChat2k.
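
Given the `vllm` tag on this card, here is a minimal offline-inference sketch using vLLM's `LLM` API. It assumes a vLLM build with FP8 checkpoint support (roughly v0.4.2+), and the repo id is hypothetical, inferred from the unquantized repo's name below.

```python
# A minimal sketch, not an official usage example. Assumes a vLLM build
# with FP8 checkpoint support; the repo id is hypothetical, inferred
# from the unquantized repo's name.
from vllm import LLM, SamplingParams

llm = LLM(model="nm-testing/SparseLlama-3-8B-pruned_50.2of4-FP8")
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["What is 2:4 structured sparsity?"], params)
print(outputs[0].outputs[0].text)
```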

**Note:** The unquantized [SparseLlama-3-8B-pruned_50.2of4](https://huggingface.co/nm-testing/SparseLlama-3-8B-pruned_50.2of4) is still a work in progress and subject to change; this FP8 model will be updated once the unquantized model is updated.

## Evaluation Benchmark Results

Model evaluation results were obtained via [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) following the configuration of the [Open LLM Leaderboard](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard).
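
As a sketch of how one leaderboard-style number could be reproduced, the harness's Python entry point (`lm_eval.simple_evaluate`, v0.4+) can be pointed at the model through its vLLM backend. The repo id is the same hypothetical one as above, and exact parity with the leaderboard also depends on the harness version and settings.

```python
# Sketch: 25-shot ARC-c with lm-evaluation-harness (v0.4+ Python API).
# The vLLM backend is used because AutoFP8 checkpoints target vLLM;
# the repo id is hypothetical, as above.
import lm_eval

results = lm_eval.simple_evaluate(
    model="vllm",
    model_args="pretrained=nm-testing/SparseLlama-3-8B-pruned_50.2of4-FP8",
    tasks=["arc_challenge"],  # ARC-c
    num_fewshot=25,           # leaderboard setting for ARC-c
)
print(results["results"]["arc_challenge"])
```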

| Benchmark | Meta-Llama-3-8B | SparseLlama-3-8B-pruned_50.2of4 | SparseLlama-3-8B-pruned_50.2of4-FP8<br>(this model) |
|:----------------------------------------------:|:-----------:|:-----------------------------:|:-----------------------------:|
| [ARC-c](https://arxiv.org/abs/1911.01547)<br> 25-shot | 59.47% | 57.76% | 58.02% |
| [MMLU](https://arxiv.org/abs/2009.03300)<br> 5-shot | 65.29% | 60.44% | 60.71% |