Sparse-Llama-3.1-8B-evolcodealpaca-2of4
Model Overview
- Model Architecture: Llama-3.1-8B
- Input: Text
- Output: Text
- Model Optimizations:
  - Sparsity: 2:4
- Release Date: 11/21/2024
- Version: 1.0
- License(s): llama3.1
- Model Developers: Neural Magic
This is a code-completion model obtained by fine-tuning the 2:4 sparse Sparse-Llama-3.1-8B-2of4 on the evol-codealpaca-v1 dataset. On the HumanEval benchmark, it achieves a pass@1 of 49.1, compared to 48.5 for the fine-tuned dense model Llama-3.1-8B-evolcodealpaca, corresponding to 101.2% accuracy recovery.
Model Optimizations
This model inherits the optimizations of its parent, Sparse-Llama-3.1-8B-2of4: all linear operators within transformer blocks were pruned to the 2:4 sparsity pattern, meaning that in each contiguous group of four weights, two are retained and two are set to zero.
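As an illustration of the pattern, here is a minimal sketch (assumptions: PyTorch and transformers are installed, and the parent checkpoint is available on the Hugging Face Hub under the id used below) that checks the 2:4 constraint on the transformer linear weights:

```python
# Minimal sketch: verify the 2:4 sparsity pattern on transformer linear
# weights. The Hub id below is an assumption; adjust to the actual path.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "neuralmagic/Sparse-Llama-3.1-8B-2of4", torch_dtype=torch.bfloat16
)

for name, module in model.named_modules():
    if isinstance(module, torch.nn.Linear) and "layers" in name:
        w = module.weight.detach()
        # Group each row's weights into blocks of four and count nonzeros:
        # the 2:4 pattern keeps at most two nonzero weights per block.
        groups = w.reshape(-1, 4)
        nonzeros_per_group = (groups != 0).sum(dim=1)
        assert (nonzeros_per_group <= 2).all(), f"{name} is not 2:4 sparse"
```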
Deployment with vLLM
This model can be deployed efficiently using the vLLM backend. vLLM also supports OpenAI-compatible serving; see the vLLM documentation for more details.
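A minimal offline-inference sketch (the Hub repository id below is an assumption; adjust it to the actual model path):

```python
# Minimal offline-inference sketch with vLLM. Assumes vLLM is installed
# and the model is published on the Hub under the id below.
from vllm import LLM, SamplingParams

llm = LLM(model="neuralmagic/Sparse-Llama-3.1-8B-evolcodealpaca-2of4")
params = SamplingParams(temperature=0.0, max_tokens=256)

prompt = "Write a Python function that returns the n-th Fibonacci number."
outputs = llm.generate([prompt], params)
print(outputs[0].outputs[0].text)
```

For OpenAI-compatible serving, the same model can be launched with vLLM's built-in server (e.g. `vllm serve <model>`), as covered in the vLLM documentation.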
Evaluation
This model was evaluated on Neural Magic's fork of EvalPlus.
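The exact entry points of that fork are not reproduced here; as a rough sketch, assuming the upstream EvalPlus data API applies (and again assuming the Hub repository id), HumanEval(+) samples could be generated as follows:

```python
# Rough sketch of HumanEval(+) sample generation using the upstream
# evalplus.data API paired with vLLM. The Neural Magic fork may expose
# a different harness; the Hub id below is an assumption.
from evalplus.data import get_human_eval_plus, write_jsonl
from vllm import LLM, SamplingParams

llm = LLM(model="neuralmagic/Sparse-Llama-3.1-8B-evolcodealpaca-2of4")
params = SamplingParams(temperature=0.0, max_tokens=512)

samples = []
for task_id, problem in get_human_eval_plus().items():
    completion = llm.generate([problem["prompt"]], params)[0].outputs[0].text
    samples.append(dict(task_id=task_id, solution=problem["prompt"] + completion))

# The resulting file can then be scored with the EvalPlus evaluator.
write_jsonl("samples.jsonl", samples)
```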
Accuracy
HumanEval Benchmark
| Metric | Llama-3.1-8B-evolcodealpaca | Sparse-Llama-3.1-8B-evolcodealpaca-2of4 |
|:-------|:---------------------------:|:---------------------------------------:|
| HumanEval pass@1 | 48.5 | 49.1 |
| HumanEval+ pass@1 | 44.2 | 46.3 |