base_model:
- meta-llama/Meta-Llama-3-8B-Instruct
pipeline_tag: text-generation
metrics:
- accuracy
Model Description:
Pruned from meta-llama/Meta-Llama-3-8B-Instruct using the Random Pruner from LLM-Pruner: On the Structural Pruning of Large Language Models.
This was done to test the viability of LLM-Pruner for task-agnostic, low-resource generative AI for commercial and personal use, compared to using out-of-the-box models like meta-llama/Llama-3.2-3B-Instruct.
Our presentation slides may be found here
To replicate,
- First, clone the official implementation and run:
python llama3.py --pruning_ratio 0.25 \
--device cuda --eval_device cuda \
--base_model meta-llama/Meta-Llama-3-8B-Instruct \
--block_wise --block_mlp_layer_start 4 --block_mlp_layer_end 30 \
--block_attention_layer_start 4 --block_attention_layer_end 30 \
--save_ckpt_log_name llama3_prune \
--pruner_type random \
--max_seq_len 512 \
--test_after_train --test_before_train --save_model
to get the pruned model.
NOTE: We removed 'ptb' from the datasets in llama3.py since it requires foreign code to load.
- Then, to post-train, follow the official implementation, section 2
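
To give a feel for what the flags in the pruning command select, here is a minimal sketch of random structured pruning on the MLP blocks. It is not LLM-Pruner's code: the function name `random_prune_plan` is hypothetical, the sketch ignores the dependency tracking LLM-Pruner uses to remove coupled weights together, and it assumes the layer end index is exclusive (so layers 4 through 29 are pruned). The value 14336 is the MLP intermediate size of Llama-3-8B.

```python
import random


def random_prune_plan(num_channels, pruning_ratio, layer_start, layer_end, seed=0):
    """Hypothetical helper: for each layer in [layer_start, layer_end),
    choose a random set of channel indices to drop."""
    rng = random.Random(seed)
    n_drop = int(num_channels * pruning_ratio)
    plan = {}
    for layer in range(layer_start, layer_end):
        # Channels are sampled uniformly, without any importance score.
        plan[layer] = sorted(rng.sample(range(num_channels), n_drop))
    return plan


# Mirrors --pruning_ratio 0.25 with --block_mlp_layer_start 4 --block_mlp_layer_end 30
# (end index assumed exclusive).
plan = random_prune_plan(num_channels=14336, pruning_ratio=0.25, layer_start=4, layer_end=30)
kept = 14336 - len(plan[4])
print(len(plan), "layers pruned;", kept, "of 14336 MLP channels kept in each")
```

Layers outside the selected range keep all of their channels, which is why the overall parameter reduction is smaller than the per-layer pruning ratio.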
Benchmark Results
Benchmark Evaluation: Following the original paper's evaluation, we perform zero-shot task classification on 5 common-sense reasoning datasets that don't require foreign code to load:
Model | BoolQ | HellaSwag | ARC-e | ARC-c | OBQA | Average Accuracy |
---|---|---|---|---|---|---|
Llama-3-6.6B-R-Pruned | 74.25 | 67.59 | 71.21 | 42.49 | 38.8 | 58.87 |
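
The Average Accuracy column is the unweighted mean of the five task scores. A quick check of the arithmetic, using only the numbers from the table (the variable names are illustrative):

```python
# Per-task accuracies copied from the table row for Llama-3-6.6B-R-Pruned.
scores = {
    "BoolQ": 74.25,
    "HellaSwag": 67.59,
    "ARC-e": 71.21,
    "ARC-c": 42.49,
    "OBQA": 38.8,
}

# Unweighted mean over the five tasks.
average = sum(scores.values()) / len(scores)
print(f"{average:.2f}")  # prints 58.87
```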
Usage:
Follow the official implementation for usage, section Pruned Model with Post-Training.
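
As a rough orientation, loading could look like the sketch below. It rests on assumptions that should be checked against the official implementation: that the pruned checkpoint is a pickled dictionary with `"model"` and `"tokenizer"` entries, and that the post-training weights are a LoRA adapter loadable with `peft`. The function name and its parameters are hypothetical, and the paths are placeholders.

```python
def load_pruned_model(checkpoint_path, lora_dir=None, device="cuda"):
    """Hypothetical loader for a pruned checkpoint plus an optional LoRA adapter."""
    # Imported lazily so that defining the function does not require torch or peft.
    import torch

    # Assumption: the checkpoint is a pickled dict holding the whole model object,
    # so weights_only must be False. Only load files from a source you trust.
    pruned = torch.load(checkpoint_path, map_location="cpu", weights_only=False)
    tokenizer, model = pruned["tokenizer"], pruned["model"]

    if lora_dir is not None:
        from peft import PeftModel

        # Assumption: post-training produced a LoRA adapter directory.
        model = PeftModel.from_pretrained(model, lora_dir)

    return tokenizer, model.to(device).eval()
```

Because the pruned layers no longer match the stock Llama configuration, the model object is unpickled whole here rather than rebuilt from a standard config, which is also why the checkpoint must come from a trusted source.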