---
base_model: meta-llama/Llama-2-7b-hf
inference: true
model_type: llama
datasets:
- cerebras/SlimPajama-627B
tags:
- sparse
---
# Llama-2-7b-pruned50-retrained
This repo contains model files for a Llama 2 7B model that has had 50% of its parameters pruned in one shot with SparseGPT, and was then retrained by Cerebras on 45B tokens from SlimPajama while maintaining that sparsity.
Authors: Neural Magic, Cerebras
## Usage
Below are some code snippets to help you quickly get started running the model.
### Fine-tuning examples
Coming soon.
### Running the model
```python
# pip install transformers accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("neuralmagic/Llama-2-7b-pruned50-retrained")
model = AutoModelForCausalLM.from_pretrained(
    "neuralmagic/Llama-2-7b-pruned50-retrained", device_map="auto"
)

# Tokenize the prompt and move the tensors to the GPU the model was placed on.
input_text = "Write me a poem about Machine Learning."
inputs = tokenizer(input_text, return_tensors="pt").to("cuda")

# Generate a completion and decode it back to text.
outputs = model.generate(**inputs)
print(tokenizer.decode(outputs[0]))
```
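Because the pruned weights are stored as explicit zeros in the checkpoint, you can sanity-check the stated 50% sparsity directly from the loaded model. The snippet below is a minimal, unofficial sketch that simply counts zero-valued entries in the Linear layers; it is not part of Neural Magic's tooling, and the measured value will only be roughly 50% since untouched layers (e.g. the LM head) are included in the count.

```python
import torch
from transformers import AutoModelForCausalLM

# Load the checkpoint on CPU in half precision; we only inspect the weights here.
model = AutoModelForCausalLM.from_pretrained(
    "neuralmagic/Llama-2-7b-pruned50-retrained", torch_dtype=torch.float16
)

zero, total = 0, 0
for module in model.modules():
    if isinstance(module, torch.nn.Linear):
        weight = module.weight
        zero += int((weight == 0).sum())   # pruned entries are stored as exact zeros
        total += weight.numel()

print(f"Linear-layer sparsity: {zero / total:.2%}")  # expected to be roughly 50%
```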
## Evaluation Benchmark Results
Model evaluation metrics and results.
| Benchmark  | Metric        | Llama-2-7b | Llama-2-7b-pruned50-retrained |
|------------|---------------|------------|-------------------------------|
| MMLU       | 5-shot, top-1 | xxxx       | xxxx                          |
| HellaSwag  | 0-shot        | xxxx       | xxxx                          |
| WinoGrande | partial score | xxxx       | xxxx                          |
| ARC-c      |               | xxxx       | xxxx                          |
| TruthfulQA | 5-shot        | xxxx       | xxxx                          |
| HumanEval  | pass@1        | xxxx       | xxxx                          |
| GSM8K      | maj@1         | xxxx       | xxxx                          |
| **Average** |              | xxxx       | xxxx                          |
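The card does not state which harness produced the numbers in the table above. As a hedged, illustrative sketch only, few-shot metrics like these are commonly reproduced with EleutherAI's lm-evaluation-harness; the tasks, shot counts, and batch size below are placeholders, not the exact configuration used for this model.

```python
# pip install lm_eval
import lm_eval

# Illustrative reproduction setup (assumes the lm-evaluation-harness v0.4+ Python API).
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=neuralmagic/Llama-2-7b-pruned50-retrained,dtype=float16",
    tasks=["hellaswag", "winogrande"],
    num_fewshot=0,
    batch_size=8,
)
print(results["results"])
```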
## Model Training Data
Coming soon.