metadata
base_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
inference: true
model_type: Llama
TinyLlama-1.1B-Chat-v1.0
This repo contains pruned model files for TinyLlama-1.1B-Chat-v1.0.
This model was pruned with SparseGPT, using SparseML.
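One way to sanity-check a pruned checkpoint is to count how many weights are exactly zero. The following is a minimal sketch on plain Python lists, not part of SparseML; the function names `zero_fraction` and `follows_two_four` are hypothetical, the toy values are made up, and the 2:4 check assumes the `24` in the model ID refers to 2:4 structured sparsity (at least two zeros in every group of four consecutive weights), which this card does not state.

```python
from typing import Sequence


def zero_fraction(row: Sequence[float]) -> float:
    """Return the fraction of entries in `row` that are exactly zero."""
    if not row:
        return 0.0
    return sum(1 for w in row if w == 0) / len(row)


def follows_two_four(row: Sequence[float]) -> bool:
    """Check that every full group of 4 consecutive weights has at least 2 zeros."""
    usable = len(row) - len(row) % 4  # ignore a trailing partial group
    groups = [row[i:i + 4] for i in range(0, usable, 4)]
    return all(sum(1 for w in g if w == 0) >= 2 for g in groups)


# Toy row standing in for one row of a pruned weight matrix (made-up values).
toy_row = [0.0, 0.3, 0.0, -0.1, 0.5, 0.0, 0.0, 0.2]
sparsity = zero_fraction(toy_row)
is_2_4 = follows_two_four(toy_row)
print(f"sparsity={sparsity:.2f}, 2:4 pattern={is_2_4}")
```

For a real checkpoint the same count could be taken directly on tensors, for example `(weight == 0).float().mean()` for each linear layer's weight.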
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Wrap the prompt in the ChatML-style turn markers used in this example.
prompt = "How to make banana bread?"
formatted_prompt = f"<|im_start|>user\n{prompt}<|im_end|>\n<|im_start|>assistant\n"

# Load the pruned checkpoint in half precision; device_map="auto" requires the accelerate package.
model_id = "nm-testing/TinyLlama-1.1B-Chat-v1.0-pruned50-24"
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Move the tokenized inputs to the model's device before generating.
inputs = tokenizer(formatted_prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.batch_decode(outputs)[0])
"""
<s> <|im_start|>user
How to make banana bread?<|im_end|>
<|im_start|>assistant
Banana bread is a delicious dessert that is made with bananas. Here is how to make banana bread:
1. Firstly, you need to cut bananas into small pieces.
"""