g-ronimo/llama3-8b-SlimHermes
meta-llama/Meta-Llama-3-8B
trained on the 10k longest samples from teknium/OpenHermes-2.5
Sample Usage
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_path = "g-ronimo/llama3-8b-SlimHermes"

# Load the model in bfloat16 and let accelerate place it on the available device(s)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_path)

messages = [
    {"role": "system", "content": "Talk like a pirate."},
    {"role": "user", "content": "hello"},
]

# Render the chat template to token ids and move them to the model's device
input_tokens = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

output_tokens = model.generate(input_tokens, max_new_tokens=100)

# Keep special tokens so the chat markup is visible in the printed output
output = tokenizer.decode(output_tokens[0], skip_special_tokens=False)
print(output)
Sample Output
<|im_start|>system
Talk like a pirate.<|im_end|>
<|im_start|>user
hello<|im_end|>
<|im_start|>assistant
hello there, matey! How be ye doin' today? Arrrr!<|im_end|>
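Judging by the sample output, the chat template appears to follow the ChatML layout (`<|im_start|>role`, content, `<|im_end|>`). The sketch below reproduces that layout in plain Python and pulls the assistant reply back out of the decoded text. It is an illustration only: `render_chatml` and `extract_reply` are hypothetical helpers, not part of transformers, and the authoritative template is the one shipped with the tokenizer, which may differ in whitespace or defaults.

```python
from typing import Dict, List

IM_START = "<|im_start|>"
IM_END = "<|im_end|>"


def render_chatml(messages: List[Dict[str, str]], add_generation_prompt: bool = True) -> str:
    """Format messages in the ChatML layout seen in the sample output (assumed layout)."""
    parts = [f"{IM_START}{m['role']}\n{m['content']}{IM_END}\n" for m in messages]
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here
        parts.append(f"{IM_START}assistant\n")
    return "".join(parts)


def extract_reply(decoded: str) -> str:
    """Return the text of the last assistant turn, without the chat markup."""
    marker = f"{IM_START}assistant\n"
    tail = decoded.rsplit(marker, 1)[-1]
    return tail.split(IM_END, 1)[0].strip()


messages = [
    {"role": "system", "content": "Talk like a pirate."},
    {"role": "user", "content": "hello"},
]
prompt = render_chatml(messages)
print(prompt)
```

Passing the decoded `output` from the usage example to `extract_reply` should return just the assistant text, assuming the output matches the layout shown above.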