---
library_name: peft
base_model: LSX-UniWue/LLaMmlein_1B
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: LLaMmlein_1b_chat_guanako
results: []
datasets:
- LSX-UniWue/Guanako
language:
- de
license: other
---
# LLäMmlein 1B Chat
This is a chat adapter for LLäMmlein, our German Tinyllama-style 1B language model.
Find more details on our [project page](https://www.informatik.uni-wuerzburg.de/datascience/projects/nlp/llammlein/) and in our [preprint](https://arxiv.org/abs/2411.11171)!
We also merged the adapter and converted it to GGUF [here](https://huggingface.co/LSX-UniWue/LLaMmlein_1B_alternative_formats).
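To run the GGUF variant directly, here is a minimal sketch with `llama-cpp-python` (the concrete `.gguf` filename and available quantizations are assumptions; check the repo's file list):

```py
# a sketch, not our reference setup: assumes llama-cpp-python is installed
# and that the repo ships a GGUF file matching the pattern below
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="LSX-UniWue/LLaMmlein_1B_alternative_formats",
    filename="*.gguf",  # hypothetical pattern; pin a concrete quantization in practice
)
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Na wie geht's?"}],
    max_tokens=300,
)
print(out["choices"][0]["message"]["content"])
```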
## Run it
```py
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
torch.manual_seed(42)
# script config
base_model_name = "LSX-UniWue/LLaMmlein_1B"
chat_adapter_name = "LSX-UniWue/LLaMmlein_1B_chat_guanako"
device = "cuda" # or mps
# chat history
messages = [
{
"role": "user",
"content": """Na wie geht's?""",
},
]
# load model
base_model = AutoModelForCausalLM.from_pretrained(
base_model_name,
torch_dtype=torch.bfloat16,
device_map=device,
)
# the chat tokenizer adds special tokens, so the embedding matrix must match its vocab size
base_model.resize_token_embeddings(32064)
model = PeftModel.from_pretrained(base_model, chat_adapter_name)
tokenizer = AutoTokenizer.from_pretrained(chat_adapter_name)
# encode message in "ChatML" format
chat = tokenizer.apply_chat_template(
messages,
return_tensors="pt",
add_generation_prompt=True,
).to(device)
# generate response
print(
tokenizer.decode(
model.generate(
chat,
max_new_tokens=300,
pad_token_id=tokenizer.pad_token_id,
eos_token_id=tokenizer.eos_token_id,
)[0],
skip_special_tokens=False,
)
)
```
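The pre-merged weights linked above can also be reproduced locally: assuming a mergeable LoRA-style adapter, peft's `merge_and_unload` folds the adapter into the base weights so the model can be saved and used without peft at inference time. A minimal sketch, reusing `model` and `tokenizer` from the snippet above:

```py
# fold the adapter weights into the base model (sketch; assumes a mergeable
# LoRA-style adapter) and save a standalone checkpoint
merged = model.merge_and_unload()
merged.save_pretrained("LLaMmlein_1B_chat_merged")
tokenizer.save_pretrained("LLaMmlein_1B_chat_merged")
```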