Model Details
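This is an adapter for meta-llama/Meta-Llama-3-8B, fine-tuned on xLAM for function calling. The adapter is undertrained; its main purpose is testing the function-calling capabilities of LLMs.
The following code loads the base model, attaches the adapter, and runs a sample function-calling prompt: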
import os

import torch
from peft import PeftModel
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
)

# Use bf16 and FlashAttention 2 if the GPU supports them; otherwise fall back to fp16 and SDPA
if torch.cuda.is_bf16_supported():
    os.system('pip install flash_attn')
    compute_dtype = torch.bfloat16
    attn_implementation = 'flash_attention_2'
else:
    compute_dtype = torch.float16
    attn_implementation = 'sdpa'

adapter = "kaitchup/Meta-Llama-3-8B-xLAM-Adapter"
model_name = "meta-llama/Meta-Llama-3-8B"

# Load the tokenizer and the base model on GPU 0
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=compute_dtype,
    device_map={"": 0},
    attn_implementation=attn_implementation,
)

# Attach the function-calling adapter to the base model
model = PeftModel.from_pretrained(model, adapter)

# The prompt ends with an open <tools> tag for the model to continue from
prompt = "<user>Check if the numbers 8 and 1233 are powers of two.</user>\n\n<tools>"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

# Greedy decoding: with do_sample=False, temperature is not used, so it is omitted
outputs = model.generate(**inputs, do_sample=False, max_new_tokens=150)
result = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(result)
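The decoded `result` is a single string containing the prompt and the model's continuation. As a minimal sketch of how it could be post-processed, the helpers below build a prompt in the same `<user>...</user>\n\n<tools>` shape as the example and pull out a tagged section of the output. The `<calls>` tag and the JSON payload are assumptions made for illustration, not something this model card confirms, and `build_prompt`, `extract_tag`, and `parse_calls` are hypothetical helper names. Since the adapter is undertrained, the output may not be valid JSON at all, so the parser returns `None` instead of raising.

```python
import json
import re


def build_prompt(query: str) -> str:
    """Wrap a user query in the same tag layout as the example prompt."""
    return f"<user>{query}</user>\n\n<tools>"


def extract_tag(text: str, tag: str):
    """Return the text inside the first <tag>...</tag> pair, or None.

    If the closing tag is missing (for example, generation stopped at
    max_new_tokens), everything after the opening tag is returned.
    """
    match = re.search(rf"<{tag}>(.*?)(?:</{tag}>|$)", text, flags=re.DOTALL)
    return match.group(1).strip() if match else None


def parse_calls(generated: str):
    """Try to decode an assumed <calls> section as JSON; None on failure."""
    raw = extract_tag(generated, "calls")
    if raw is None:
        return None
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return None


# Hand-written string for illustration only; this is NOT real model output,
# and the function name and arguments in it are made up.
sample = (
    build_prompt("Check if the numbers 8 and 1233 are powers of two.")
    + "[]</tools>\n\n"
    + '<calls>[{"name": "is_power_of_two", "arguments": {"num": 8}}]</calls>'
)
print(parse_calls(sample))
```

With the generation code above, `parse_calls(result)` would be the corresponding call. Inspect the raw `result` first to confirm which tags the adapter actually emits before relying on this parser.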
- Developed by: The Kaitchup
- Language(s) (NLP): English
- License: cc-by-4.0