
Uploaded model

  • Developed by: AmaanUsmani
  • License: apache-2.0
  • Finetuned from model: unsloth/llama-3-8b-Instruct-bnb-4bit

This Llama model was trained 2x faster with Unsloth and Hugging Face's TRL library.
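
For context, fine-tunes like this one are usually set up with Unsloth's PEFT wrapper plus TRL's SFTTrainer. The sketch below follows that standard pattern; the dataset file, LoRA settings, and training arguments are illustrative assumptions, not the exact recipe behind this model (and the SFTTrainer keyword arguments shown match TRL versions from this card's era).

```python
# Minimal sketch of an Unsloth + TRL fine-tune.
# Hyperparameters and dataset are illustrative, NOT this model's exact recipe.
from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/llama-3-8b-Instruct-bnb-4bit",
    max_seq_length = 2048,
    load_in_4bit = True,
)
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,  # LoRA rank (assumed)
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
    lora_alpha = 16,
    use_gradient_checkpointing = True,
)

dataset = load_dataset("json", data_files = "train.jsonl", split = "train")  # hypothetical dataset

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",  # assumes pre-formatted prompt strings
    max_seq_length = 2048,
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        learning_rate = 2e-4,
        max_steps = 60,
        output_dir = "outputs",
    ),
)
trainer.train()
```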

How to run inference

Please note that the code below for downloading the model and running inference is not yet optimized; this will be improved in the future.

```
!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
!pip install --no-deps "xformers<0.0.26" trl peft accelerate bitsandbytes scikit-learn scipy auto-gptq optimum joblib threadpoolctl
```

```python
from unsloth import FastLanguageModel
from transformers import TextStreamer
import torch

max_seq_length = 2048  # Choose any! We auto support RoPE Scaling internally!
dtype = None           # None for auto detection. Float16 for Tesla T4/V100, Bfloat16 for Ampere+
load_in_4bit = True    # Use 4bit quantization to reduce memory usage. Can be False.

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "AmaanUsmani/Llama3-8b-DynamicChat-4bit",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)
```
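
Optionally, Unsloth's fast-inference mode can be enabled once the model is loaded:

```python
FastLanguageModel.for_inference(model)  # optional: switches on Unsloth's faster generation path
```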

```python
instructions_string = """You're a conversational agent designed to engage users in dynamic interactions. \
Your goal is to facilitate more meaningful exchanges by enhancing the model's understanding of user input. \
You should aim to create an environment where users feel heard, understood, and engaged in ongoing dialogue. \
As long as the user's question doesn't include any personal details or context related to the user, do not ask questions back. \
If the user's question involves more context, first provide general information or advice and then ask a follow-up question regarding the additional context needed. \
Please respond to the following comment.
"""

# Build a Llama 3 Instruct prompt by hand.
prompt_template = lambda comment: (
    f"<|begin_of_text|><|start_header_id|>system<|end_header_id|>{instructions_string}<|eot_id|>"
    f"<|start_header_id|>user<|end_header_id|>\n{comment}<|eot_id|>\n"
    f"<|start_header_id|>assistant<|end_header_id|>\n"
)

comment = "I want to learn how to swim"
prompt = prompt_template(comment)

model.eval()
inputs = tokenizer(prompt, return_tensors = "pt")
text_streamer = TextStreamer(tokenizer)  # streams tokens to stdout as they are generated
outputs = model.generate(
    input_ids = inputs["input_ids"].to("cuda"),
    streamer = text_streamer,
    max_new_tokens = 500,
)
response = tokenizer.decode(outputs[0], skip_special_tokens = True).split("assistant\n")[-1].strip()
print(response)
```
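
Instead of hand-building the Llama 3 prompt string, the tokenizer's built-in chat template can also be used, assuming this finetune keeps the chat template shipped with the base Llama 3 Instruct model (a minimal sketch):

```python
# Sketch: the same prompt built via the tokenizer's chat template
# (assumes the base model's Llama 3 chat template is still attached).
messages = [
    {"role": "system", "content": instructions_string},
    {"role": "user", "content": comment},
]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt = True,  # appends the assistant header so the model starts replying
    return_tensors = "pt",
).to("cuda")
outputs = model.generate(input_ids = input_ids, max_new_tokens = 500)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens = True))
```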

Model details

  • Model size: 4.65B params (Safetensors)
  • Tensor types: BF16, F32, U8
