Falcon-7B LLM Fine-Tuned Model
Model description
This model is a fine-tuned version of the tiiuae/falcon-7b model. It was trained with QLoRA (LoRA adapters trained on top of a 4-bit quantized base model) using the PEFT library.
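For readers new to LoRA, here is the standard update PEFT applies to each adapted weight matrix. In QLoRA the frozen base weights are also stored in 4-bit. This is the general formulation only. The card does not say which modules were adapted, or which rank r and scaling α were used.

```latex
h = W_0 x + \frac{\alpha}{r} B A x, \qquad
W_0 \in \mathbb{R}^{d \times k} \ (\text{frozen}),\quad
B \in \mathbb{R}^{d \times r},\quad
A \in \mathbb{R}^{r \times k},\quad
r \ll \min(d, k)
```

Only A and B are trained. This is why the share of trainable parameters reported further down is so small.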
Intended uses & limitations
How to use
- The base model and tokenizer are loaded with their from_pretrained methods. The fine-tuned adapter is then attached with PeftModel.from_pretrained.
- The tokenizer's padding token is set to the end-of-sentence (EOS) token.
- The generation_config sets the generation parameters, such as the maximum number of new tokens and the sampling temperature.
- The prompt is defined, encoded with the tokenizer, and passed to model.generate to produce a response.
- The generated response is decoded with the tokenizer and printed.
# Import necessary classes and functions
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftConfig, PeftModel
# Specify the model
PEFT_MODEL = "hipnologo/falcon-7b-qlora-finetune-chatbot"
# Load the PEFT config
config = PeftConfig.from_pretrained(PEFT_MODEL)
# 4-bit quantization config, matching the bitsandbytes settings listed under "Training procedure"
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
# Load the base model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    return_dict=True,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
# Set the padding token to be the same as the EOS token
tokenizer.pad_token = tokenizer.eos_token
# Load the PEFT model (attach the fine-tuned adapter to the base model)
model = PeftModel.from_pretrained(model, PEFT_MODEL)
# Set the generation parameters
generation_config = model.generation_config
generation_config.max_new_tokens = 200
generation_config.do_sample = True  # temperature and top_p only take effect when sampling
generation_config.temperature = 0.7
generation_config.top_p = 0.7
generation_config.num_return_sequences = 1
generation_config.pad_token_id = tokenizer.eos_token_id
generation_config.eos_token_id = tokenizer.eos_token_id
# Define the prompt
prompt = """
<human>: How can I create an account?
<assistant>:
""".strip()
print(prompt)
# Encode the prompt and move it to the model's device
encoding = tokenizer(prompt, return_tensors="pt").to(model.device)
# Generate a response without tracking gradients
with torch.inference_mode():
    outputs = model.generate(
        input_ids=encoding.input_ids,
        attention_mask=encoding.attention_mask,
        generation_config=generation_config,
    )
# Print the generated response
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
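The decoded output contains the prompt as well as the reply. The model may also keep going and start a new "<human>:" turn. The helpers below are a minimal sketch, assuming the <human>/<assistant> prompt layout shown above. The names build_prompt and extract_answer are hypothetical and are not part of this model card or any library. They only format a question and trim the decoded text down to the assistant's answer.

```python
# Hypothetical helpers (not from the model card): build the <human>/<assistant>
# prompt used in the example and extract the assistant's reply from decoded text.

HUMAN_TAG = "<human>:"
ASSISTANT_TAG = "<assistant>:"


def build_prompt(question: str) -> str:
    """Format a user question in the <human>/<assistant> layout."""
    return f"{HUMAN_TAG} {question.strip()}\n{ASSISTANT_TAG}"


def extract_answer(decoded: str) -> str:
    """Return the text after the first <assistant>: tag, cut at any later <human>: turn."""
    _, sep, reply = decoded.partition(ASSISTANT_TAG)
    if not sep:
        # No assistant tag found: return the whole text, stripped.
        return decoded.strip()
    # Drop any extra "<human>:" turn the model may have generated.
    reply = reply.split(HUMAN_TAG, 1)[0]
    return reply.strip()


if __name__ == "__main__":
    prompt = build_prompt("How can I create an account?")
    print(prompt)
    # Made-up decoded text for illustration only; not real model output.
    sample = prompt + " Click 'Sign Up' and follow the steps.\n<human>: Thanks"
    print(extract_answer(sample))
```

In the example above, the reply could then be obtained with extract_answer(tokenizer.decode(outputs[0], skip_special_tokens=True)). Splitting on the tags avoids relying on the decoded text reproducing the prompt character for character.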
Training procedure
The model was fine-tuned on the Ecommerce-FAQ-Chatbot-Dataset. The following bitsandbytes quantization config was used during training:
- load_in_8bit: False
- load_in_4bit: True
- llm_int8_threshold: 6.0
- llm_int8_skip_modules: None
- llm_int8_enable_fp32_cpu_offload: False
- llm_int8_has_fp16_weight: False
- bnb_4bit_quant_type: nf4
- bnb_4bit_use_double_quant: True
- bnb_4bit_compute_dtype: bfloat16
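The sketch below estimates what the two 4-bit settings above (nf4 and double quantization) cost in storage per quantized weight. It assumes the defaults described in the QLoRA paper. Each block of 64 weights gets one 32-bit scaling constant. With double quantization, those constants are themselves quantized to 8 bits, with one 32-bit constant per 256 of them. The estimate is idealized. It ignores layers that stay unquantized, such as embeddings and norms. The bfloat16 compute dtype affects the precision of the matrix multiplications, not storage.

```python
# Idealized storage estimate for 4-bit NF4 weights, with and without double quantization.
# ASSUMPTION: QLoRA-paper defaults, i.e. block size 64 for the first-level absmax
# constants and block size 256 for the second level when double quant is on.

def bits_per_param(double_quant: bool, block: int = 64, block2: int = 256) -> float:
    """Average bits stored per quantized weight, including quantization constants."""
    weight_bits = 4.0
    if not double_quant:
        # One fp32 constant shared by each block of weights.
        return weight_bits + 32 / block
    # 8-bit constants per block, plus one fp32 constant per block2 of those.
    return weight_bits + 8 / block + 32 / (block * block2)


def quantized_gigabytes(n_params: float, double_quant: bool) -> float:
    """Approximate storage in GB (1e9 bytes) for n_params quantized weights."""
    return n_params * bits_per_param(double_quant) / 8 / 1e9


if __name__ == "__main__":
    for dq in (False, True):
        print(f"double_quant={dq}: {bits_per_param(dq):.4f} bits/param, "
              f"{quantized_gigabytes(7e9, dq):.2f} GB for 7e9 weights")
```

Under these assumptions, double quantization cuts the constant overhead from 0.5 to about 0.127 bits per weight.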
Framework versions
- PEFT 0.4.0.dev0
Evaluation results
The model was trained for 80 steps. Over that run the training loss fell from 0.184 to near zero, ending at 0.03094411873175886.
- Trainable params: 2359296
- All params: 3611104128
- Trainable%: 0.06533447711203746
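The card does not state the LoRA configuration. The arithmetic below shows one setup that reproduces the reported trainable-parameter count exactly. It is an assumption, not a confirmed fact: rank r = 8 adapters on Falcon-7B's fused query_key_value projection only. That projection maps the hidden size 4544 to 4544 + 2 × 64 = 4672, because multi-query attention shares a single 64-dimensional key head and value head, and the model has 32 decoder layers. The "All params" figure is roughly half of Falcon-7B's ~7B parameters. This is most likely because bitsandbytes packs two 4-bit weights per stored element, so counting parameters on the quantized model under-reports them.

```python
# Sanity check of the reported trainable-parameter count under an ASSUMED LoRA setup:
# rank-8 adapters on Falcon-7B's fused query_key_value projection only.
# The model card does not confirm this configuration.

HIDDEN_SIZE = 4544       # Falcon-7B hidden size
QKV_OUT = 4544 + 2 * 64  # fused Q (4544) plus one shared K head and one V head (64 each)
NUM_LAYERS = 32          # Falcon-7B decoder layers, one query_key_value each


def lora_params(rank: int, in_features: int, out_features: int, n_modules: int) -> int:
    """LoRA adds A (rank x in) and B (out x rank) per adapted linear layer, no biases."""
    return n_modules * rank * (in_features + out_features)


def trainable_percent(trainable: int, total: int) -> float:
    """Share of trainable parameters, in percent."""
    return 100 * trainable / total


if __name__ == "__main__":
    trainable = lora_params(8, HIDDEN_SIZE, QKV_OUT, NUM_LAYERS)
    print(trainable)  # 2359296
    print(trainable_percent(trainable, 3611104128))
```

Other combinations of rank and target modules could give the same total, so treat this only as a consistency check on the numbers above.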
License
This model is licensed under Apache 2.0. Please see the LICENSE for more information.