---
library_name: transformers
tags: []
---


## Model Details

### Model Description

This model answers questions about Khulna University of Engineering & Technology (KUET).

- **Developed by:** Md. Shahidul Salim
- **Model type:** Question answering
- **Language(s) (NLP):** English
- **Finetuned from model:** mistralai/Mistral-7B-Instruct-v0.1


## How to Get Started with the Model
```
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from langchain import HuggingFacePipeline, PromptTemplate
from langchain.schema import StrOutputParser

# Load the fine-tuned model and its tokenizer from the Hub
model_name = "shahidul034/KUET_LLM_Mistral"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name,
                                             torch_dtype=torch.bfloat16,
                                             device_map="auto")

# Wrap the loaded model in a text-generation pipeline
pipe = pipeline("text-generation",
                model=model,
                tokenizer=tokenizer,
                max_new_tokens=512,
                do_sample=True,
                top_k=30,
                num_return_sequences=1,
                eos_token_id=tokenizer.eos_token_id
                )
llm = HuggingFacePipeline(pipeline=pipe, model_kwargs={'temperature': 0})

template = """
    <s>[INST] <<SYS>>
    {role}
    <</SYS>>       
    {text} [/INST]
"""

prompt = PromptTemplate(
    input_variables = [
        "role", 
        "text"
    ],
    template = template,
)
role = "You are a KUET authority managed chatbot, help users by answering their queries about KUET."
chain = prompt | llm | StrOutputParser()
ques="What is KUET?"
ans=chain.invoke({"role": role,"text":ques})
print(ans)
```
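
To inspect the exact string the model receives, the template can also be rendered without LangChain. The following is a minimal plain-Python sketch that mirrors the `PromptTemplate` above; the helper name `build_prompt` is hypothetical and not part of any library.

```python
# Plain-Python stand-in for the PromptTemplate above (illustrative only).
TEMPLATE = """
    <s>[INST] <<SYS>>
    {role}
    <</SYS>>
    {text} [/INST]
"""


def build_prompt(role: str, text: str) -> str:
    """Fill the two template slots; `build_prompt` is a hypothetical helper."""
    return TEMPLATE.format(role=role, text=text)


role = "You are a KUET authority managed chatbot, help users by answering their queries about KUET."
print(build_prompt(role, "What is KUET?"))
```

Printing the rendered prompt is a quick way to check that the system message and the question land in the expected slots before the text is sent to the pipeline.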

## Training Details

### Training Data

A custom dataset of prompt and reply pairs collected from the KUET website.

### Training Procedure 

```
import pandas as pd
import torch
import transformers
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from trl import SFTTrainer
base_model = "mistralai/Mistral-7B-Instruct-v0.2"
lora_output = 'models/lora_KUET_LLM_Mistral'
full_output = 'models/full_KUET_LLM_Mistral'
DEVICE = 'cuda'
bnb_config = BitsAndBytesConfig(  
    load_in_8bit= True,
#     bnb_4bit_quant_type= "nf4",
#     bnb_4bit_compute_dtype= torch.bfloat16,
#     bnb_4bit_use_double_quant= False,
)
model = AutoModelForCausalLM.from_pretrained(
        base_model,
        # load_in_4bit=True,
        quantization_config=bnb_config,
        torch_dtype=torch.bfloat16,
        device_map="auto",
        trust_remote_code=True,
)
model.config.use_cache = False # silence the warnings
model.config.pretraining_tp = 1
model.gradient_checkpointing_enable()
tokenizer = AutoTokenizer.from_pretrained(base_model, trust_remote_code=True)
tokenizer.padding_side = 'right'
tokenizer.pad_token = tokenizer.eos_token
tokenizer.add_eos_token = True  # append an EOS token when tokenizing

# Read the Excel sheet of Prompt/Reply pairs
data_location = r"/home/sdm/Desktop/shakib/KUET LLM/data/dataset_shakibV2.xlsx"  # replace with your own path
data_df = pd.read_excel(data_location)
def formatted_text(x):
    temp = [
    # {"role": "system", "content": "Answer as a medical assistant. Respond concisely."},
    {"role": "user", "content": """Answer the question concisely as a medical assisstant.
     Question: """ + x["Prompt"]},
    {"role": "assistant", "content": x["Reply"]}
    ]
    return tokenizer.apply_chat_template(temp, add_generation_prompt=False, tokenize=False)

# Build the training text; change "Prompt" and "Reply" if your dataset uses different column names
data_df["text"] = data_df[["Prompt", "Reply"]].apply(formatted_text, axis=1)
print(data_df.iloc[0])
dataset = Dataset.from_pandas(data_df)
# Set the PEFT adapter config (rank 16, alpha 32)
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Target modules are the linear projection layers of the Mistral base model
config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj", "k_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],  # adapt every attention and MLP projection
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM")

# Prepare the quantized model for training (freezes the base weights, enables gradient checkpointing)
model = prepare_model_for_kbit_training(model, use_gradient_checkpointing=True)
# Attach the LoRA adapters to the model (last step)
model = get_peft_model(model, config)
# Set Hyperparameters
MAXLEN=512
BATCH_SIZE=4
GRAD_ACC=4
OPTIMIZER='paged_adamw_8bit' # save memory
LR=5e-06                      # slightly smaller than pretraining lr | and close to LoRA standard
# Set training config
training_config = transformers.TrainingArguments(per_device_train_batch_size=BATCH_SIZE,
                                                 gradient_accumulation_steps=GRAD_ACC,
                                                 optim=OPTIMIZER,
                                                 learning_rate=LR,
                                                 fp16=True,            # consider compatibility when using bf16
                                                 logging_steps=10,
                                                 num_train_epochs = 2,
                                                 output_dir=lora_output,
                                                 remove_unused_columns=True,
                                                 )

# Set collator
data_collator = transformers.DataCollatorForLanguageModeling(tokenizer,mlm=False)

# Set up the trainer
trainer = SFTTrainer(model=model,
                     train_dataset=dataset,
                     data_collator=data_collator,
                     args=training_config,
                     dataset_text_field="text",
                     # callbacks=[early_stop],  # not used yet; LoRA overfits easily
                     )

trainer.train()
trainer.save_model(lora_output)

# Reload the adapter config, the base model, and the tokenizer
from peft import PeftConfig, PeftModel
config = PeftConfig.from_pretrained(lora_output)
model = transformers.AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path)
tokenizer = transformers.AutoTokenizer.from_pretrained(config.base_model_name_or_path)

# Load the LoRA adapter on top of the base model
model = PeftModel.from_pretrained(model, lora_output)

# Merge the adapter weights into the base model and save the full model
merged_model = model.merge_and_unload()
merged_model.save_pretrained(full_output)
tokenizer.save_pretrained(full_output)

```
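
As a rough check on how small the trained adapter is, the number of LoRA parameters implied by `r=16` and the seven target modules can be computed by hand. The sketch below assumes the layer shapes of the published Mistral-7B configuration (hidden size 4096, MLP size 14336, 8 key/value heads of dimension 128, 32 decoder layers); these shapes and the helper name `lora_trainable_params` are assumptions for illustration and do not come from the training script.

```python
# (in_features, out_features) per target module.
# ASSUMPTION: shapes taken from the published Mistral-7B configuration.
MISTRAL_7B_LINEAR_SHAPES = {
    "q_proj": (4096, 4096),
    "k_proj": (4096, 1024),
    "v_proj": (4096, 1024),
    "o_proj": (4096, 4096),
    "gate_proj": (4096, 14336),
    "up_proj": (4096, 14336),
    "down_proj": (14336, 4096),
}
NUM_LAYERS = 32  # ASSUMPTION: decoder layers in Mistral-7B


def lora_trainable_params(rank, shapes=MISTRAL_7B_LINEAR_SHAPES, num_layers=NUM_LAYERS):
    """Count LoRA weights: each adapted layer adds A (rank x in) and B (out x rank)."""
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in shapes.values())
    return per_layer * num_layers


print(lora_trainable_params(16))  # 41943040
```

Under these assumptions the adapter holds about 42 million trainable weights, well under 1% of the roughly 7 billion parameters in the base model.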

#### Preprocessing [optional]

[More Information Needed]


#### Training Hyperparameters

The following hyperparameters were used during training:

- learning_rate: 0.0002
- train_batch_size: 24
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 96
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 2
- mixed_precision_training: Native AMP
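
The relationship between the batch-size entries, and the shape of a linear learning-rate schedule, can be sketched in a few lines. This is a minimal illustration, assuming a single GPU and no warmup; the function names and the total step count are hypothetical, since the card does not report the number of training steps.

```python
def effective_batch_size(per_device_batch: int, grad_accum_steps: int, num_devices: int = 1) -> int:
    """Examples consumed per optimizer step."""
    return per_device_batch * grad_accum_steps * num_devices


def linear_lr(step: int, total_steps: int, base_lr: float, warmup_steps: int = 0) -> float:
    """Linear warmup to base_lr, then linear decay to zero at total_steps."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# train_batch_size 24 with 4 accumulation steps gives the listed total of 96
print(effective_batch_size(24, 4))  # 96

TOTAL_STEPS = 100  # hypothetical value, not reported in this card
for step in (0, 50, 100):
    print(step, linear_lr(step, TOTAL_STEPS, base_lr=2e-4))
```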

#### Speeds, Sizes, Times [optional]

<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->

[More Information Needed]

## Evaluation

<!-- This section describes the evaluation protocols and provides the results. -->

### Testing Data, Factors & Metrics

#### Testing Data

The test set consists of 194 questions written by students.

#### Factors

<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->

[More Information Needed]

#### Metrics

<!-- These are the evaluation metrics being used, ideally with a description of why. -->

[More Information Needed]

### Results

[More Information Needed]


## Environmental Impact

<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Hours used:** 2 hours


#### Hardware
RTX 4090
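
For a back-of-the-envelope figure, emissions are roughly power draw times hours times the carbon intensity of the local grid. The sketch below assumes the GPU runs at its rated 450 W board power for the full 2 hours (an upper bound that ignores the rest of the system) and uses a placeholder grid intensity; the intensity value and the function names are hypothetical and should be replaced with figures for the actual region.

```python
def energy_kwh(power_watts: float, hours: float) -> float:
    """Energy used, in kilowatt-hours."""
    return power_watts / 1000.0 * hours


def emissions_g_co2eq(power_watts: float, hours: float, kg_co2eq_per_kwh: float) -> float:
    """Estimated emissions in grams of CO2eq."""
    return energy_kwh(power_watts, hours) * kg_co2eq_per_kwh * 1000.0


GPU_POWER_W = 450.0     # ASSUMPTION: RTX 4090 at its rated board power the whole time
HOURS_USED = 2.0        # from this card
GRID_INTENSITY = 0.5    # HYPOTHETICAL placeholder, kg CO2eq per kWh; use your region's value

print(f"{energy_kwh(GPU_POWER_W, HOURS_USED):.2f} kWh")
print(f"{emissions_g_co2eq(GPU_POWER_W, HOURS_USED, GRID_INTENSITY):.0f} g CO2eq")
```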