---
base_model: unsloth/phi-4-unsloth-bnb-4bit
tags:
- text-generation-inference
- transformers
- unsloth
- llama
- trl
license: apache-2.0
language:
- en
---

# Uploaded model

- **Developed by:** Haq Nawaz Malik
- **License:** apache-2.0
- **Finetuned from model:** unsloth/phi-4-unsloth-bnb-4bit

# Fine-tuned Phi-4 Model Documentation

## πŸ“Œ Introduction

This documentation provides an in-depth overview of the **fine-tuned Phi-4 conversational AI model**, detailing its **training methodology, parameters, dataset, deployment, and usage instructions**.

## πŸ”Ή Model Overview

**Phi-4** is a transformer-based language model optimized for **natural language understanding and text generation**. We fine-tuned it with **LoRA (Low-Rank Adaptation)** using the **Unsloth framework**, producing a lightweight, efficient adapter while preserving the base model's capabilities.

## πŸ”Ή Training Details

### **πŸ›  Fine-tuning Methodology**

We employed **LoRA (Low-Rank Adaptation)** for fine-tuning, which significantly reduces the number of trainable parameters while retaining the model's expressive power.

### **πŸ“‘ Dataset Used**

- **Dataset Name**: `mlabonne/FineTome-100k`
- **Dataset Size**: 100,000 examples
- **Data Format**: Conversational AI dataset with structured prompts and responses.
- **Preprocessing**: The dataset was standardized using `unsloth.chat_templates.standardize_sharegpt()`

### **πŸ”’ Training Parameters**

| Parameter | Value |
|------------------------|-------|
| LoRA Rank (`r`) | 16 |
| LoRA Alpha | 16 |
| LoRA Dropout | 0 |
| Target Modules | `q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj` |
| Max Sequence Length | 2048 |
| Load in 4-bit | True |
| Gradient Checkpointing | `unsloth` |
| Fine-tuning Duration | **10 epochs** |
| Optimizer Used | AdamW |
| Learning Rate | 2e-5 |

A sketch of how these values map onto an `SFTTrainer` run is included at the end of this card.

## πŸ”Ή How to Load the Model

To load the fine-tuned model, use the **Unsloth framework**:

```python
from unsloth import FastLanguageModel
from unsloth.chat_templates import get_chat_template
from peft import PeftModel

model_name = "Omarrran/lora_model"
max_seq_length = 2048
load_in_4bit = True

# Load model and tokenizer
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_name,
    max_seq_length=max_seq_length,
    load_in_4bit=load_in_4bit
)

# Apply LoRA adapter
# (If the repository above already bundles the LoRA weights, this step may be redundant.)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth"
)
```

> **Note:** Running this model requires a GPU.

## πŸ”Ή Deploying the Model

### **πŸš€ Using Google Colab**

1. Install dependencies:
   ```bash
   pip install gradio transformers torch unsloth peft
   ```
2. Load the model using the script above.
3. Run inference using the chatbot interface.

### **πŸš€ Deploy on Hugging Face Spaces**

1. Save the script as `app.py`.
2. Create a `requirements.txt` file with:
   ```
   gradio
   transformers
   torch
   unsloth
   peft
   ```
3. Upload both files to a new **Hugging Face Space** created with the **Gradio** SDK.
4. The Space will build and deploy automatically.
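
## πŸ”Ή Quick Inference Example

If you just want to test the model from a script (without the Gradio UI below), the following minimal sketch runs a single prompt through the phi-4 chat template. It assumes the `Omarrran/lora_model` repository already bundles the LoRA weights, so `get_peft_model` is not re-applied here; the example prompt and generation settings are illustrative only.

```python
from unsloth import FastLanguageModel
from unsloth.chat_templates import get_chat_template

# Load the fine-tuned model as in the loading section above.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Omarrran/lora_model",
    max_seq_length=2048,
    load_in_4bit=True,
)
tokenizer = get_chat_template(tokenizer, chat_template="phi-4")
FastLanguageModel.for_inference(model)  # switch Unsloth into its faster generation mode

# Build a single-turn conversation and format it with the phi-4 chat template.
messages = [{"role": "user", "content": "Explain LoRA fine-tuning in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(input_ids=input_ids, max_new_tokens=128)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```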
## πŸ”Ή Using the Model

### **πŸ—¨ Chatbot Interface (Gradio UI)**

To interact with the fine-tuned model using **Gradio**, use:

```python
import gradio as gr
import torch
from unsloth import FastLanguageModel
from unsloth.chat_templates import get_chat_template
from peft import PeftModel

# Load the base model with Unsloth
model_name = "Omarrran/lora_model"  # Change this if needed
max_seq_length = 2048
load_in_4bit = True  # Use 4-bit quantization to save memory

# Load model and tokenizer
base_model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_name,
    max_seq_length=max_seq_length,
    load_in_4bit=load_in_4bit
)

# Apply LoRA adapter
model = FastLanguageModel.get_peft_model(
    base_model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth"
)

# Apply chat formatting template
tokenizer = get_chat_template(tokenizer, chat_template="phi-4")

# Chat function
def chat_with_model(user_input):
    try:
        # Format the user message with the phi-4 chat template before generating
        messages = [{"role": "user", "content": user_input}]
        input_ids = tokenizer.apply_chat_template(
            messages,
            add_generation_prompt=True,
            return_tensors="pt"
        ).to(model.device)
        output = model.generate(input_ids=input_ids, max_new_tokens=200)
        # Decode only the newly generated tokens, not the prompt
        response = tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True)
        return response
    except Exception as e:
        return f"Error: {str(e)}"

# Define Gradio interface
description = """
### 🧠 Phi-4 Conversational AI Chatbot

This chatbot is powered by **Unsloth's Phi-4 model**, optimized with **LoRA fine-tuning**.

#### πŸ”Ή Features:
βœ… **Lightweight LoRA adapter for efficiency**
βœ… **Supports long-context conversations (2048 tokens)**
βœ… **Optimized with 4-bit quantization for fast inference**

#### πŸ”Ή Example Questions:
- "What is the capital of France?"
- "Tell me a joke!"
- "Explain black holes in simple terms."
"""

examples = [
    "Hello, how are you?",
    "What is the capital of France?",
    "Tell me a joke!",
    "What is quantum physics?",
    "Translate 'Hello' to French."
]

# Launch Gradio UI
demo = gr.Interface(
    fn=chat_with_model,
    inputs=gr.Textbox(label="Your Message", placeholder="Type something here..."),
    outputs=gr.Textbox(label="Chatbot's Response"),
    title="πŸ”Ή HNM_Phi_4_finetuned",
    description=description,
    examples=examples,
    allow_flagging="never"
)

if __name__ == "__main__":
    demo.launch()
```

## πŸ“Œ Conclusion

This **fine-tuned Phi-4 model** delivers **optimized conversational AI capabilities** through **LoRA fine-tuning and Unsloth's 4-bit quantization**. The model is **lightweight and memory-efficient**, making it suitable for chatbot applications in both **research and production environments**.
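
## πŸ”Ή Appendix: Training Setup (Sketch)

For reference, the following sketch shows how the values in the training-parameters table map onto an Unsloth + TRL `SFTTrainer` run. It is not the exact training script used for this model: the batch size, gradient-accumulation steps, and output directory are illustrative assumptions, and argument names may differ slightly across `trl` versions.

```python
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer
from unsloth import FastLanguageModel
from unsloth.chat_templates import get_chat_template, standardize_sharegpt

max_seq_length = 2048

# Base model in 4-bit, with the LoRA configuration from the table above.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/phi-4-unsloth-bnb-4bit",
    max_seq_length=max_seq_length,
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    bias="none",
    use_gradient_checkpointing="unsloth",
)
tokenizer = get_chat_template(tokenizer, chat_template="phi-4")

# Standardize the ShareGPT-style dataset, then render each conversation to a single text field.
dataset = load_dataset("mlabonne/FineTome-100k", split="train")
dataset = standardize_sharegpt(dataset)

def formatting_prompts_func(examples):
    texts = [
        tokenizer.apply_chat_template(convo, tokenize=False, add_generation_prompt=False)
        for convo in examples["conversations"]
    ]
    return {"text": texts}

dataset = dataset.map(formatting_prompts_func, batched=True)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    args=TrainingArguments(
        per_device_train_batch_size=2,   # illustrative; not stated in the card
        gradient_accumulation_steps=4,   # illustrative; not stated in the card
        num_train_epochs=10,
        learning_rate=2e-5,
        logging_steps=10,
        output_dir="outputs",
        # The default optimizer is AdamW, matching the table above.
    ),
)
trainer.train()
```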