# lora_fine_tuned_phi-4_quantized_vision

This repository contains a fine-tuned version of the **Phi-4** language model adapted for **image-to-text generation**. The model has been fine-tuned using **LoRA (Low-Rank Adaptation)** on the **FGVC Aircraft** dataset, which consists of aircraft images with corresponding textual descriptions. This fine-tuning enables the model to generate more accurate and descriptive captions for aircraft images.

**Key Features:**

* **4-bit Quantization:** The model uses 4-bit quantization to reduce its size and memory footprint, making it more efficient to deploy and use.
* **LoRA:** Fine-tuning is performed with LoRA, which adapts the model efficiently while keeping the number of trainable parameters low.
* **Image Captioning:** The model is specifically trained to generate textual descriptions (captions) for images of aircraft.

**Intended Use Cases:**

* **Image Captioning:** Generate descriptive captions for aircraft images.
* **Aircraft Recognition:** Assist in identifying different types of aircraft based on their visual features.
* **Educational Purposes:** Use as a tool for learning about different aircraft models.

**How to Use:**

You can use this model directly with the Hugging Face Transformers and PEFT libraries:

```python
import torch
from transformers import pipeline, AutoTokenizer, BitsAndBytesConfig, AutoModelForCausalLM
from peft import PeftModel

# Load the tokenizer and make sure it has a pad token
tokenizer = AutoTokenizer.from_pretrained("frankmorales2020/lora_fine_tuned_phi-4_quantized_vision")
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Load the base model with 4-bit quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
base_model = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-4",
    quantization_config=bnb_config,
    low_cpu_mem_usage=True,
)

# Attach the fine-tuned LoRA adapter from the Hugging Face Hub
model = PeftModel.from_pretrained(
    base_model,  # the base model instance loaded above
    "frankmorales2020/lora_fine_tuned_phi-4_quantized_vision",  # adapter weights on the HF Hub
    device_map={"": 0},
)

# Set the pad token id explicitly so generation does not fall back to a missing value
model.generation_config.pad_token_id = tokenizer.pad_token_id

# Create a text generation pipeline
generator = pipeline(task="text-generation", model=model, tokenizer=tokenizer)

# Generate a caption for an image (replace with your image processing logic)
image_path = "path/to/your/aircraft/image.jpg"
# ... (Add your image loading and preprocessing code here to produce `processed_image`) ...
prompt = f"Generate a caption for the following image: {processed_image}"
generated_caption = generator(prompt, max_new_tokens=64)[0]["generated_text"]
print(generated_caption)
```

**Training Data:**

The model was trained on the [FGVC Aircraft dataset](https://www.robots.ox.ac.uk/~vgg/data/fgvc-aircraft/).

**Evaluation:**

The model was evaluated using the BLEU metric on a held-out test set from the FGVC Aircraft dataset.
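For reference, the snippet below is a minimal sketch of how such a BLEU score can be computed with the Hugging Face `evaluate` library; the example captions are made up for illustration, and this is not the exact evaluation script used for this model.

```python
# Minimal, illustrative BLEU computation with the `evaluate` library (pip install evaluate).
# The caption strings below are hypothetical examples, not outputs of this model.
import evaluate

bleu = evaluate.load("bleu")

predictions = ["a boeing 737-800 in commercial livery taxiing on the runway"]
references = [["a boeing 737-800 airliner taxiing on the runway"]]

results = bleu.compute(predictions=predictions, references=references)
print(f"BLEU: {results['bleu']:.3f}")
```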
**Limitations:**

* The model is fine-tuned specifically for aircraft images and may not generalize well to other types of images.
* The generated captions may sometimes be overly generic or lack fine-grained details.

**Future Work:**

* Fine-tune the model on a larger and more diverse dataset of images.
* Explore more advanced image encoding techniques to improve the model's understanding of visual features.
* Experiment with different decoding strategies to generate more detailed and human-like captions.

**Acknowledgements:**

This work is based on the Phi-4 language model developed by Microsoft and uses the Hugging Face Transformers and Datasets libraries.

**Remember to:**

* Replace `"path/to/your/aircraft/image.jpg"` with the actual path to your image.
* Add your image loading and preprocessing code in the designated section (a minimal sketch is shown below).
* Consider adding a license (e.g., MIT License) to your repository.
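As referenced above, the snippet below is a minimal sketch of the image loading step, assuming Pillow for I/O; how the loaded image is turned into the `processed_image` value used in the prompt depends on the image-encoding setup you pair with this model, which this card leaves open.

```python
# Illustrative only: basic image loading/resizing with Pillow (pip install pillow).
# How the pixels become the `processed_image` value used in the prompt depends on the
# image-encoding front end you choose; that step is not specified by this model card.
from PIL import Image

image_path = "path/to/your/aircraft/image.jpg"  # replace with your actual image path

image = Image.open(image_path).convert("RGB")   # load and normalize the channel layout
image = image.resize((448, 448))                # resize to whatever your encoder expects

# Placeholder: replace with the output of your own image encoder / preprocessing pipeline.
processed_image = "<your image representation here>"
```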