frankmorales2020/lora_fine_tuned_phi-4_quantized_vision

frankmorales2020 committed 29 days ago

Commit 9605200 (verified) · 1 parent: dbf0100

Create README.md

README.md +99 -0

	@@ -0,0 +1,99 @@

+````markdown
+# lora_fine_tuned_phi-4_quantized_vision
+This repository contains a fine-tuned version of the **Phi-4** language model, adapted for **image-to-text generation**.
+The model was fine-tuned with **LoRA (Low-Rank Adaptation)** on the **FGVC Aircraft** dataset, which consists of aircraft images with corresponding textual descriptions. The fine-tuning is intended to help the model generate more accurate and descriptive captions for aircraft images.
+**Key Features:**
+* **4-bit Quantization:** The base model is loaded with 4-bit NF4 quantization to reduce its size and memory footprint, making it cheaper to deploy and run.
+* **LoRA:** Fine-tuning uses LoRA, which trains small low-rank update matrices while the base weights stay frozen, keeping the number of trainable parameters low.
+* **Image Captioning:** The model is trained to generate textual descriptions (captions) for images of aircraft.
+**Intended Use Cases:**
+* **Image Captioning:** Generate descriptive captions for aircraft images.
+* **Aircraft Recognition:** Assist in identifying different types of aircraft based on their visual features.
+* **Educational Purposes:** Serve as a tool for learning about different aircraft models.
+**How to Use:**
+You can load the LoRA adapter on top of the 4-bit quantized base model with Hugging Face Transformers, PEFT, and bitsandbytes:
+```python
+import torch  # needed for torch.bfloat16 in the quantization config below
+from transformers import pipeline, AutoTokenizer, BitsAndBytesConfig, AutoModelForCausalLM
+from peft import PeftModel
+# Load the tokenizer
+tokenizer = AutoTokenizer.from_pretrained("frankmorales2020/lora_fine_tuned_phi-4_quantized_vision")
+# Load the base model with 4-bit quantization
+bnb_config = BitsAndBytesConfig(
+    load_in_4bit=True,
+    bnb_4bit_use_double_quant=True,
+    bnb_4bit_quant_type="nf4",
+    bnb_4bit_compute_dtype=torch.bfloat16
+)
+base_model = AutoModelForCausalLM.from_pretrained(
+    "microsoft/phi-4",
+    quantization_config=bnb_config,
+    low_cpu_mem_usage=True
+)
+# Load the fine-tuned LoRA adapter from the Hugging Face Hub on top of the base model
+model = PeftModel.from_pretrained(
+    base_model,  # Pass the base model instance
+    "frankmorales2020/lora_fine_tuned_phi-4_quantized_vision",  # Load from HF Hub
+    device_map={"": 0},
+)
+# Fall back to the EOS token if the tokenizer does not define a pad token
+if tokenizer.pad_token is None:
+    tokenizer.pad_token = tokenizer.eos_token
+# Tell generation which token id to use for padding
+model.generation_config.pad_token_id = tokenizer.pad_token_id
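+# Create a text generation pipeline
+generator = pipeline(task="text-generation", model=model, tokenizer=tokenizer)
+# Generate captions for an image (replace with your image processing logic)
+image_path = "path/to/your/aircraft/image.jpg"
+# ... (Add your image loading and preprocessing code here) ...
+# Placeholder (assumption): `processed_image` must be a text representation
+# of the image, because the text-generation pipeline only accepts text input
+processed_image = "<text representation of the image>"
+prompt = f"Generate a caption for the following image: {processed_image}"
+# max_new_tokens bounds only the generated text; max_length would also count the prompt tokens
+generated_caption = generator(prompt, max_new_tokens=64)[0]['generated_text']
+print(generated_caption)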
+````
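The LoRA entry under Key Features says the number of trainable parameters stays low. As a rough illustration of why: for a weight matrix of shape `d_out x d_in`, a rank-`r` LoRA adapter adds two small matrices with `r * (d_out + d_in)` parameters in total, while the original weights stay frozen. The layer size and rank below are assumed values chosen for illustration, not settings taken from this repository.

```python
def lora_params(d_out: int, d_in: int, rank: int) -> int:
    """Parameters added by one LoRA adapter: B is (d_out x rank), A is (rank x d_in)."""
    return rank * (d_out + d_in)


def full_params(d_out: int, d_in: int) -> int:
    """Parameters in the frozen weight matrix that the adapter sits on."""
    return d_out * d_in


# Assumed, illustrative values (not read from this repository's adapter config)
hidden_size = 5120
rank = 16

added = lora_params(hidden_size, hidden_size, rank)
frozen = full_params(hidden_size, hidden_size)
print(f"LoRA adds {added:,} trainable parameters next to {frozen:,} frozen ones")
print(f"Trainable fraction for this layer: {added / frozen:.4%}")
```

The same arithmetic applies to every layer the adapter targets, so the total trainable count scales with the rank and the number of targeted layers rather than with the full model size.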
+**Training Data:**
+The model was trained on the [FGVC Aircraft dataset](https://www.robots.ox.ac.uk/~vgg/data/fgvc-aircraft/).
+**Evaluation:**
+The model was evaluated using the BLEU metric on a held-out test set from the FGVC Aircraft dataset.
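For readers unfamiliar with BLEU, the following is a minimal sketch of how a sentence-level score could be computed for one generated caption against one reference. It is a simplified, unsmoothed, single-reference variant written for illustration; it is not the evaluation script used for this model, and the function names and example captions are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(reference, candidate, max_n=4):
    """Unsmoothed single-reference BLEU on whitespace-tokenized strings."""
    ref, cand = reference.split(), candidate.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        # Clip each candidate n-gram count by its count in the reference
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0  # without smoothing, any zero precision gives BLEU 0
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty: penalize candidates that are not longer than the reference
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    # Geometric mean of the n-gram precisions, scaled by the brevity penalty
    return bp * math.exp(sum(log_precisions) / max_n)


# Hypothetical captions, for illustration only
reference = "a Boeing 747 parked on the runway"
candidate = "a Boeing 747 on the runway"
print(sentence_bleu(reference, candidate, max_n=2))
```

Captions are often short, so a lower `max_n` or a smoothed variant from an established library may be more appropriate in practice than the unsmoothed 4-gram default.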
+**Limitations:**
+* The model is specifically fine-tuned for aircraft images and may not generalize well to other types of images.
+* The generated captions may sometimes be overly generic or lack fine-grained details.
+**Future Work:**
+* Fine-tune the model on a larger and more diverse dataset of images.
+* Explore more advanced image encoding techniques to improve the model's understanding of visual features.
+* Experiment with different decoding strategies to generate more detailed and human-like captions.
+**Acknowledgements:**
+This work is based on the Phi-4 language model developed by Microsoft and utilizes the Hugging Face Transformers and Datasets libraries.
+**Remember to:**
+* Replace `"path/to/your/aircraft/image.jpg"` with the actual path to your image.
+* Add your image loading and preprocessing code in the designated section.
+* Consider adding a license (e.g., MIT License) to your repository.