# lora_fine_tuned_phi-4_quantized_vision

This repository contains a fine-tuned version of the **Phi-4** language model, specifically adapted for **image-to-text generation**.

The model was fine-tuned with **LoRA (Low-Rank Adaptation)** on the **FGVC Aircraft** dataset, which consists of aircraft images paired with textual descriptions. This fine-tuning enables the model to generate more accurate and descriptive captions for aircraft images.

**Key Features:**

* **4-bit Quantization:** The model uses 4-bit quantization to reduce its size and memory footprint, making it more efficient to deploy and use.
* **LoRA:** Fine-tuning is performed with LoRA, which adapts the model efficiently while keeping the number of trainable parameters low.
* **Image Captioning:** The model is trained to generate textual descriptions (captions) for aircraft images.
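The parameter savings from LoRA can be seen with a quick count: a full update to a `d_out × d_in` weight matrix is replaced by two low-rank factors. A minimal sketch, using illustrative dimensions rather than Phi-4's actual layer sizes:

```python
# LoRA replaces a full d_out x d_in weight update with two low-rank
# factors B (d_out x r) and A (r x d_in); the dimensions below are
# illustrative, not Phi-4's actual layer sizes.
d_in, d_out, r = 4096, 4096, 16

full_update_params = d_out * d_in    # dense update: 16,777,216 parameters
lora_params = d_out * r + r * d_in   # LoRA update:     131,072 parameters

print(f"trainable fraction: {lora_params / full_update_params:.4f}")  # 0.0078
```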
**Intended Use Cases:**

* **Image Captioning:** Generate descriptive captions for aircraft images.
* **Aircraft Recognition:** Assist in identifying different types of aircraft from their visual features.
* **Educational Purposes:** Serve as a learning tool for different aircraft models.

**How to Use:**

You can use this model directly with Hugging Face Transformers:

```python
import torch
from transformers import pipeline, AutoTokenizer, BitsAndBytesConfig, AutoModelForCausalLM
from peft import PeftModel

# Load the tokenizer and make sure it has a pad token
tokenizer = AutoTokenizer.from_pretrained("frankmorales2020/lora_fine_tuned_phi-4_quantized_vision")
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Load the base model with 4-bit quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base_model = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-4",
    quantization_config=bnb_config,
    low_cpu_mem_usage=True,
)

# Load the LoRA adapter on top of the base model
model = PeftModel.from_pretrained(
    base_model,                                                 # base model instance
    "frankmorales2020/lora_fine_tuned_phi-4_quantized_vision",  # adapter from the HF Hub
    device_map={"": 0},
)

# Set the pad token id explicitly for generation
model.generation_config.pad_token_id = tokenizer.pad_token_id

# Create a text generation pipeline
generator = pipeline(task="text-generation", model=model, tokenizer=tokenizer)

# Generate a caption for an image (replace with your image processing logic)
image_path = "path/to/your/aircraft/image.jpg"
# ... (Add your image loading and preprocessing code here) ...

prompt = f"Generate a caption for the following image: {processed_image}"
generated_caption = generator(prompt, max_length=64)[0]["generated_text"]
print(generated_caption)
```

**Training Data:**

The model was trained on the FGVC Aircraft dataset ([https://www.robots.ox.ac.uk/~vgg/data/fgvc-aircraft/](https://www.robots.ox.ac.uk/~vgg/data/fgvc-aircraft/)).

**Evaluation:**

The model was evaluated using the BLEU metric on a held-out test set from the FGVC Aircraft dataset.
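The exact evaluation script is not included in this repository; the sketch below shows the idea behind a smoothed sentence-level BLEU score using only the standard library, with hypothetical reference and generated captions.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(reference, hypothesis, max_n=4):
    """Sentence-level BLEU with add-one smoothing on the n-gram precisions."""
    precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = Counter(ngrams(hypothesis, n))
        ref_counts = Counter(ngrams(reference, n))
        overlap = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = max(sum(hyp_counts.values()), 1)
        precisions.append((overlap + 1) / (total + 1))  # smoothed precision
    # Brevity penalty: penalize hypotheses shorter than the reference
    bp = min(1.0, math.exp(1 - len(reference) / max(len(hypothesis), 1)))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

reference = "a boeing 747 on the runway".split()     # hypothetical gold caption
hypothesis = "a boeing 747 taxiing on the runway".split()
score = bleu(reference, hypothesis)
print(f"BLEU: {score:.3f}")  # prints "BLEU: 0.500"
```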

**Limitations:**

* The model is specifically fine-tuned for aircraft images and may not generalize well to other types of images.
* The generated captions may sometimes be overly generic or lack fine-grained details.

**Future Work:**

* Fine-tune the model on a larger and more diverse dataset of images.
* Explore more advanced image encoding techniques to improve the model's understanding of visual features.
* Experiment with different decoding strategies to generate more detailed and human-like captions.

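The decoding-strategy idea above can be illustrated with a toy next-token distribution (the tokens and probabilities below are invented for illustration; a real model produces such a distribution at every generation step):

```python
import random

# Toy next-token distribution; invented for illustration only.
next_token_probs = {"jet": 0.5, "aircraft": 0.3, "runway": 0.15, "sky": 0.05}

# Greedy decoding: always pick the most probable token
# (deterministic, tends toward generic captions)
greedy = max(next_token_probs, key=next_token_probs.get)

# Sampling: draw from the distribution for more varied,
# human-like output
tokens, probs = zip(*next_token_probs.items())
sampled = random.choices(tokens, weights=probs, k=1)[0]

print(greedy)  # "jet"
print(sampled)
```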

**Acknowledgements:**

This work is based on the Phi-4 language model developed by Microsoft and utilizes the Hugging Face Transformers and Datasets libraries.

**Remember to:**

* Replace `"path/to/your/aircraft/image.jpg"` with the actual path to your image.
* Add your image loading and preprocessing code in the designated section.
* Consider adding a license (e.g., MIT License) to your repository.