---
library_name: peft
license: apache-2.0
base_model: HuggingFaceTB/SmolVLM-Base
tags:
- generated_from_trainer
model-index:
- name: SmolVLM-Base-vqav2
  results: []
---

# SmolVLM-Base-vqav2

This model is a fine-tuned version of [HuggingFaceTB/SmolVLM-Base](https://huggingface.co/HuggingFaceTB/SmolVLM-Base) on an unknown dataset.

## Model description

Here is sample code showing how to load the adapter and run inference:
```python
from transformers import AutoProcessor, AutoModelForImageTextToText
from peft import PeftModel
import torch
from PIL import Image

# Use "cuda:0" rather than "cuda" to avoid a flash-attention device warning
DEVICE = "cuda:0" if torch.cuda.is_available() else "cpu"

# Base model the adapter was trained on (see this card's metadata)
model_id = "HuggingFaceTB/SmolVLM-Base"

base_model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    _attn_implementation="flash_attention_2" if DEVICE.startswith("cuda") else "eager",
).to(DEVICE)

print(f"Model is on device: {base_model.device}")

# Load the QLoRA adapter on top of the base model
adapter_path = r"C:\Users\.....\SmolVLM-Base-vqav2\checkpoint-670"
model = PeftModel.from_pretrained(base_model, adapter_path)
model = model.to(DEVICE)

# Load the processor
processor = AutoProcessor.from_pretrained(model_id)

# Load an image from a local file
def load_image_from_file(file_path):
    try:
        return Image.open(file_path)
    except Exception as e:
        print(f"Error loading image: {e}")
        return None

image1_path = "C:/Users/.../IMG_4.jpg"
image2_path = "C:/Users/.../IMG_35.jpg"

# Load the images
image1 = load_image_from_file(image1_path)
image2 = load_image_from_file(image2_path)

# Check that both images loaded
if image1 and image2:
    # Chat-style message with two image slots and a text question
    messages = [
        {
            "role": "user",
            "content": [
                {"type": "image"},
                {"type": "image"},
                {"type": "text", "text": "Can you describe and compare the two images?"},
            ],
        },
    ]

    # Prepare the prompt and model inputs
    prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
    inputs = processor(text=prompt, images=[image1, image2], return_tensors="pt")
    inputs = inputs.to(DEVICE)

    # Run the model
    generated_ids = model.generate(**inputs, max_new_tokens=500)
    generated_texts = processor.batch_decode(generated_ids, skip_special_tokens=True)

    # Print the result
    print(generated_texts[0])
else:
    print("Images could not be loaded")
```
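
If you need a standalone checkpoint that does not depend on `peft` at inference time, the adapter can also be merged into the base weights. A minimal sketch, assuming the adapter has been loaded as above (the output path is a placeholder):

```python
# Merge the LoRA weights into the base model and drop the PEFT wrapper
merged_model = model.merge_and_unload()

# Save the merged model and processor for standalone use (placeholder path)
merged_model.save_pretrained("SmolVLM-Base-vqav2-merged")
processor.save_pretrained("SmolVLM-Base-vqav2-merged")
```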

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a `TrainingArguments` sketch follows the list):
- learning_rate: 0.0001
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 16
- optimizer: adamw_hf with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 50
- num_epochs: 1
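
As a rough reconstruction, these values correspond to a `TrainingArguments` setup like the sketch below; `output_dir` is a placeholder and any setting not listed above is left at its default:

```python
from transformers import TrainingArguments

# Hypothetical reconstruction of the run configuration from the values above;
# output_dir is illustrative, not the original path.
training_args = TrainingArguments(
    output_dir="SmolVLM-Base-vqav2",
    learning_rate=1e-4,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=4,  # 4 x 4 = total train batch size of 16
    optim="adamw_hf",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_steps=50,
    num_train_epochs=1,
)
```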

### Training results

### Framework versions

- PEFT 0.14.0
- Transformers 4.46.3
- Pytorch 2.5.1+cu121
- Datasets 3.1.0
- Tokenizers 0.20.3
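
To verify that a local environment matches these versions, a quick sanity check:

```python
import datasets, peft, tokenizers, torch, transformers

# Print installed versions to compare against the list above
for name, module in [
    ("PEFT", peft),
    ("Transformers", transformers),
    ("Pytorch", torch),
    ("Datasets", datasets),
    ("Tokenizers", tokenizers),
]:
    print(f"{name}: {module.__version__}")
```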