---
library_name: peft
license: apache-2.0
base_model: HuggingFaceTB/SmolVLM-Base
tags:
- generated_from_trainer
model-index:
- name: SmolVLM-Base-vqav2
  results: []
---

# SmolVLM-Base-vqav2

This model is a fine-tuned version of [HuggingFaceTB/SmolVLM-Base](https://huggingface.co/HuggingFaceTB/SmolVLM-Base) on an unknown dataset.

## Model description

Here is sample code showing how to load the base model, attach the fine-tuned adapter, and run inference on two images:

```python
from transformers import AutoProcessor, AutoModelForImageTextToText
from peft import PeftModel
import torch
from PIL import Image

# Using "cuda:0" instead of "cuda" avoids a flash-attention device warning.
DEVICE = "cuda:0" if torch.cuda.is_available() else "cpu"

model_id = "HuggingFaceTB/SmolVLM-Base"

# Load the base model
base_model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    _attn_implementation="flash_attention_2" if DEVICE.startswith("cuda") else "eager",
).to(DEVICE)
print(f"Model is on device: {base_model.device}")

# Attach the QLoRA adapter
adapter_path = r"C:\Users\.....\SmolVLM-Base-vqav2\checkpoint-670"
model = PeftModel.from_pretrained(base_model, adapter_path)
model = model.to(DEVICE)

# Load the processor
processor = AutoProcessor.from_pretrained(model_id)

# Function to load an image from a local file
def load_image_from_file(file_path):
    try:
        image = Image.open(file_path)
        return image
    except Exception as e:
        print(f"Error loading image: {e}")
        return None

image1_path = "C:/Users/.../IMG_4.jpg"
image2_path = "C:/Users/.../IMG_35.jpg"

# Load the images
image1 = load_image_from_file(image1_path)
image2 = load_image_from_file(image2_path)

if image1 and image2:
    # Build a chat message with two image slots and a text prompt
    messages = [
        {
            "role": "user",
            "content": [
                {"type": "image"},
                {"type": "image"},
                {"type": "text", "text": "Can you describe and compare the two images?"},
            ],
        },
    ]

    # Prepare the prompt and model inputs
    prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
    inputs = processor(text=prompt, images=[image1, image2], return_tensors="pt")
    inputs = inputs.to(DEVICE)

    # Run generation
    generated_ids = model.generate(**inputs, max_new_tokens=500)
    generated_texts = processor.batch_decode(generated_ids, skip_special_tokens=True)

    # Print the output
    print(generated_texts[0])
else:
    print("Images could not be loaded")
```

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (see the code sketch after the framework versions below):
- learning_rate: 0.0001
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 16
- optimizer: adamw_hf with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 50
- num_epochs: 1

### Training results

### Framework versions

- PEFT 0.14.0
- Transformers 4.46.3
- Pytorch 2.5.1+cu121
- Datasets 3.1.0
- Tokenizers 0.20.3
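
The training script itself is not included in this repository. As a reference, here is a minimal, hypothetical sketch of how the hyperparameters listed above map onto `transformers.TrainingArguments`. The `LoraConfig` values (`r`, `lora_alpha`, `lora_dropout`, `target_modules`) and the `bf16` flag are illustrative assumptions, not the configuration actually used to produce this adapter.

```python
# Hypothetical sketch: maps the hyperparameters listed above onto
# transformers.TrainingArguments. Not the original training script.
from transformers import TrainingArguments
from peft import LoraConfig

training_args = TrainingArguments(
    output_dir="SmolVLM-Base-vqav2",
    learning_rate=1e-4,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,   # effective train batch size: 4 * 4 = 16
    seed=42,
    optim="adamw_hf",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_steps=50,
    num_train_epochs=1,
    bf16=True,  # assumption: matches the bfloat16 dtype used at inference above
)

# Illustrative adapter config -- r, lora_alpha, lora_dropout, and
# target_modules are guesses; the actual adapter may differ.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.1,
    target_modules=["q_proj", "v_proj"],
)
```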