Fine-tuned ViT for House Condition Classification
This model is a fine-tuned version of google/vit-base-patch16-224-in21k for classifying house conditions into 4 categories.
Model Description
This Vision Transformer (ViT) model has been fine-tuned to classify house images into four condition categories:
- good (dobre)
- unknown (nepoznato)
- ruined (oronule)
- medium (srednje)
Training Details
Training Data
- Total dataset: 935 images
- Training set: 776 images
- Validation set: 80 images
- Test set: 79 images
- Classes: 4 (dobre, nepoznato, oronule, srednje)
Training Hyperparameters
- Epochs: 10
- Batch size: 16 per device
- Learning rate: 2e-5
- Optimizer: AdamW
- Seed: 42 (for reproducibility)
- Training time: 5m 45s
- Samples per second: 22.43
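As a rough consistency check, the training schedule implied by these settings can be worked out directly. This is a sketch that assumes a single device, no gradient accumulation, and the Trainer's default of keeping the last partial batch. The card does not state any of these. The 50-step evaluation interval is taken from the Training Procedure section.

```python
import math

# Figures from the card. ASSUMED: single device, no gradient accumulation,
# last partial batch kept (none of these are stated in the card).
TRAIN_IMAGES = 776
BATCH_SIZE = 16
EPOCHS = 10
EVAL_STEPS = 50            # "Evaluated every 50 steps" (Training Procedure)
SAMPLES_PER_SECOND = 22.43

steps_per_epoch = math.ceil(TRAIN_IMAGES / BATCH_SIZE)  # 48 full batches + 1 batch of 8
total_steps = steps_per_epoch * EPOCHS
num_evals = total_steps // EVAL_STEPS                   # evaluations at steps 50, 100, ..., 450
implied_runtime_s = TRAIN_IMAGES * EPOCHS / SAMPLES_PER_SECOND

print(f"steps/epoch={steps_per_epoch}, total steps={total_steps}, evals={num_evals}")
print(f"implied runtime: about {implied_runtime_s:.0f} s")
```

Under these assumptions, an epoch is 49 steps and training runs for 490 steps. The throughput implies about 346 s of training, which closely matches the reported 5m 45s.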
Evaluation Results
Validation Set Performance
- Accuracy: 81.2%
- Loss: 0.5629
Training Set Performance
- Final Training Loss: 0.5295
Per-Class Metrics (Validation)
| Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| good | 0.78 | 0.70 | 0.74 | 10 |
| unknown | 1.00 | 0.83 | 0.91 | 24 |
| ruined | 0.62 | 1.00 | 0.77 | 15 |
| medium | 0.85 | 0.74 | 0.79 | 31 |
Overall Metrics:
- Accuracy: 81.25% (65/80 correct)
- Macro Average: Precision=0.81, Recall=0.82, F1=0.80
- Weighted Average: Precision=0.84, Recall=0.81, F1=0.82
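For reference, these are the standard definitions behind the per-class table and the two averages. They match, for example, scikit-learn's `classification_report`, although the card does not say which tool produced the numbers. Here K = 4 classes, N = 80 validation images, and n_c is the support of class c. The counts are read from the confusion matrix below (rows are true classes, columns are predictions).

```latex
\begin{aligned}
P_c &= \frac{TP_c}{TP_c + FP_c}, \qquad
R_c = \frac{TP_c}{TP_c + FN_c} = \frac{TP_c}{n_c}, \qquad
F1_c = \frac{2 P_c R_c}{P_c + R_c} \\
\operatorname{macro}(M) &= \frac{1}{K} \sum_{c=1}^{K} M_c, \qquad
\operatorname{weighted}(M) = \sum_{c=1}^{K} \frac{n_c}{N} M_c \\
\operatorname{weighted}(R) &= \sum_{c=1}^{K} \frac{n_c}{N} \cdot \frac{TP_c}{n_c}
  = \frac{\sum_c TP_c}{N} = \frac{65}{80} = 0.8125
\end{aligned}
```

The last line explains why the weighted recall (0.81) always equals the accuracy. As a worked example, 'ruined' has 15 true positives out of 24 'ruined' predictions, so P = 15/24 = 0.625.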
Confusion Matrix (Validation)
| True \ Predicted | good | unknown | ruined | medium |
|---|---|---|---|---|
| good | 7 | 0 | 0 | 3 |
| unknown | 1 | 20 | 2 | 1 |
| ruined | 0 | 0 | 15 | 0 |
| medium | 1 | 0 | 7 | 23 |
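A stdlib-only sketch that recomputes the per-class table and the macro and weighted averages from this matrix. The counts are copied from the card, and the helper name `per_class_metrics` is illustrative, not part of any library.

```python
# Validation confusion matrix from the card: rows = true class, columns = predicted.
LABELS = ["good", "unknown", "ruined", "medium"]
CM = [
    [7, 0, 0, 3],
    [1, 20, 2, 1],
    [0, 0, 15, 0],
    [1, 0, 7, 23],
]

def per_class_metrics(cm):
    """Return a (precision, recall, f1, support) tuple for each class."""
    k = len(cm)
    rows = []
    for c in range(k):
        tp = cm[c][c]
        support = sum(cm[c])                          # row sum: images whose true class is c
        predicted = sum(cm[r][c] for r in range(k))   # column sum: images predicted as c
        p = tp / predicted if predicted else 0.0
        r = tp / support if support else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        rows.append((p, r, f1, support))
    return rows

metrics = per_class_metrics(CM)
total = sum(m[3] for m in metrics)
accuracy = sum(CM[i][i] for i in range(len(CM))) / total
# Unweighted mean over classes vs. mean weighted by support.
macro = [sum(m[j] for m in metrics) / len(metrics) for j in range(3)]
weighted = [sum(m[j] * m[3] for m in metrics) / total for j in range(3)]

for label, (p, r, f1, n) in zip(LABELS, metrics):
    print(f"{label:8s} P={p:.2f} R={r:.2f} F1={f1:.2f} n={n}")
print(f"accuracy={accuracy:.4f}")
print("macro    P={:.2f} R={:.2f} F1={:.2f}".format(*macro))
print("weighted P={:.2f} R={:.2f} F1={:.2f}".format(*weighted))
```

Rounded to two decimals, the output matches the per-class table and the macro and weighted rows above. The accuracy comes out as 65/80 = 0.8125.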
Key Insights:
- 'unknown' class has perfect precision (1.00) - no false positives
- 'ruined' class has perfect recall (1.00) - catches all ruined houses
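- Main confusion: 'medium' houses are sometimes predicted as 'ruined' (7 of 31). Together with 2 'unknown' images predicted as 'ruined', this explains the low 'ruined' precision (15/24 = 0.62)
- 'good' houses are occasionally predicted as 'medium' (3 of 10)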
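The misclassifications can also be ranked directly from the confusion matrix. This short sketch uses only the counts reported in the card. Beyond the two largest error types, it also surfaces the smaller ones.

```python
LABELS = ["good", "unknown", "ruined", "medium"]
CM = [  # rows = true class, columns = predicted (validation set, from the card)
    [7, 0, 0, 3],
    [1, 20, 2, 1],
    [0, 0, 15, 0],
    [1, 0, 7, 23],
]

# Collect every non-zero off-diagonal cell as (count, true label, predicted label).
errors = [
    (CM[t][p], LABELS[t], LABELS[p])
    for t in range(len(LABELS))
    for p in range(len(LABELS))
    if t != p and CM[t][p] > 0
]
errors.sort(reverse=True)  # largest count first

for count, true_label, pred_label in errors:
    print(f"{true_label} -> {pred_label}: {count}")
```

The 15 errors account for the gap between 65 correct predictions and 80 validation images. Of these, 9 are predictions of 'ruined', made for houses whose true class is 'medium' or 'unknown'.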
Usage
from transformers import ViTForImageClassification, ViTImageProcessor
from PIL import Image
import torch

# Load model and processor
model = ViTForImageClassification.from_pretrained("YOUR_USERNAME/YOUR_MODEL_NAME")
processor = ViTImageProcessor.from_pretrained("YOUR_USERNAME/YOUR_MODEL_NAME")

# Load and preprocess image (converted to RGB, as during training)
image = Image.open("path_to_image.jpg").convert("RGB")
inputs = processor(image, return_tensors="pt")

# Make prediction
with torch.no_grad():
    outputs = model(**inputs)

predicted_class_idx = outputs.logits.argmax(-1).item()
# id2label has integer keys once the config is loaded, so index with the int directly
predicted_label = model.config.id2label[predicted_class_idx]
print(f"Predicted class: {predicted_label}")

# Get probabilities
probs = torch.nn.functional.softmax(outputs.logits, dim=-1)[0]
for idx, prob in enumerate(probs):
    label = model.config.id2label[idx]
    print(f"{label}: {prob.item():.2%}")
Limitations and Bias
- The model was trained on a specific dataset of house images and may not generalize well to different architectural styles or regions
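- Performance varies by class: validation F1 ranges from 0.74 ('good') to 0.91 ('unknown')
- The model can confuse adjacent condition categories, most often 'medium' vs 'ruined' and 'good' vs 'medium'
- Dataset size: 935 images (relatively small for deep learning). The per-class validation metrics rest on only 10 to 31 images per class, so they are noisy estimates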
- Images are from a specific geographical/architectural context
Training Procedure
The model was fine-tuned using the Hugging Face Transformers library with the following approach:
- Pre-trained weights: Initialized from google/vit-base-patch16-224-in21k
- Classification head: Replaced with a new 4-class classifier
- Fine-tuning: All model parameters were fine-tuned on the custom dataset
- Data preprocessing: Images converted to RGB to ensure consistent 3-channel input
- Evaluation strategy: Evaluated every 50 steps with checkpoint saving
- Best model selection: Best model automatically loaded based on validation performance
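The replaced 4-class head needs a mapping between class ids and label names. Below is a minimal sketch that builds one from the class names in this card. The alphabetical id order (dobre = 0 through srednje = 3) is an assumption, so check the published `config.json` for the actual mapping. The sketch also shows why the keys appear as strings on disk: JSON object keys are always strings. When the config is loaded, transformers' `PretrainedConfig` converts the `id2label` keys back to `int`.

```python
import json

# Raw class names from the card. ASSUMPTION: ids follow alphabetical order;
# the real mapping is whatever the published config.json contains.
LABELS = ["dobre", "nepoznato", "oronule", "srednje"]

id2label = {i: name for i, name in enumerate(LABELS)}
label2id = {name: i for i, name in id2label.items()}

# Maps like these are typically passed when attaching the new head, e.g.
# ViTForImageClassification.from_pretrained(
#     "google/vit-base-patch16-224-in21k",
#     num_labels=len(LABELS), id2label=id2label, label2id=label2id)

# Once serialised to JSON (as in config.json), the integer keys become strings.
on_disk = json.loads(json.dumps({"id2label": id2label}))["id2label"]
print(sorted(on_disk))
```

After loading, lookups therefore take a plain integer, for example `model.config.id2label[0]`.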
Base Model
google/vit-base-patch16-224-in21k
Vision Transformer (ViT) model pre-trained on ImageNet-21k at resolution 224x224.
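The "patch16-224" in the base model's name fixes how each image is tokenised. Here is a quick calculation of the standard ViT-B/16 layout. The constants come from the model name, plus 3 channels for RGB input. In the Transformers implementation, `ViTForImageClassification` applies its classifier to the final hidden state of the [CLS] token.

```python
# ViT-B/16 at 224x224, as encoded in "google/vit-base-patch16-224-in21k".
IMAGE_SIZE = 224
PATCH_SIZE = 16
CHANNELS = 3  # RGB, matching the RGB conversion used during training

patches_per_side = IMAGE_SIZE // PATCH_SIZE        # 14
num_patches = patches_per_side ** 2                # 14 x 14 grid of patches
sequence_length = num_patches + 1                  # plus one [CLS] token
patch_values = PATCH_SIZE * PATCH_SIZE * CHANNELS  # pixel values per flattened patch

print(num_patches, sequence_length, patch_values)
```

Each 224×224 input becomes 196 patches. With the [CLS] token added, the sequence is 197 tokens long, and each flattened patch holds 768 pixel values.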
Framework Versions
- Transformers: 4.57.1
- PyTorch: 2.x
- Datasets: 3.x
- Python: 3.13
Citation
If you use this model, please cite:
@misc{house-condition-vit,
author = {Your Name},
title = {Fine-tuned ViT for House Condition Classification},
year = {2025},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/YOUR_USERNAME/YOUR_MODEL_NAME}}
}
Model Card Authors
This model card was created by the model author.
Additional Information
- Repository: [GitHub Repository URL]
- Contact: [Your Email or Contact]