---
license: mit
base_model: laion/CLIP-ViT-B-32-laion2B-s34B-b79K
tags:
- generated_from_trainer
model-index:
- name: laion-finetuned_v5e7_epoch10_fold0_threshold3
  results: []
---

# laion-finetuned Room Luxury Annotator

This model is a fine-tuned version of [laion/CLIP-ViT-B-32-laion2B-s34B-b79K](https://huggingface.co/laion/CLIP-ViT-B-32-laion2B-s34B-b79K) on a private dataset provided by Wahi Inc. It is designed to classify room images into categories based on their luxury level and room type.

## Model Description

This model leverages a fine-tuned version of CLIP, specifically optimized for real estate image annotation. It performs zero-shot classification of room images into categories like standard or contemporary kitchens, bathrooms, and other common rooms in real estate properties. The model uses a multi-stage approach where diffusion models generate supplementary training data, and hierarchical CLIP networks perform luxury annotation. This fine-tuning process enables high accuracy in distinguishing luxury levels from real estate images.
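The hierarchical annotation idea can be illustrated with a small, self-contained sketch. This is plain Python with hypothetical scores, not the paper's actual implementation: first aggregate per-label scores into a room-type decision, then choose the luxury level within the winning room type.

```python
from collections import defaultdict

# Hypothetical per-label scores; in practice these would come from the
# CLIP zero-shot classifier over the candidate labels.
scores = {
    ("kitchen", "standard"): 0.10,
    ("kitchen", "contemporary"): 0.55,
    ("bathroom", "standard"): 0.20,
    ("bathroom", "contemporary"): 0.15,
}

def hierarchical_annotate(scores):
    # Stage 1: pick the room type with the highest total score.
    by_room = defaultdict(float)
    for (room, _luxury), s in scores.items():
        by_room[room] += s
    room = max(by_room, key=by_room.get)
    # Stage 2: within that room type, pick the luxury level.
    luxury = max(
        (lux for (r, lux) in scores if r == room),
        key=lambda lux: scores[(room, lux)],
    )
    return room, luxury

print(hierarchical_annotate(scores))  # ('kitchen', 'contemporary')
```

The two-stage decision is what makes the annotation "hierarchical": room type is resolved before luxury level, so the luxury comparison only happens among labels of the chosen room type.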

The model was developed for the paper *"Diffusion-based Data Augmentation and Hierarchical CLIP for Real Estate Image Annotation"* submitted to the *Pattern Analysis and Applications Special Issue on Multimedia Sensing and Computing*.

![Model Framework](framework.png)

## Intended Uses & Limitations

This model is intended to be used for:
- Annotating real estate images by classifying room types and luxury levels (e.g., standard or contemporary kitchens, bathrooms, etc.).
- Helping users filter properties in real estate platforms based on the luxury level of rooms.

**Limitations**:
- The model is optimized for real estate images and may not generalize well to other domains.
- Zero-shot classification is limited to the predefined categories and candidate labels used during fine-tuning.

## Training and Evaluation Data

The training data was collected and labeled by Wahi Inc. and includes a diverse set of real estate images from kitchens, bathrooms, dining rooms, living rooms, and foyers. The images were annotated as either standard or contemporary, based on the room's aesthetics, design, and quality.

## Training Procedure

### Training Hyperparameters

The following hyperparameters were used during training:
- **Learning Rate**: 1e-06
- **Train Batch Size**: 384
- **Eval Batch Size**: 24
- **Seed**: 42
- **Optimizer**: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- **LR Scheduler Type**: Linear
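With the linear scheduler and base learning rate above, the per-step learning rate decays linearly to zero over training. A minimal sketch of that schedule (assuming no warmup, which the card does not specify):

```python
def linear_lr(step, total_steps, base_lr=1e-6):
    """Linearly decay the learning rate from base_lr to 0 (no warmup assumed)."""
    return base_lr * max(0.0, 1.0 - step / total_steps)

# Hypothetical 1000-step run:
print(linear_lr(0, 1000))     # 1e-06 at the start
print(linear_lr(500, 1000))   # 5e-07 halfway through
print(linear_lr(1000, 1000))  # 0.0 at the end
```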

### Framework Versions

- **Transformers**: 4.37.2
- **PyTorch**: 2.0.1+cu117
- **Datasets**: 2.14.4
- **Tokenizers**: 0.15.0

### Output Example

Below is an example of the model's output, where an image of a kitchen is classified with its top 3 predicted room types and confidence scores.

![Model Output Example](example_output.png)

## How to Use the Model

You can use this model for zero-shot image classification with the HuggingFace `pipeline` API. Here is a basic example:

```python
from transformers import pipeline
from PIL import Image

# Initialize the pipeline
classifier = pipeline("zero-shot-image-classification", model="strollingorange/roomLuxuryAnnotater")

# Define the candidate labels
candidate_labels = [
    "a photo of standard bathroom",
    "a photo of contemporary bathroom",
    "a photo of standard kitchen",
    "a photo of contemporary kitchen",
    "a photo of standard foyer",
    "a photo of standard living room",
    "a photo of standard dining room",
    "a photo of contemporary foyer",
    "a photo of contemporary living room",
    "a photo of contemporary dining room"
]

# Load your image (replace 'path_to_your_image.jpg' with your actual image path)
image = Image.open('path_to_your_image.jpg')

# Run zero-shot classification
result = classifier(image, candidate_labels=candidate_labels)

# Output the result
print(result)
```
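The pipeline returns a list of `{"score", "label"}` dicts sorted by confidence, so reporting the top-3 predictions is just a slice of that list. A small sketch with a hypothetical result (not actual model output):

```python
# Hypothetical pipeline output; zero-shot image classification pipelines
# return entries already sorted by descending score.
result = [
    {"score": 0.72, "label": "a photo of contemporary kitchen"},
    {"score": 0.15, "label": "a photo of standard kitchen"},
    {"score": 0.05, "label": "a photo of contemporary dining room"},
    {"score": 0.03, "label": "a photo of standard dining room"},
]

# Keep only the three most confident predictions.
top3 = result[:3]
for entry in top3:
    print(f"{entry['label']}: {entry['score']:.2f}")
```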
## Acknowledgments
We would like to acknowledge Wahi Inc. for providing the training data and their continued support in the development of this model. Their collaboration was essential in fine-tuning the model for real estate image annotation.