---
license: apache-2.0
tags:
- generated_from_trainer
datasets:
- imagefolder
metrics:
- accuracy
model-index:
- name: vit-artworkclassifier
  results:
  - task:
      name: Image Classification
      type: image-classification
    dataset:
      name: imagefolder
      type: imagefolder
      config: artbench10-vit
      split: test
      args: artbench10-vit
    metrics:
    - name: Accuracy
      type: accuracy
      value: 0.4887640449438202
---

# vit-artworkclassifier

This model is a fine-tuned version of [google/vit-base-patch16-224-in21k](https://huggingface.co/google/vit-base-patch16-224-in21k) on the imagefolder dataset, a subset of the ArtBench-10 dataset. The training set has 1,800 images and the test set has 180, split equally over the 9 classes.

It achieves the following results on the evaluation set:
- Loss: 1.3363
- Accuracy: 0.4888

These figures correspond to the step-200 row of the training results below, which has the lowest validation loss; later steps reach higher accuracy (up to 0.5562).

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 32
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 8
- mixed_precision_training: Native AMP

### Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:--------:|
| 1.4136        | 1.79  | 100  | 1.5093          | 0.5112   |
| 0.7189        | 3.57  | 200  | 1.3363          | 0.4888   |
| 0.2717        | 5.36  | 300  | 1.4907          | 0.5281   |
| 0.1227        | 7.14  | 400  | 1.4826          | 0.5562   |

### Framework versions

- Transformers 4.26.1
- Pytorch 1.13.1+cu117
- Datasets 2.9.0
- Tokenizers 0.13.2

### Code to Run

    import torch
    from transformers import ViTFeatureExtractor, ViTForImageClassification

    def vit_classify(image):
        """Return the predicted class index for a single image (e.g. a PIL image)."""
        # Load the fine-tuned classifier and switch to inference mode.
        vit = ViTForImageClassification.from_pretrained("oschamp/vit-artworkclassifier")
        vit.eval()
        device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
        vit.to(device)

        # Preprocessing uses the base checkpoint's feature extractor.
        # ViTFeatureExtractor is deprecated in later Transformers releases
        # in favor of ViTImageProcessor.
        model_name_or_path = 'google/vit-base-patch16-224-in21k'
        feature_extractor = ViTFeatureExtractor.from_pretrained(model_name_or_path)

        # Convert the image to a batch of pixel tensors on the model's device.
        encoding = feature_extractor(images=image, return_tensors="pt")
        pixel_values = encoding['pixel_values'].to(device)

        # Run inference without tracking gradients.
        with torch.no_grad():
            outputs = vit(pixel_values)
        logits = outputs.logits
        prediction = logits.argmax(-1)
        # For the class name instead of the index, use
        # vit.config.id2label[prediction.item()].
        return prediction.item()
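
`vit_classify` returns a bare class index, and with an evaluation accuracy of roughly 0.49 the top prediction is wrong about half the time. A ranked list of labels with probabilities may therefore be more useful than a single index. The following is a minimal sketch of that post-processing step, not part of the original code. The helpers `softmax` and `top_k_labels` and the label map `EXAMPLE_ID2LABEL` are hypothetical names introduced for illustration; the real mapping is the one stored in `vit.config.id2label`, and the logits here are made-up numbers.

```python
import math

# Hypothetical label map for illustration only; the real mapping ships with
# the checkpoint as vit.config.id2label and is not listed in this card.
EXAMPLE_ID2LABEL = {0: "style_a", 1: "style_b", 2: "style_c"}


def softmax(logits):
    """Convert raw logits to probabilities in a numerically stable way."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_labels(logits, id2label, k=3):
    """Return the k most likely (label, probability) pairs, best first."""
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]


if __name__ == "__main__":
    # Made-up logits standing in for outputs.logits[0].tolist().
    for label, prob in top_k_labels([0.2, 2.1, -0.5], EXAMPLE_ID2LABEL, k=2):
        print(f"{label}: {prob:.3f}")
```

To use this with the model, one option is to have `vit_classify` return `outputs.logits[0].tolist()` and pass that list together with `vit.config.id2label` to `top_k_labels`.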