|
---
license: apache-2.0
base_model: google/vit-base-patch16-224-in21k
tags:
- generated_from_trainer
datasets:
- FastJobs/Visual_Emotional_Analysis
metrics:
- accuracy
- precision
- f1
model-index:
- name: emotion_classification
  results:
  - task:
      name: Image Classification
      type: image-classification
    dataset:
      name: FastJobs/Visual_Emotional_Analysis
      type: FastJobs/Visual_Emotional_Analysis
      config: default
      split: train
      args: default
    metrics:
    - name: Accuracy
      type: accuracy
      value: 0.63125
    - name: Precision
      type: precision
      value: 0.6430986797647803
    - name: F1
      type: f1
      value: 0.6224944698106615
---
|
|
|
|
|
|
# Emotion Classification |
|
|
|
This model is a fine-tuned version of [google/vit-base-patch16-224-in21k](https://huggingface.co/google/vit-base-patch16-224-in21k) |
|
on the [FastJobs/Visual_Emotional_Analysis](https://huggingface.co/datasets/FastJobs/Visual_Emotional_Analysis) dataset. |
|
|
|
With seven emotion classes in the dataset, a uniform random guess would be expected to score an accuracy of 1/7 ≈ 0.1429.
|
|
|
It achieves the following results on the evaluation set: |
|
- Loss: 1.1031 |
|
- Accuracy: 0.6312 |
|
- Precision: 0.6431 |
|
- F1: 0.6225 |
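
For reference, here is a minimal sketch of a `compute_metrics` function that would report these three metrics from the `Trainer`; the averaging mode for precision and F1 is not stated in this card, so `weighted` is an assumption.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, precision_score

def compute_metrics(eval_pred):
    """Compute the metrics reported above from Trainer predictions."""
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {
        "accuracy": accuracy_score(labels, preds),
        # "weighted" averaging is an assumption; the card does not record
        # how the multi-class precision/F1 were averaged.
        "precision": precision_score(labels, preds, average="weighted", zero_division=0),
        "f1": f1_score(labels, preds, average="weighted", zero_division=0),
    }
```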
|
|
|
## Model description |
|
|
|
This is the base variant of the Vision Transformer (ViT), pre-trained by Google on ImageNet-21k.

Further details can be found in the base model's [repo](https://huggingface.co/google/vit-base-patch16-224-in21k).
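
A minimal usage sketch via the `transformers` pipeline API; the repo id below is a placeholder for wherever this fine-tuned checkpoint is hosted.

```python
from transformers import pipeline

# Placeholder repo id; substitute the actual path of this fine-tuned checkpoint.
classifier = pipeline("image-classification", model="<user>/emotion_classification")

# Returns a list of {"label": ..., "score": ...} dicts, one per emotion class.
print(classifier("face.jpg"))
```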
|
|
|
## Training and evaluation data |
|
|
|
### Data Split |
|
|
|
The dataset was split into training and development sets at a 4:1 ratio with a random seed of 42.

The same seed (42) was also used for shuffling batches during training, although the two uses are independent.
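
A minimal sketch of the split, assuming it was taken from the dataset's single `train` split with the standard `datasets` API:

```python
from datasets import load_dataset

# 4:1 train/dev split with seed 42, as described above; taking it from the
# "train" split is an assumption based on how the source dataset is published.
ds = load_dataset("FastJobs/Visual_Emotional_Analysis", split="train")
splits = ds.train_test_split(test_size=0.2, seed=42)
train_ds, dev_ds = splits["train"], splits["test"]
```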
|
|
|
### Pre-processing & Augmentation
|
|
|
The main pre-processing phase for both training and evaluation includes:

- Bilinear interpolation to resize each image to (224, 224, 3), the ImageNet input resolution the base model was trained on

- Normalization with a per-channel mean and standard deviation of [0.5, 0.5, 0.5], matching the original model
|
|
|
Beyond this shared pre-processing, the training set was augmented with the following transforms; a sketch of the combined pipeline follows the list:
|
- Random horizontal & vertical flip |
|
- Color jitter |
|
- Random resized crop |
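
A minimal sketch of the combined pipeline using `torchvision.transforms`; the jitter strengths and crop parameters are assumptions, since the card does not record their exact values.

```python
from torchvision import transforms

# Shared with the base ViT: 224x224 inputs, per-channel mean/std of 0.5.
normalize = transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])

# Training: augmentation followed by the shared pre-processing.
train_transforms = transforms.Compose([
    transforms.RandomResizedCrop(224, interpolation=transforms.InterpolationMode.BILINEAR),
    transforms.RandomHorizontalFlip(),
    transforms.RandomVerticalFlip(),
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),  # assumed strengths
    transforms.ToTensor(),
    normalize,
])

# Evaluation: resize and normalize only.
eval_transforms = transforms.Compose([
    transforms.Resize((224, 224), interpolation=transforms.InterpolationMode.BILINEAR),
    transforms.ToTensor(),
    normalize,
])
```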
|
|
|
## Training procedure |
|
|
|
### Training hyperparameters |
|
|
|
The following hyperparameters were used during training: |
|
- learning_rate: 0.0002 |
|
- train_batch_size: 64 |
|
- eval_batch_size: 64 |
|
- seed: 42 |
|
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08 |
|
- lr_scheduler_type: cosine_with_restarts |
|
- lr_scheduler_warmup_steps: 20 |
|
- num_epochs: 100 |
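
As a rough sketch, these settings map onto `transformers.TrainingArguments` as follows; `output_dir` is a placeholder, and per-epoch evaluation/logging is an assumption based on the once-per-epoch entries in the results table below.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="emotion_classification",  # placeholder
    learning_rate=2e-4,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    seed=42,
    lr_scheduler_type="cosine_with_restarts",
    warmup_steps=20,
    num_train_epochs=100,
    evaluation_strategy="epoch",  # assumed: one eval per epoch in the table
    logging_strategy="epoch",
)
```

The Adam betas and epsilon listed above are the Trainer's optimizer defaults, so they need no explicit arguments here.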
|
|
|
### Training results |
|
|
|
| Training Loss | Epoch | Step | Validation Loss | Accuracy | Precision | F1     |
|:-------------:|:-----:|:----:|:---------------:|:--------:|:---------:|:------:|
| 2.0742        | 1.0   | 10   | 2.0533          | 0.1938   | 0.1942    | 0.1858 |
| 2.0081        | 2.0   | 20   | 1.8908          | 0.3438   | 0.3701    | 0.3368 |
| 1.7211        | 3.0   | 30   | 1.5199          | 0.5312   | 0.4821    | 0.4844 |
| 1.5641        | 4.0   | 40   | 1.4248          | 0.4875   | 0.5314    | 0.4532 |
| 1.3979        | 5.0   | 50   | 1.2973          | 0.5375   | 0.5162    | 0.5023 |
| 1.2997        | 6.0   | 60   | 1.2016          | 0.525    | 0.4828    | 0.4826 |
| 1.2348        | 7.0   | 70   | 1.1670          | 0.5875   | 0.6375    | 0.5941 |
| 1.1481        | 8.0   | 80   | 1.1292          | 0.6      | 0.6111    | 0.5961 |
| 1.079         | 9.0   | 90   | 1.1782          | 0.5188   | 0.5265    | 0.5005 |
| 0.9909        | 10.0  | 100  | 1.1115          | 0.5813   | 0.5892    | 0.5668 |
| 0.9662        | 11.0  | 110  | 1.1047          | 0.5938   | 0.6336    | 0.5723 |
| 0.8149        | 12.0  | 120  | 1.0944          | 0.5563   | 0.5648    | 0.5499 |
| 0.7661        | 13.0  | 130  | 1.0932          | 0.5625   | 0.5738    | 0.5499 |
| 0.7067        | 14.0  | 140  | 1.0787          | 0.6062   | 0.6318    | 0.6045 |
| 0.6708        | 15.0  | 150  | 1.1140          | 0.6188   | 0.6463    | 0.6134 |
| 0.6268        | 16.0  | 160  | 1.0875          | 0.5813   | 0.6016    | 0.5815 |
| 0.5473        | 17.0  | 170  | 1.1483          | 0.5938   | 0.6027    | 0.5844 |
| 0.5228        | 18.0  | 180  | 1.1031          | 0.6312   | 0.6431    | 0.6225 |
| 0.4805        | 19.0  | 190  | 1.1747          | 0.5813   | 0.6057    | 0.5848 |
| 0.4995        | 20.0  | 200  | 1.1865          | 0.6062   | 0.6062    | 0.5980 |
| 0.456         | 21.0  | 210  | 1.2619          | 0.6      | 0.6020    | 0.5843 |
| 0.4697        | 22.0  | 220  | 1.2476          | 0.5625   | 0.5804    | 0.5647 |
| 0.3656        | 23.0  | 230  | 1.3106          | 0.6125   | 0.6645    | 0.6130 |
| 0.394         | 24.0  | 240  | 1.3398          | 0.5437   | 0.5627    | 0.5460 |
| 0.35          | 25.0  | 250  | 1.3391          | 0.5938   | 0.5940    | 0.5860 |
| 0.3508        | 26.0  | 260  | 1.2846          | 0.575    | 0.6070    | 0.5821 |
| 0.3106        | 27.0  | 270  | 1.3495          | 0.575    | 0.6258    | 0.5663 |
| 0.3265        | 28.0  | 280  | 1.4450          | 0.5375   | 0.6512    | 0.5248 |
| 0.2806        | 29.0  | 290  | 1.5145          | 0.5188   | 0.5840    | 0.5151 |
| 0.3276        | 30.0  | 300  | 1.5207          | 0.5188   | 0.5741    | 0.5164 |
| 0.2932        | 31.0  | 310  | 1.3179          | 0.6312   | 0.6421    | 0.6298 |
| 0.3542        | 32.0  | 320  | 1.3720          | 0.5875   | 0.6157    | 0.5780 |
| 0.3321        | 33.0  | 330  | 1.4787          | 0.5625   | 0.6088    | 0.5714 |
| 0.2641        | 34.0  | 340  | 1.5468          | 0.5375   | 0.5817    | 0.5385 |
| 0.2432        | 35.0  | 350  | 1.4893          | 0.5687   | 0.6012    | 0.5538 |
| 0.275         | 36.0  | 360  | 1.4775          | 0.575    | 0.5827    | 0.5710 |
| 0.239         | 37.0  | 370  | 1.4812          | 0.575    | 0.6100    | 0.5739 |
| 0.2658        | 38.0  | 380  | 1.7335          | 0.5563   | 0.6547    | 0.5436 |
| 0.3026        | 39.0  | 390  | 1.5692          | 0.5875   | 0.6401    | 0.5854 |
| 0.1867        | 40.0  | 400  | 1.4908          | 0.5687   | 0.5921    | 0.5741 |
| 0.1931        | 41.0  | 410  | 1.6608          | 0.5375   | 0.5834    | 0.5396 |
| 0.2416        | 42.0  | 420  | 1.5172          | 0.5938   | 0.6259    | 0.5935 |
| 0.1943        | 43.0  | 430  | 1.5260          | 0.5437   | 0.5775    | 0.5498 |
|
|
|
|
|
### Framework versions |
|
|
|
- Transformers 4.33.1 |
|
- Pytorch 2.0.1+cu118 |
|
- Datasets 2.14.5 |
|
- Tokenizers 0.13.3 |
|
|