vit-indian-food
metadata
license: apache-2.0
datasets:
  - bharat-raghunathan/indian-foods-dataset
metrics:
  - accuracy
  - precision
  - recall

Indian Food Classification with Vision Transformer (ViT)

Overview

This model is a Vision Transformer (ViT) fine-tuned to classify images of Indian dishes. It was trained on the Indian Foods Dataset (bharat-raghunathan/indian-foods-dataset) hosted on the Hugging Face Hub.

Dataset

The Indian Foods Dataset contains 4,770 images spanning 15 classes of popular Indian dishes, split as follows:

  • Training: 3,047 images
  • Validation: 762 images
  • Testing: 961 images
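As a quick sanity check on these figures, the three splits add up to the stated total of 4,770, and they work out to roughly 64% / 16% / 20% of the data. The validation set is also about 20% of the non-test images. A minimal Python sketch of that arithmetic follows; the split names in the dictionary are labels chosen here for illustration, not necessarily the dataset's own split names.

```python
# Split sizes as reported in the Dataset section (dictionary keys are illustrative labels).
splits = {"train": 3047, "validation": 762, "test": 961}

total = sum(splits.values())  # should equal the stated total of 4,770

# Share of the full dataset represented by each split.
shares = {name: count / total for name, count in splits.items()}

# Validation size relative to all non-test images (train + validation).
val_of_trainval = splits["validation"] / (splits["train"] + splits["validation"])

for name, share in shares.items():
    print(f"{name}: {share:.1%}")
print(f"validation / (train + validation): {val_of_trainval:.1%}")
```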

Model

The base model is google/vit-base-patch16-224-in21k, a ViT-Base model pre-trained on ImageNet-21k with 224 × 224 inputs and 16 × 16 patches. It was fine-tuned on the Indian Foods Dataset for 10 epochs using the AdamW optimizer with a learning rate of 2e-4.
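The base model's name encodes its input geometry: 224 × 224 images cut into non-overlapping 16 × 16 patches. The short sketch below works out how many tokens the Transformer processes per image, assuming the standard ViT setup in which one learnable [CLS] token is prepended to the patch sequence. In the usual ViTForImageClassification arrangement, that token's final representation feeds a linear head with one output per class (15 here); this card does not spell out the head, so treat that as an assumption.

```python
# Values encoded in the base model name "vit-base-patch16-224-in21k".
image_size = 224  # input resolution: 224 x 224 pixels
patch_size = 16   # each patch covers 16 x 16 pixels

patches_per_side = image_size // patch_size  # patches along one edge of the image
num_patches = patches_per_side ** 2          # total patches per image
seq_len = num_patches + 1                    # plus one prepended [CLS] token

# Each flattened RGB patch holds 16 * 16 * 3 raw values before linear projection.
values_per_patch = patch_size * patch_size * 3

print(patches_per_side, num_patches, seq_len, values_per_patch)  # 14 196 197 768
```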

Evaluation

The model was evaluated on the 961-image test set and achieved the following metrics:

  • Accuracy: 0.9667
  • Precision: 0.9670
  • Recall: 0.9667
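The card does not state how precision and recall were averaged over the 15 classes. One observation: recall equals accuracy (0.9667), which is what support-weighted averaging always yields for single-label classification, because weighted recall reduces to total correct predictions divided by total examples. Micro-averaged precision would also equal accuracy, so a precision of 0.9670 suggests a weighted (or similar per-class) average rather than micro-averaging. The pure-Python sketch below is an illustration, not the authors' evaluation code. It computes accuracy together with support-weighted precision and recall, matching what scikit-learn's `average="weighted"` option computes, so the relationship can be checked on any set of predictions.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_precision_recall(y_true, y_pred):
    """Per-class precision and recall, averaged with weights proportional to class support."""
    support = Counter(y_true)    # number of true examples per class
    predicted = Counter(y_pred)  # number of predictions per class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # true positives per class
    n = len(y_true)

    precision = recall = 0.0
    for label, count in support.items():
        weight = count / n
        # A class that is never predicted contributes precision 0 (avoids division by zero).
        if predicted[label]:
            precision += weight * hits[label] / predicted[label]
        recall += weight * hits[label] / count
    return precision, recall


# Toy example: weighted recall comes out equal to accuracy, weighted precision does not.
y_true = [0, 0, 1, 1, 2]
y_pred = [0, 1, 1, 1, 2]
print(accuracy(y_true, y_pred))  # 0.8
print(weighted_precision_recall(y_true, y_pred))
```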

Usage

You can load this fine-tuned model directly from the Hugging Face Hub.
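A minimal sketch using the transformers `pipeline` API for image classification. The repo id `therealcyberlord/vit-indian-food` is an assumption pieced together from the page header, and `load_classifier`, `top_prediction`, and the image path `my_dish.jpg` are hypothetical names for illustration. The transformers import sits inside `load_classifier` so the small result helper can be used without the library installed.

```python
from typing import Any, Dict, List, Tuple

# Assumed Hub repo id (inferred from the page header); adjust if the model lives elsewhere.
MODEL_ID = "therealcyberlord/vit-indian-food"


def load_classifier(model_id: str = MODEL_ID):
    """Build a transformers image-classification pipeline (downloads weights on first use)."""
    from transformers import pipeline  # lazy import: only needed when actually classifying

    return pipeline("image-classification", model=model_id)


def top_prediction(results: List[Dict[str, Any]]) -> Tuple[str, float]:
    """Return (label, score) for the highest-scoring entry of a pipeline result."""
    best = max(results, key=lambda r: r["score"])
    return best["label"], best["score"]


# Example (requires `pip install transformers torch pillow` and network access):
#   classifier = load_classifier()
#   label, score = top_prediction(classifier("my_dish.jpg"))  # a URL or PIL image also works
#   print(f"{label}: {score:.3f}")
```

The pipeline returns a list of `{"label", "score"}` dictionaries, already ordered by score; taking the maximum explicitly keeps the helper correct even if the list is reordered or filtered first.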