---
license: apache-2.0
---

# ViT Fine-tuned on Stanford Car Dataset

Base model: https://huggingface.co/google/vit-base-patch16-224

This model achieves around 86% accuracy on the test set; you can use it as a baseline for further tuning.

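
As a starting point for using the checkpoint, a minimal inference sketch with the `transformers` library might look like the following. The repository id `your-username/vit-stanford-cars` is a placeholder rather than this model's actual Hub id, and the helper names `top_k` and `classify` are introduced here purely for illustration. The sketch assumes `transformers`, `torch`, and `Pillow` are installed.

```python
def top_k(scores, id2label, k=5):
    """Return the k highest-scoring (label, score) pairs, best first."""
    ranked = sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)
    return [(id2label[idx], score) for idx, score in ranked[:k]]


def classify(image_path, model_id="your-username/vit-stanford-cars", k=5):
    """Classify one image with a fine-tuned ViT checkpoint.

    `model_id` is a placeholder; replace it with this model's Hub id.
    """
    # Heavy imports are kept local so that top_k works without them.
    import torch
    from PIL import Image
    from transformers import AutoImageProcessor, AutoModelForImageClassification

    processor = AutoImageProcessor.from_pretrained(model_id)
    model = AutoModelForImageClassification.from_pretrained(model_id)
    model.eval()

    # The processor resizes and normalizes the image for the ViT input.
    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits  # shape: (1, num_labels)

    probs = torch.softmax(logits, dim=-1)[0].tolist()
    return top_k(probs, model.config.id2label, k)
```

Calling `classify("car.jpg")` (with `car.jpg` standing in for any local image) would return up to five `(label, probability)` pairs, assuming the checkpoint's config carries the class names in `id2label`.
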
# Dataset Description:

The Stanford Cars dataset contains 16,185 images of 196 classes of cars. Classes are typically at the level of make, model, and year, e.g. 2012 Tesla Model S or 2012 BMW M3 coupe. For this model, the data is split into 8,144 training images, 6,041 testing images, and 2,000 validation images.

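
As a quick sanity check on the split sizes quoted above, the short sketch below confirms that they add up to the stated total and reports each split's share. The variable names are illustrative only; the numbers come from the description.

```python
# Split sizes as stated in the dataset description.
SPLITS = {"train": 8144, "test": 6041, "validation": 2000}
TOTAL_IMAGES = 16185
NUM_CLASSES = 196

total = sum(SPLITS.values())
fractions = {name: size / total for name, size in SPLITS.items()}
avg_per_class = SPLITS["train"] / NUM_CLASSES

# The three splits should account for every image in the dataset.
assert total == TOTAL_IMAGES

for name, size in SPLITS.items():
    print(f"{name:<10} {size:>6}  {fractions[name]:.1%}")
print(f"average training images per class: {avg_per_class:.1f}")
```

The splits work out to roughly 50%, 37%, and 12% of the data, with about 41.6 training images per class on average, which is worth keeping in mind when choosing augmentation and regularization for further tuning.
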
**Please note: this dataset does not contain newer car models.**

<img src="https://ai.stanford.edu/~jkrause/cars/class_montage.jpg">

# Citations:

3D Object Representations for Fine-Grained Categorization

Jonathan Krause, Michael Stark, Jia Deng, Li Fei-Fei

4th IEEE Workshop on 3D Representation and Recognition, at ICCV 2013 (3dRR-13). Sydney, Australia. Dec. 8, 2013.