

license: apache-2.0

# Fine-tuning a Vision Transformer on the Big Cats Dataset

In this project, we fine-tuned a Vision Transformer on the Big Cats dataset to perform image classification. The Big Cats dataset consists of 2339 images of 10 different types of big cats, including lions, tigers, jaguars, and more.

Our goal was to train a model that classifies these images accurately. After fine-tuning a pre-trained Vision Transformer [1], the model reached an accuracy of 98%.
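The fine-tuning setup described above can be sketched roughly as follows. This is a minimal illustration using the Hugging Face `transformers` library, not the training code used for this model. The checkpoint name in the docstring, the optimizer, and the learning rate are assumptions, since this README does not state them. Data loading and preprocessing are omitted, and the helper names `build_classifier`, `train_step`, and `accuracy` are hypothetical.

```python
import torch
from transformers import ViTConfig, ViTForImageClassification

NUM_CLASSES = 10  # the Big Cats dataset has 10 classes


def build_classifier(checkpoint=None, **config_overrides):
    """Return a ViT with a 10-way classification head.

    checkpoint: a Hub model id, e.g. "google/vit-base-patch16-224-in21k"
    (an assumed example, not confirmed by this README). When it is None,
    the model is randomly initialised from a ViTConfig, which needs no
    download and is only useful for smoke tests.
    """
    if checkpoint is not None:
        # Loads the pre-trained backbone and attaches a new 10-way head.
        return ViTForImageClassification.from_pretrained(
            checkpoint, num_labels=NUM_CLASSES
        )
    config = ViTConfig(num_labels=NUM_CLASSES, **config_overrides)
    return ViTForImageClassification(config)


def train_step(model, optimizer, pixel_values, labels):
    """Run one gradient update and return the batch loss."""
    model.train()
    # Passing integer class labels makes the model compute cross-entropy loss.
    outputs = model(pixel_values=pixel_values, labels=labels)
    optimizer.zero_grad()
    outputs.loss.backward()
    optimizer.step()
    return outputs.loss.item()


def accuracy(model, pixel_values, labels):
    """Fraction of images in the batch whose top prediction is correct."""
    model.eval()
    with torch.no_grad():
        preds = model(pixel_values=pixel_values).logits.argmax(dim=-1)
    return (preds == labels).float().mean().item()
```

In practice one would call `build_classifier` with a pre-trained checkpoint, create an optimizer such as `torch.optim.AdamW(model.parameters(), lr=2e-5)` (an assumed value), loop `train_step` over batches of preprocessed images, and report `accuracy` on a held-out split.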

## References

[1] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., ... & Houlsby, N. (2021). An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929.