---
tags:
- image-classification
- timm
library_name: timm
license: cc-by-nc-4.0
---
# Model card for vit_large_patch14_clip_336.laion2b_ft_augreg_inat21
Part of a series of `timm` fine-tune experiments on iNaturalist 2021 competition data (https://github.com/visipedia/inat_comp/tree/master/2021) for higher capacity models.

Covering 10,000 species, this dataset and these models are fun to explore via the classification widget with pictures from your backyard, but they are quite a bit smaller than the models you can find on the iNaturalist website (https://www.inaturalist.org/blog/75633-a-new-computer-vision-model-v2-1-including-1-770-new-taxa).

No extra metadata was used for training these models (as was the case for the competition); it was a straightforward fine-tune to explore differences in model pretrain data.

| Model | Top-1 | Top-5 | Img Size (Train) | Paper |
|-------|-------|-------|----------|-------|
| [eva02_large_patch14_clip_336.merged2b_ft_inat21](https://huggingface.co/timm/eva02_large_patch14_clip_336.merged2b_ft_inat21) | 92.05 | 98.01 | 336 | https://arxiv.org/abs/2303.11331 |
| [vit_large_patch14_clip_336.datacompxl_ft_augreg_inat21](https://huggingface.co/timm/vit_large_patch14_clip_336.datacompxl_ft_augreg_inat21) | 91.98 | 98.03 | 336 | https://arxiv.org/abs/2304.14108 |
| [vit_large_patch14_clip_336.laion2b_ft_augreg_inat21](https://huggingface.co/timm/vit_large_patch14_clip_336.laion2b_ft_augreg_inat21) | 91.48 | 97.89 | 336 | https://arxiv.org/abs/2212.07143 |
| [convnext_large_mlp.laion2b_ft_augreg_inat21](https://huggingface.co/timm/convnext_large_mlp.laion2b_ft_augreg_inat21) | 90.95 | 97.68 | 448 (384) | |
| [vit_large_patch14_clip_336.datacompxl_ft_inat21](https://huggingface.co/timm/vit_large_patch14_clip_336.datacompxl_ft_inat21) | 90.85 | 97.68 | 336 | https://arxiv.org/abs/2304.14108 |
| [convnext_large_mlp.laion2b_ft_augreg_inat21](https://huggingface.co/timm/convnext_large_mlp.laion2b_ft_augreg_inat21) | 90.62 | 97.61 | 384 | |
| [vit_large_patch14_clip_336.laion2b_ft_in12k_in1k_inat21](https://huggingface.co/timm/vit_large_patch14_clip_336.laion2b_ft_in12k_in1k_inat21) | 90.29 | 97.44 | 336 | https://arxiv.org/abs/2212.07143 |
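
The Top-1 and Top-5 columns above are presumably standard top-k accuracy in percent: the share of validation images whose true species is among the k highest-scoring classes. The card does not spell out the metric, so the following is only a minimal pure-Python sketch under that assumption. The `topk_accuracy` helper and the toy scores are illustrative and not part of `timm`.

```python
from typing import Sequence


def topk_accuracy(scores: Sequence[Sequence[float]],
                  labels: Sequence[int],
                  k: int = 1) -> float:
    """Percentage of samples whose true label is among the k top-scoring classes."""
    if len(scores) != len(labels) or not labels:
        raise ValueError("scores and labels must be non-empty and the same length")
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices ordered from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return 100.0 * hits / len(labels)


# Made-up scores for 4 samples over 3 classes (not real model output).
toy_scores = [
    [0.7, 0.2, 0.1],  # true class 0: top-1 hit
    [0.1, 0.3, 0.6],  # true class 1: top-1 miss, top-2 hit
    [0.5, 0.4, 0.1],  # true class 2: missed at both k=1 and k=2
    [0.2, 0.5, 0.3],  # true class 1: top-1 hit
]
toy_labels = [0, 1, 2, 1]

print(topk_accuracy(toy_scores, toy_labels, k=1))  # 50.0
print(topk_accuracy(toy_scores, toy_labels, k=2))  # 75.0
```

With 10,000 classes, Top-5 is the more forgiving number, since a prediction counts as correct if the true species appears anywhere in the five best guesses.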


## Run Validation
```bash
python validate.py /tfds/ --dataset tfds/i_naturalist2021 --model hf-hub:timm/vit_large_patch14_clip_336.laion2b_ft_augreg_inat21 --split val --amp
```
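
Apart from full validation, the checkpoint can be tried on a single picture, much like the classification widget does. The following is a minimal sketch, assuming recent `timm`, `torch`, and `Pillow` installs and network access to the Hugging Face Hub. The `classify_image` helper and its `image_path` argument are hypothetical names introduced here for illustration.

```python
MODEL_NAME = "hf-hub:timm/vit_large_patch14_clip_336.laion2b_ft_augreg_inat21"


def classify_image(image_path: str, k: int = 5):
    """Return (probabilities, class_indices) for the k most likely classes of one image."""
    # Heavy dependencies are imported lazily so this module loads without them.
    import timm
    import torch
    from PIL import Image

    # Downloads the fine-tuned weights from the Hub on first use.
    model = timm.create_model(MODEL_NAME, pretrained=True)
    model.eval()

    # Build the eval-time preprocessing (resize, crop, normalize) from the model's config.
    data_config = timm.data.resolve_model_data_config(model)
    transform = timm.data.create_transform(**data_config, is_training=False)

    image = Image.open(image_path).convert("RGB")
    with torch.no_grad():
        logits = model(transform(image).unsqueeze(0))  # add a batch dimension

    probs, indices = torch.topk(logits.softmax(dim=1), k=k)
    return probs[0].tolist(), indices[0].tolist()
```

Calling `classify_image("backyard_bird.jpg")` (a placeholder path) would return the five highest probabilities and their class indices. Mapping those indices to species names depends on the iNaturalist 2021 label list and is not covered by this sketch.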

## Citation

```bibtex
@inproceedings{cherti2023reproducible,
  title={Reproducible scaling laws for contrastive language-image learning},
  author={Cherti, Mehdi and Beaumont, Romain and Wightman, Ross and Wortsman, Mitchell and Ilharco, Gabriel and Gordon, Cade and Schuhmann, Christoph and Schmidt, Ludwig and Jitsev, Jenia},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={2818--2829},
  year={2023}
}
```