---
license: apache-2.0
datasets:
- imagenet-1k
metrics:
- accuracy
pipeline_tag: image-classification
---
# VisionLLaMA-Large-MAE
VisionLLaMA-Large-MAE is pretrained on ImageNet-1K without labels under the Masked Autoencoder (MAE) paradigm. The pretrained model shows improved ImageNet-1K classification accuracy under both supervised fine-tuning (SFT) and linear probing.

| Model | ImageNet-1K Acc. (SFT, %) | ImageNet-1K Acc. (Linear Probe, %) |
| --- | --- | --- |
| VisionLLaMA-Large-MAE (ep800) | 85.5 | 77.3 |

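
For readers unfamiliar with the MAE paradigm mentioned above, the random patch masking step at its core can be sketched roughly as follows. This is a minimal illustration and not the VisionLLaMA training code: the function name `random_patch_mask` is hypothetical, and the 75% mask ratio and 16x16 patches on a 224x224 input are assumed from the original MAE setup rather than taken from this model card.

```python
import random


def random_patch_mask(num_patches, mask_ratio=0.75, seed=None):
    """Split patch indices into visible and masked sets (hypothetical helper).

    In MAE-style pretraining, the encoder only sees the visible patches and
    the decoder is trained to reconstruct the masked ones.
    """
    if not 0.0 <= mask_ratio < 1.0:
        raise ValueError("mask_ratio must be in [0, 1)")
    rng = random.Random(seed)
    ids = list(range(num_patches))
    rng.shuffle(ids)  # random permutation of patch indices
    num_keep = int(num_patches * (1 - mask_ratio))
    keep_ids = sorted(ids[:num_keep])   # patches fed to the encoder
    mask_ids = sorted(ids[num_keep:])   # patches to be reconstructed
    return keep_ids, mask_ids


# Assumed setup: a 224x224 image with 16x16 patches gives 14 * 14 = 196 patches.
keep_ids, mask_ids = random_patch_mask(196, mask_ratio=0.75, seed=0)
print(len(keep_ids), len(mask_ids))  # 49 147
```

No labels are involved at this stage, which is why the pretraining can run on ImageNet-1K images alone; labels enter only during SFT or linear probing.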
# How to Use
Please refer to the [GitHub](https://github.com/Meituan-AutoML/VisionLLaMA) page for usage instructions.
# Citation
```bibtex
@article{chu2024visionllama,
title={VisionLLaMA: A Unified LLaMA Interface for Vision Tasks},
author={Chu, Xiangxiang and Su, Jianlin and Zhang, Bo and Shen, Chunhua},
journal={arXiv preprint arXiv:2403.00522},
year={2024}
}
```