---
license: mit
base_model:
- apple/aimv2-large-patch14-native
pipeline_tag: image-classification
tags:
- image-classification
- vision
library_name: transformers
---

# AIMv2-Large-Patch14-Native Image Classification

[Original AIMv2 Paper](https://arxiv.org/abs/2411.14402) | [BibTeX](#citation)

This repository contains an adapted version of the original AIMv2 model, modified for compatibility with the `AutoModelForImageClassification` class from Hugging Face Transformers. This adaptation enables seamless use of the model for image classification tasks.

**Note: the classification head of this model has not been trained or fine-tuned**, so it must be fine-tuned on a labeled dataset before its predictions are meaningful.

## Introduction

We have adapted the original `apple/aimv2-large-patch14-native` model to work with `AutoModelForImageClassification`. The AIMv2 family consists of vision models pre-trained with a multimodal autoregressive objective, offering strong performance across a range of benchmarks.

Some highlights of the AIMv2 models:

1. Outperforms OAI CLIP and SigLIP on the majority of multimodal understanding benchmarks.
2. Surpasses DINOv2 in open-vocabulary object detection and referring expression comprehension.
3. Strong recognition performance, with AIMv2-3B achieving **89.5% on ImageNet with a frozen trunk**.
## Usage

### PyTorch

```python
import requests
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForImageClassification

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

processor = AutoImageProcessor.from_pretrained(
    "amaye15/aimv2-large-patch14-native-image-classification",
)
model = AutoModelForImageClassification.from_pretrained(
    "amaye15/aimv2-large-patch14-native-image-classification",
    trust_remote_code=True,
)

inputs = processor(images=image, return_tensors="pt")
outputs = model(**inputs)

# Convert logits to probabilities and take the most likely class
predictions = outputs.logits.softmax(dim=-1)
predicted_class = predictions.argmax(-1).item()

print(f"Predicted class: {model.config.id2label[predicted_class]}")
```

## Model Details

- **Model Name**: `amaye15/aimv2-large-patch14-native-image-classification`
- **Original Model**: `apple/aimv2-large-patch14-native`
- **Adaptation**: Modified for compatibility with `AutoModelForImageClassification`, for direct use in image classification tasks.
- **Framework**: PyTorch

## Citation

If you use this model or find it helpful, please consider citing the original AIMv2 paper:

```bibtex
@misc{fini2024aimv2,
  title={Multimodal Autoregressive Pre-training of Large Vision Encoders},
  author={Fini, Enrico and others},
  year={2024},
  eprint={2411.14402},
  archivePrefix={arXiv}
}
```
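Beyond the single top class, the same logits can be expanded into top-k predictions. Below is a minimal, self-contained sketch using dummy logits and a placeholder `id2label` mapping; with a real checkpoint, these would come from `model(**inputs).logits` and `model.config.id2label` instead:

```python
import torch

# Dummy logits standing in for outputs.logits; shape (batch, num_labels)
logits = torch.tensor([[2.0, 0.5, 1.0, -1.0]])
id2label = {0: "cat", 1: "dog", 2: "bird", 3: "fish"}  # placeholder mapping

# Softmax over the label dimension, then take the k highest-probability classes
probs = logits.softmax(dim=-1)
top = torch.topk(probs, k=3, dim=-1)

for score, idx in zip(top.values[0], top.indices[0]):
    print(f"{id2label[idx.item()]}: {score.item():.3f}")
```

`torch.topk` returns both the probabilities and their class indices in descending order, which is convenient for showing ranked predictions alongside the single `argmax` class used above.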