license: apache-2.0

Model Card: Fine-Tuned Vision Transformer (ViT) for NSFW Image Classification

Model Description

The Fine-Tuned Vision Transformer (ViT) is a variant of the transformer encoder architecture, similar to BERT, that has been adapted for image classification. The base checkpoint, "google/vit-base-patch16-224-in21k," was pre-trained in a supervised manner on the ImageNet-21k dataset (about 14 million images across 21,843 classes). Images in the pre-training dataset are resized to a resolution of 224x224 pixels, so inputs to this model should be preprocessed to the same size.
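To make the "patch16-224" part of the checkpoint name concrete, the short sketch below works through the arithmetic for how a 224x224 RGB image is split into 16x16 patches. The variable names are illustrative and not part of any library; the extra token is the classification token that ViT prepends to the patch sequence.

```python
# Worked arithmetic for a ViT with 16x16 patches on 224x224 RGB input.
# All names here are illustrative, not library identifiers.
image_size = 224   # input resolution in pixels (height and width)
patch_size = 16    # side length of each square patch
channels = 3       # RGB

patches_per_side = image_size // patch_size   # patches along one edge
num_patches = patches_per_side ** 2           # patches per image
sequence_length = num_patches + 1             # plus one classification token
patch_dim = patch_size * patch_size * channels  # values in one flattened patch

print(patches_per_side, num_patches, sequence_length, patch_dim)
```

Each image therefore reaches the encoder as a sequence of 197 tokens, which is why the input resolution matters for this model.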

During the pre-training phase, the model was trained for fewer than 20 epochs with a batch size of 16. This process taught the model visual features from the ImageNet-21k dataset, providing a foundation for subsequent fine-tuning on specific tasks.

Intended Uses & Limitations

Intended Uses

NSFW Image Classification: The primary intended use of this model is for the classification of NSFW (Not Safe for Work) images. It has been fine-tuned for this purpose, making it suitable for filtering explicit or inappropriate content in various applications.

How to use

Here is how to use this model to classify an image into one of two classes (normal, nsfw):


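# Use a pipeline as a high-level helper
from PIL import Image
from transformers import pipeline

# Placeholder path: replace with the image you want to classify
image = Image.open("path/to/image.jpg")

classifier = pipeline("image-classification", model="RealFalconsAI/nsfw_image_detection")
classifier(image)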
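The image-classification pipeline returns a list of dictionaries, each with a "label" and a "score". For content filtering, one possible approach is to compare the "nsfw" score against a threshold rather than relying only on the top label. The sketch below is a minimal illustration: the helper name is_nsfw, the default threshold of 0.5, and the example scores are all assumptions made for this example and are not taken from the model.

```python
def is_nsfw(predictions, threshold=0.5, nsfw_label="nsfw"):
    """Return True if the nsfw score meets or exceeds the threshold.

    `predictions` is expected in the pipeline's output shape:
    a list of {"label": str, "score": float} dictionaries.
    The helper and its default threshold are hypothetical.
    """
    scores = {p["label"]: p["score"] for p in predictions}
    # A missing nsfw entry is treated as a score of 0.0
    return scores.get(nsfw_label, 0.0) >= threshold


# Made-up scores, shaped like the pipeline output
example = [
    {"label": "normal", "score": 0.91},
    {"label": "nsfw", "score": 0.09},
]

print(is_nsfw(example))                  # default threshold
print(is_nsfw(example, threshold=0.05))  # stricter filtering
```

A lower threshold flags more images at the cost of more false positives, so the value should be tuned on data representative of the target application.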


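# Load model directly
import torch
from PIL import Image
from transformers import AutoModelForImageClassification, ViTImageProcessor

# Placeholder path: replace with the image you want to classify
image = Image.open("path/to/image.jpg").convert("RGB")

model = AutoModelForImageClassification.from_pretrained("RealFalconsAI/nsfw_image_detection")
processor = ViTImageProcessor.from_pretrained("RealFalconsAI/nsfw_image_detection")

# Preprocess the image, then run inference without tracking gradients
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits

# Map the highest-scoring class index to its label
predicted_label = logits.argmax(-1).item()
print(model.config.id2label[predicted_label])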
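Taking the argmax of the logits yields a label but discards how confident the model was. A softmax converts the logits into per-class probabilities. The sketch below implements softmax in plain Python so that it runs without any dependencies; the logit values are made up, and the id2label ordering shown is an assumption that should be checked against model.config.id2label. With PyTorch tensors, torch.softmax(logits, dim=-1) gives the same result.

```python
import math


def softmax(logits):
    """Convert a list of logits into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up logits for one image; in practice use logits[0].tolist()
example_logits = [2.0, -1.0]

# Assumed label ordering; verify with model.config.id2label
id2label = {0: "normal", 1: "nsfw"}

probs = softmax(example_logits)
scores = {id2label[i]: p for i, p in enumerate(probs)}
print(scores)
```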

Limitations

Specialized Task Fine-Tuning: While the model is adept at NSFW image classification, its performance may vary when applied to other tasks. Users interested in employing this model for different tasks should explore fine-tuned versions available in the model hub for optimal results.

Training Data

The model's training data includes a proprietary dataset comprising approximately 80,000 images. This dataset encompasses a significant amount of variability and consists of two distinct classes: "normal" and "nsfw." The training process on this data aimed to equip the model with the ability to distinguish between safe and explicit content effectively.

Note: It's essential to use this model responsibly and ethically, adhering to content guidelines and applicable regulations when implementing it in real-world applications, particularly those involving potentially sensitive content.

For more details on model fine-tuning and usage, please refer to the model's documentation and the model hub.

Disclaimer: The model's performance may be influenced by the quality and representativeness of the data it was fine-tuned on. Users are encouraged to assess the model's suitability for their specific applications and datasets.