---
license: apache-2.0
pipeline_tag: image-classification
library_name: transformers
tags:
- deep-fake
- ViT
- detection
- Image
- transformers-4.49.0.dev0
base_model:
- google/vit-base-patch16-224-in21k
---

![df[ViT].gif](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/Xbuv-x40-l3QjzWu5Yj2F.gif)

# **Deep-Fake-Detector-Model**

# **Overview**

The **Deep-Fake-Detector-Model** is a deep learning model designed to detect deepfake images. It leverages the **Vision Transformer (ViT)** architecture, specifically the `google/vit-base-patch16-224-in21k` model, fine-tuned on a dataset of real and deepfake images. The model is trained to classify images as either "Real" or "Fake", making it a practical tool for detecting manipulated media.

**Update:** The previous checkpoint was trained on a smaller classification dataset. Although it scored well in evaluation, its performance on real-world images was average because the training set contained limited variation. This update uses a larger dataset to improve the detection of fake images.

| Repository | Link |
|------------|------|
| Deep Fake Detector Model | [GitHub Repository](https://github.com/PRITHIVSAKTHIUR/Deep-Fake-Detector-Model) |

# **Key Features**

- **Architecture**: Vision Transformer (ViT) - `google/vit-base-patch16-224-in21k`.
- **Input**: RGB images resized to 224x224 pixels.
- **Output**: Binary classification ("Real" or "Fake").
- **Training Dataset**: A curated dataset of real and deepfake images.
- **Fine-Tuning**: Fine-tuned using Hugging Face's `Trainer` API with data augmentation.
- **Performance**: Reported as accuracy, F1 score, and a confusion matrix; see the metrics below.

# **Model Architecture**

The model is based on the **Vision Transformer (ViT)**, which treats an image as a sequence of patches and applies a transformer encoder to learn spatial relationships. Key components include:

- **Patch Embedding**: Divides the input image into fixed-size patches (16x16 pixels).
- **Transformer Encoder**: Processes patch embeddings using multi-head self-attention mechanisms.
- **Classification Head**: A fully connected layer for binary classification.

# **Training Details**

- **Optimizer**: AdamW with a learning rate of `1e-6`.
- **Batch Size**: 32 for training, 8 for evaluation.
- **Epochs**: 2.
- **Data Augmentation**:
  - Random rotation (±90 degrees).
  - Random sharpness adjustment.
  - Random resizing and cropping.
- **Loss Function**: Cross-Entropy Loss.
- **Evaluation Metrics**: Accuracy, F1 Score, and Confusion Matrix.

A minimal fine-tuning sketch along these lines is shown below.
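The following is a minimal sketch of a fine-tuning setup with the hyperparameters listed above, not the exact training script. It assumes an `imagefolder`-style dataset at the hypothetical path `path_to_dataset` with `Real/` and `Fake/` subfolders, and the augmentation parameters (e.g. `sharpness_factor`) are illustrative.

```python
import torch
from datasets import load_dataset
from torchvision import transforms
from transformers import (
    Trainer,
    TrainingArguments,
    ViTForImageClassification,
    ViTImageProcessor,
)

base = "google/vit-base-patch16-224-in21k"
processor = ViTImageProcessor.from_pretrained(base)

# Hypothetical imagefolder layout: path_to_dataset/{Real,Fake}/*.jpg
dataset = load_dataset("imagefolder", data_dir="path_to_dataset")["train"]
labels = dataset.features["label"].names

# The augmentations listed above; exact parameters are illustrative.
augment = transforms.Compose([
    transforms.RandomRotation(degrees=90),
    transforms.RandomAdjustSharpness(sharpness_factor=2.0),
    transforms.RandomResizedCrop(224),
])

def preprocess(batch):
    # Augment, then let the processor resize/normalize to ViT's input format.
    images = [augment(img.convert("RGB")) for img in batch["image"]]
    batch["pixel_values"] = processor(images=images, return_tensors="pt")["pixel_values"]
    return batch

splits = dataset.train_test_split(test_size=0.1, seed=42).with_transform(preprocess)

def collate_fn(examples):
    return {
        "pixel_values": torch.stack([ex["pixel_values"] for ex in examples]),
        "labels": torch.tensor([ex["label"] for ex in examples]),
    }

model = ViTForImageClassification.from_pretrained(
    base,
    num_labels=len(labels),
    id2label=dict(enumerate(labels)),
    label2id={l: i for i, l in enumerate(labels)},
)

# AdamW and cross-entropy loss are the Trainer defaults for this head.
args = TrainingArguments(
    output_dir="vit-deepfake-detector",
    learning_rate=1e-6,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=8,
    num_train_epochs=2,
    remove_unused_columns=False,  # keep the raw "image" column for the transform
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=splits["train"],
    eval_dataset=splits["test"],
    data_collator=collate_fn,
)
trainer.train()
```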
# **Inference with Hugging Face Pipeline**

```python
from transformers import pipeline

# Load the model
pipe = pipeline('image-classification', model="prithivMLmods/Deep-Fake-Detector-Model", device=0)

# Predict on an image
result = pipe("path_to_image.jpg")
print(result)
```

# **Inference with PyTorch**

```python
from transformers import ViTForImageClassification, ViTImageProcessor
from PIL import Image
import torch

# Load the model and processor
model = ViTForImageClassification.from_pretrained("prithivMLmods/Deep-Fake-Detector-Model")
processor = ViTImageProcessor.from_pretrained("prithivMLmods/Deep-Fake-Detector-Model")

# Load and preprocess the image
image = Image.open("path_to_image.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")

# Perform inference
with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits
    predicted_class = torch.argmax(logits, dim=1).item()

# Map class index to label
label = model.config.id2label[predicted_class]
print(f"Predicted Label: {label}")
```

# **Performance Metrics**

```
Classification report:

              precision    recall  f1-score   support

        Real     0.6276    0.9823    0.7659     38054
        Fake     0.9594    0.4176    0.5819     38080

    accuracy                         0.6999     76134
   macro avg     0.7935    0.7000    0.6739     76134
weighted avg     0.7936    0.6999    0.6739     76134
```

![Untitled.png](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/MoxwukbZZZuVpvXHstxsw.png)

- **Confusion Matrix** (rows are the true class, columns the predicted class, in the order Real, Fake):

```
[[True Positives, False Negatives],
 [False Positives, True Negatives]]
```

# **Dataset**

The model is fine-tuned on a curated dataset containing:

- **Real Images**: Authentic images of human faces.
- **Fake Images**: Deepfake images generated using advanced AI techniques.

# **Limitations**

- The model is trained on a specific dataset and may not generalize well to other deepfake datasets or domains.
- Performance may degrade on low-resolution or heavily compressed images.
- The model is designed for image classification and does not detect deepfake videos directly.

# **Ethical Considerations**

- **Misuse**: This model should not be used for malicious purposes, such as creating or spreading deepfakes.
- **Bias**: The model may inherit biases from the training dataset. Care should be taken to ensure fairness and inclusivity.
- **Transparency**: Users should be informed when deepfake detection tools are used to analyze their content.

# **Future Work**

- Extend the model to detect deepfake videos (a naive frame-level sketch appears after the citation below).
- Improve generalization by training on larger and more diverse datasets.
- Incorporate explainability techniques to provide insights into model predictions.

# **Citation**

```bibtex
@misc{Deep-Fake-Detector-Model,
  author       = {prithivMLmods},
  title        = {Deep-Fake-Detector-Model},
  initial      = {21 Mar 2024},
  last_updated = {31 Jan 2025}
}
```
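As a naive starting point for the video-detection item in Future Work, the sketch below samples every Nth frame of a video and averages the per-frame "Fake" scores. The file name `path_to_video.mp4`, the sampling interval, and the score-averaging heuristic are all illustrative assumptions, not part of the released model.

```python
# Naive frame-level video scoring: a sketch, not part of the released model.
import cv2  # pip install opencv-python
from PIL import Image
from transformers import pipeline

pipe = pipeline("image-classification", model="prithivMLmods/Deep-Fake-Detector-Model")

def score_video(path, every_n_frames=30):
    """Classify every Nth frame and return the mean 'Fake' score."""
    cap = cv2.VideoCapture(path)
    fake_scores, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % every_n_frames == 0:
            # OpenCV decodes frames as BGR; convert to RGB before classification.
            image = Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            scores = {r["label"]: r["score"] for r in pipe(image)}
            fake_scores.append(scores.get("Fake", 0.0))
        idx += 1
    cap.release()
    return sum(fake_scores) / len(fake_scores) if fake_scores else None

# Hypothetical video path; the mean per-frame score is a crude aggregate.
print(score_video("path_to_video.mp4"))
```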