metadata

license: cc-by-nc-4.0
tags:
  - vision
  - video-classification
language:
  - en
pipeline_tag: video-classification

FAL - Framework For Automated Labeling Of Videos (FALVideoClassifier)

FAL (Framework for Automated Labeling Of Videos) is a custom video classification model developed by SVECTOR and fine-tuned on the FAL-500 dataset. This model is designed for efficient video understanding and classification, leveraging state-of-the-art video processing techniques.

Model Overview

This model, referred to as FALVideoClassifier, is fine-tuned on the FAL-500 dataset and optimized for automated video labeling tasks. It classifies a video into one of the 500 labels in the FAL-500 dataset.

This model was developed by SVECTOR as part of our initiative to advance automated video understanding and classification technologies.

Intended Uses & Limitations

This model is designed for video classification: it assigns a video to one of the 500 classes in the FAL-500 dataset. Because it was trained only on FAL-500, it may perform worse on data that differs significantly from that dataset.

Intended Use:

Automated video labeling
Video content classification
Research in video understanding and machine learning

Limitations:

Only trained on FAL-500
May not generalize well to out-of-domain videos without further fine-tuning
Requires videos to be pre-processed (for example, resizing frames and normalizing pixel values)

How to Use

To use this model for video classification, follow these steps:

Installation:

Ensure you have the necessary dependencies installed:

pip install torch torchvision transformers

Code Example:

Here is an example Python code snippet for using the FAL model to classify a video:

# Note: stock transformers does not export a FALVideoClassifierForVideoClassification
# class. AutoModelForVideoClassification picks the architecture from the repository
# config (this assumes the config declares a model type that transformers supports).
from transformers import AutoImageProcessor, AutoModelForVideoClassification
import numpy as np
import torch

# Simulate a sample video: 8 frames, each 3 x 224 x 224 (channels, height, width)
video = list(np.random.randn(8, 3, 224, 224))

# Load the image processor and model
processor = AutoImageProcessor.from_pretrained("SVECTOR-CORPORATION/FAL")
model = AutoModelForVideoClassification.from_pretrained("SVECTOR-CORPORATION/FAL")

# Pre-process the video input
inputs = processor(video, return_tensors="pt")

# Run inference without tracking gradients (from_pretrained already returns the model in evaluation mode)
with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits

# Find the predicted class (highest logit)
predicted_class_idx = logits.argmax(-1).item()

# Output the predicted label
print("Predicted class:", model.config.id2label[predicted_class_idx])
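
The snippet above reports only the single highest-scoring class. When the runner-up labels matter, the logits can be turned into probabilities with a softmax and ranked. The following is a minimal sketch in plain Python; the helper name top_k_labels is hypothetical and not part of the model or of transformers.

```python
import math


def top_k_labels(logits, id2label, k=5):
    """Return the k most probable (label, probability) pairs.

    logits   : list of floats, one score per class
    id2label : mapping from class index to label name
    """
    # Subtract the maximum logit before exponentiating for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Rank class indices by probability, highest first.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]


# Toy example with three classes (labels are made up for illustration).
example = top_k_labels([1.0, 3.0, 2.0], {0: "a", 1: "b", 2: "c"}, k=2)
print(example)
```

With the model loaded as in the example, the inputs would be logits[0].tolist() and model.config.id2label.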

Model Details:

Model Name: FALVideoClassifier
Dataset Used: FAL-500
Input Size: 8 frames of size 224x224 with 3 color channels (RGB)
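
Real videos usually contain far more than 8 frames, so a fixed number of frames has to be selected before inference. The sketch below picks frame indices spread evenly over the clip. This model card does not state how frames were sampled during training, so uniform sampling is an assumption here, and sample_frame_indices is a hypothetical helper.

```python
def sample_frame_indices(total_frames, num_frames=8):
    """Pick num_frames indices spread evenly across a clip.

    The clip is split into num_frames equal segments and the frame at the
    center of each segment is chosen. If the clip has fewer frames than
    requested, some indices repeat.
    """
    if total_frames <= 0:
        raise ValueError("total_frames must be positive")
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    segment = total_frames / num_frames
    return [int((i + 0.5) * segment) for i in range(num_frames)]


# An 80-frame clip reduced to 8 evenly spaced frames.
print(sample_frame_indices(80))
```

The resulting indices can be used to select decoded frames from whatever video reader is in use before passing them to the processor.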

Configuration:

The FALVideoClassifier uses the following hyperparameters:

num_frames: Number of frames in the video (e.g., 8)
num_labels: The number of possible video classes (500 for FAL-500)
hidden_size: Hidden size for transformer layers (768)
attention_probs_dropout_prob: Dropout probability for attention layers (0.0)
hidden_dropout_prob: Dropout probability for the hidden layers (0.0)
drop_path_rate: Dropout rate for stochastic depth (0.0)
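
For reference, the hyperparameters listed above can be collected in one place. The dataclass below is only an illustrative container with the values quoted in this section; FALConfigSketch is a hypothetical name, not the configuration class the model actually loads.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class FALConfigSketch:
    """Hypothetical container mirroring the hyperparameters listed above."""

    num_frames: int = 8                        # frames per input video
    num_labels: int = 500                      # classes in FAL-500
    hidden_size: int = 768                     # transformer hidden size
    attention_probs_dropout_prob: float = 0.0  # dropout on attention weights
    hidden_dropout_prob: float = 0.0           # dropout on hidden layers
    drop_path_rate: float = 0.0                # stochastic depth rate

    def __post_init__(self):
        # Every dropout-style rate must be a probability in [0, 1).
        for name in ("attention_probs_dropout_prob",
                     "hidden_dropout_prob",
                     "drop_path_rate"):
            rate = getattr(self, name)
            if not 0.0 <= rate < 1.0:
                raise ValueError(f"{name} must be in [0, 1), got {rate}")


config = FALConfigSketch()
print(asdict(config))
```

After loading the model, the real values can be compared against this sketch through model.config, assuming the loaded configuration exposes attributes with the same names.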

Preprocessing:

Before feeding videos into the model, ensure the frames are properly pre-processed:

Resize frames to 224x224
Normalize pixel values (use the processor from the model, as shown in the code)
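
To make the two steps concrete, here is a rough NumPy sketch of what resizing and normalization involve. The mean and standard deviation are the common ImageNet statistics and are an assumption; the processor shipped with the model may use different values and a different resize method, so prefer the processor in practice. The function name preprocess_frames is hypothetical.

```python
import numpy as np

# Assumed normalization statistics (ImageNet); the model's processor may differ.
ASSUMED_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
ASSUMED_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def preprocess_frames(frames, size=224):
    """Resize uint8 H x W x 3 frames to size x size and normalize them.

    Uses nearest-neighbor resizing for simplicity. Returns a float32 array
    of shape (num_frames, 3, size, size).
    """
    processed = []
    for frame in frames:
        height, width = frame.shape[:2]
        # Map each output row/column back to a source row/column.
        rows = np.arange(size) * height // size
        cols = np.arange(size) * width // size
        resized = frame[rows[:, None], cols[None, :]]      # (size, size, 3)
        scaled = resized.astype(np.float32) / 255.0        # to [0, 1]
        normalized = (scaled - ASSUMED_MEAN) / ASSUMED_STD
        processed.append(normalized.transpose(2, 0, 1))    # channels first
    return np.stack(processed)


# Eight random 480 x 640 RGB frames standing in for a decoded video.
dummy = [np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8) for _ in range(8)]
batch = preprocess_frames(dummy)
print(batch.shape, batch.dtype)
```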

License

This model is licensed under the CC-BY-NC-4.0 license, which means it can be used for non-commercial purposes with proper attribution.

Citation

If you use this model in your research or projects, please cite the following:

@misc{svector2024fal,
  title={FAL - Framework For Automated Labeling Of Videos (FALVideoClassifier)},
  author={SVECTOR},
  year={2024},
  url={https://www.svector.co.in},
  note={Accessed: 2024-12-19}
}

Contact

For any inquiries regarding this model or its implementation, you can contact the SVECTOR team at ai@svector.com.