MCG-NJU/videomae-base-short-ssv2

Video Classification


nielsr HF staff committed on Apr 22, 2023

Commit 4939373 • 1 Parent(s): 9100fa5

Update README.md

Files changed (1)

README.md +3 -3


@@ -28,17 +28,17 @@ You can use the raw model for predicting pixel values for masked patches of a vi
 Here is how to use this model to predict pixel values for randomly masked patches:
 ```python
-from transformers import VideoMAEFeatureExtractor, VideoMAEForPreTraining
+from transformers import VideoMAEImageProcessor, VideoMAEForPreTraining
 import numpy as np
 import torch
 num_frames = 16
 video = list(np.random.randn(16, 3, 224, 224))
-feature_extractor = VideoMAEFeatureExtractor.from_pretrained("MCG-NJU/videomae-base-short-ssv2")
+processor = VideoMAEImageProcessor.from_pretrained("MCG-NJU/videomae-base-short-ssv2")
 model = VideoMAEForPreTraining.from_pretrained("MCG-NJU/videomae-base-short-ssv2")
-pixel_values = feature_extractor(video, return_tensors="pt").pixel_values
+pixel_values = processor(video, return_tensors="pt").pixel_values
 num_patches_per_frame = (model.config.image_size // model.config.patch_size) ** 2
 seq_length = (num_frames // model.config.tubelet_size) * num_patches_per_frame
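
The last two context lines of the hunk compute how many tokens the model sees for one clip. As a rough check, the same arithmetic can be run without loading the checkpoint. This is only a sketch: the helper name `videomae_seq_length` is hypothetical, and the config values (`image_size=224`, `patch_size=16`, `tubelet_size=2`) are assumptions that should be read from `model.config` in practice. The random mask at the end is an illustration of "randomly masked patches", not necessarily the masking the README itself uses.

```python
import random


def videomae_seq_length(num_frames, image_size, patch_size, tubelet_size):
    """Return the number of tokens for one clip (hypothetical helper)."""
    # Each frame is cut into a square grid of patches.
    num_patches_per_frame = (image_size // patch_size) ** 2
    # Consecutive frames are grouped into tubelets along the time axis.
    return (num_frames // tubelet_size) * num_patches_per_frame


# Assumed config values; read the real ones from model.config.
seq_length = videomae_seq_length(
    num_frames=16, image_size=224, patch_size=16, tubelet_size=2
)
print(seq_length)  # 1568 = (16 // 2) * (224 // 16) ** 2

# One boolean per token: True marks a token whose pixels are to be predicted.
rng = random.Random(0)
bool_masked_pos = [rng.random() < 0.5 for _ in range(seq_length)]
```

With these assumed values the clip yields 8 tubelet steps of 196 patches each, so any mask passed alongside `pixel_values` would need 1568 entries per clip.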