Model Zoo

Note

  • All pretraining and finetuning runs use sparse/uniform sampling.
  • #Frame $=$ #input_frame $\times$ #crop $\times$ #clip
  • #input_frame is the number of frames fed to the model per inference.
  • #crop is the number of spatial crops (e.g., 3 for left/right/center).
  • #clip is the number of temporal clips (e.g., 4 means repeatedly sampling four clips with different start indices).
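
As a rough illustration of the notation above, the sketch below expands a #Frame setting such as 8x3x4 into a total frame count and shows one possible way to pick uniformly spaced frame indices for each temporal clip. The helper names `total_frames` and `clip_indices` are hypothetical, and the offset scheme is an assumption made for illustration, not necessarily the sampling code used in this repository.

```python
def total_frames(setting: str) -> int:
    """Expand a '#input_frame x #crop x #clip' string such as '8x3x4'."""
    parts = setting.lower().split("x")
    if len(parts) != 3:
        raise ValueError(f"expected '<frames>x<crops>x<clips>', got {setting!r}")
    input_frames, crops, clips = (int(p) for p in parts)
    return input_frames * crops * clips


def clip_indices(video_len: int, input_frames: int, clip_idx: int, num_clips: int) -> list[int]:
    """Uniformly spaced frame indices for one temporal clip.

    Assumption: the video is split into `input_frames` equal segments and
    each clip takes one frame per segment, shifted by a clip-specific offset.
    """
    segment = video_len / input_frames
    start = segment * clip_idx / num_clips  # different start index per clip
    return [int(start + segment * i) for i in range(input_frames)]


if __name__ == "__main__":
    print(total_frames("8x3x4"))   # 8 frames x 3 crops x 4 clips
    print(total_frames("16x3x4"))
    for k in range(4):
        print(k, clip_indices(video_len=64, input_frames=8, clip_idx=k, num_clips=4))
```

Under these assumptions, an 8x3x4 setting evaluates 96 frames per video in total, and a 16x3x4 setting evaluates 192.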

Pretraining

| Model | Setting | Model | Shell |
| --- | --- | --- | --- |
| $\text{InternVideo2}_{s1}$-1B | K-Mash-1.1M 300e | TBD | run.sh |
| $\text{InternVideo2}_{s1}$-6B | K-Mash-2M 300e | TBD | run.sh |

Finetuning

K710

| Model | Setting | #Frame | Top-1 | Model | Shell |
| --- | --- | --- | --- | --- | --- |
| $\text{InternVideo2}_{s1}$-1B | K-Mash PT | 8x3x4 | 87.6 | TBD | run.sh |
| $\text{InternVideo2}_{s1}$-6B | K-Mash PT | 8x3x4 | 88.1 | TBD | run.sh |

K400

| Model | Setting | #Frame | Top-1 | Model | Shell |
| --- | --- | --- | --- | --- | --- |
| $\text{InternVideo2}_{s1}$-1B | K-Mash PT + K710 FT | 8x3x4 | 91.3 | TBD | run.sh |
| $\text{InternVideo2}_{s1}$-1B | K-Mash PT + K710 FT | 16x3x4 | 91.6 | TBD | run.sh |
| $\text{InternVideo2}_{s1}$-6B | K-Mash PT + K710 FT | 8x3x4 | 91.9 | TBD | run.sh |
| $\text{InternVideo2}_{s1}$-6B | K-Mash PT + K710 FT | 16x3x4 | 92.1 | TBD | run.sh |

K600

| Model | Setting | #Frame | Top-1 | Model | Shell |
| --- | --- | --- | --- | --- | --- |
| $\text{InternVideo2}_{s1}$-1B | K-Mash PT + K710 FT | 8x3x4 | 91.4 | TBD | run.sh |
| $\text{InternVideo2}_{s1}$-1B | K-Mash PT + K710 FT | 16x3x4 | 91.6 | TBD | run.sh |
| $\text{InternVideo2}_{s1}$-6B | K-Mash PT + K710 FT | 8x3x4 | 91.7 | TBD | run.sh |
| $\text{InternVideo2}_{s1}$-6B | K-Mash PT + K710 FT | 16x3x4 | 91.9 | TBD | run.sh |

K700

| Model | Setting | #Frame | Top-1 | Model | Shell |
| --- | --- | --- | --- | --- | --- |
| $\text{InternVideo2}_{s1}$-1B | K-Mash PT + K710 FT | 8x3x4 | 85.0 | TBD | run.sh |
| $\text{InternVideo2}_{s1}$-1B | K-Mash PT + K710 FT | 16x3x4 | 85.4 | TBD | run.sh |
| $\text{InternVideo2}_{s1}$-6B | K-Mash PT + K710 FT | 8x3x4 | 85.7 | TBD | run.sh |
| $\text{InternVideo2}_{s1}$-6B | K-Mash PT + K710 FT | 16x3x4 | 85.9 | TBD | run.sh |

MiT V1

| Model | Setting | #Frame | Top-1 | Model | Shell |
| --- | --- | --- | --- | --- | --- |
| $\text{InternVideo2}_{s1}$-1B | K-Mash PT + K710 FT + K400 FT | 8x3x4 | 50.8 | TBD | run.sh |
| $\text{InternVideo2}_{s1}$-6B | K-Mash PT + K710 FT + K400 FT | 8x3x4 | 51.0 | TBD | run.sh |
| $\text{InternVideo2}_{s1}$-6B 336↑ | K-Mash PT + K710 FT + K400 FT | 8x3x4 | 51.2 | TBD | run.sh |

SthSth V1

| Model | Setting | #Frame | Top-1 | Model | Shell |
| --- | --- | --- | --- | --- | --- |
| $\text{InternVideo2}_{s1}$-1B | K-Mash PT | 8x3x4 | 68.5 | TBD | run.sh |
| $\text{InternVideo2}_{s1}$-6B | K-Mash PT | 8x3x4 | 69.7 | TBD | run.sh |

SthSth V2

| Model | Setting | #Frame | Top-1 | Model | Shell |
| --- | --- | --- | --- | --- | --- |
| $\text{InternVideo2}_{s1}$-1B | K-Mash PT | 8x3x4 | 77.1 | TBD | run.sh |
| $\text{InternVideo2}_{s1}$-6B | K-Mash PT | 8x3x4 | 77.5 | TBD | run.sh |

ANet

| Model | Setting | #Frame | Top-1 | mAP | Model | Shell |
| --- | --- | --- | --- | --- | --- | --- |
| $\text{InternVideo2}_{s1}$-6B | K-Mash PT + K710 FT + K400 FT | 8x3x4 | 95.9 | 98.2 | TBD | run.sh |

HACS

| Model | Setting | #Frame | Top-1 | mAP | Model | Shell |
| --- | --- | --- | --- | --- | --- | --- |
| $\text{InternVideo2}_{s1}$-6B | K-Mash PT + K710 FT + K400 FT | 8x3x4 | 97.0 | 98.8 | TBD | run.sh |