---
license: apache-2.0
language:
  - en
base_model:
  - mistralai/Mistral-7B-Instruct-v0.2
tags:
  - video temporal grounding
  - dense video caption
  - video highlight detection
---

Overview

In this work:

  • We model videos as a series of events and propose a causal event modeling framework to capture their inherent structure.
  • We present TRACE, a novel task-interleaved video LLM that implements the causal event modeling framework by sequentially encoding and decoding timestamps, salient scores, and textual captions.
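
As a rough illustration of the event view described above, the sketch below flattens a list of events into an ordered sequence in which each event contributes its timestamps, then its salient score, then its caption. The names `Event` and `interleave`, the task labels, and the string formats are hypothetical and chosen only for this example; this is not TRACE's actual tokenization or API.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Event:
    """One video event (hypothetical container for illustration)."""
    start: float    # start time in seconds
    end: float      # end time in seconds
    score: float    # salient score of the event
    caption: str    # textual description of the event


def interleave(events: List[Event]) -> List[Tuple[str, str]]:
    """Flatten events into a (task, text) sequence ordered by start time.

    Each event emits three items in a fixed order: time, score, caption.
    """
    sequence: List[Tuple[str, str]] = []
    for ev in sorted(events, key=lambda e: e.start):
        sequence.append(("time", f"{ev.start:.1f}-{ev.end:.1f}"))
        sequence.append(("score", f"{ev.score:.1f}"))
        sequence.append(("caption", ev.caption))
    return sequence


events = [
    Event(12.0, 20.5, 3.5, "stir the sauce"),
    Event(0.0, 12.0, 2.0, "chop the onions"),
]

for task, text in interleave(events):
    print(task, text)
```

In a sequence laid out this way, the items of a later event always follow all items of earlier events, which is the ordering a causal (left-to-right) decoder would rely on.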

Model Zoo

| Checkpoints | Description | URL |
| --- | --- | --- |
| Initialization | Weights initialized from VideoLLaMA2 | trace-init |
| Stage-1 | Model checkpoints trained after stage-1 | trace-stage1 |
| Stage-2 | Model checkpoints trained after stage-2 | trace |

Results

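| YouCook2 (Zero-Shot) | CIDEr | METEOR | SODA_c | F1 |
| --- | --- | --- | --- | --- |
| TRACE | 8.1 | 2.8 | 2.2 | 22.4 |

| Charades-STA (Zero-Shot) | R@1 (IoU=0.3) | R@1 (IoU=0.5) | R@1 (IoU=0.7) | mIoU |
| --- | --- | --- | --- | --- |
| TRACE | 58.6 | 40.3 | 19.4 | 38.7 |

| QVHighlights (Zero-Shot) | mAP | Hit@1 |
| --- | --- | --- |
| TRACE | 26.8 | 42.7 |

| ActivityNet-DVC | CIDEr | METEOR | SODA_c | F1 |
| --- | --- | --- | --- | --- |
| TRACE | 25.9 | 6.0 | 6.4 | 39.3 |

| ActivityNet-MR | R@1 (IoU=0.3) | R@1 (IoU=0.5) | R@1 (IoU=0.7) | mIoU |
| --- | --- | --- | --- | --- |
| TRACE | 53.0 | 37.7 | 24.0 | 39.0 |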
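
For the moment-retrieval rows (Charades-STA and ActivityNet-MR), the 0.3, 0.5, and 0.7 columns are conventionally read as Recall@1 at those temporal IoU thresholds, and the last column as mean IoU. The sketch below shows how such numbers are typically computed from one predicted segment per query. It assumes segments are `(start, end)` pairs in seconds; the function names and toy data are illustrative and are not taken from the TRACE evaluation code.

```python
from typing import Dict, List, Sequence, Tuple

Segment = Tuple[float, float]  # (start, end) in seconds


def temporal_iou(pred: Segment, gt: Segment) -> float:
    """Intersection over union of two 1-D time segments."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def grounding_metrics(
    preds: List[Segment],
    gts: List[Segment],
    thresholds: Sequence[float] = (0.3, 0.5, 0.7),
) -> Dict[str, float]:
    """Recall@1 at each IoU threshold and mean IoU, both in percent."""
    ious = [temporal_iou(p, g) for p, g in zip(preds, gts)]
    n = len(ious)
    metrics = {
        f"R@1 (IoU={t})": 100.0 * sum(iou >= t for iou in ious) / n
        for t in thresholds
    }
    metrics["mIoU"] = 100.0 * sum(ious) / n
    return metrics


# Toy data: the first prediction is exact, the second overlaps by one third.
preds = [(0.0, 10.0), (5.0, 15.0)]
gts = [(0.0, 10.0), (10.0, 20.0)]
print(grounding_metrics(preds, gts))
```

On this toy data both predictions clear the 0.3 threshold, while only the exact match clears 0.5 and 0.7.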