
Video Moment Retrieval in Practical Settings: A Dataset of Ranked Moments for Imprecise Queries

The benchmark and dataset for the paper "Video Moment Retrieval in Practical Settings: A Dataset of Ranked Moments for Imprecise Queries" are coming soon.

We recommend cloning the code, data, and feature files from the Hugging Face repository at TVR-Ranking.

[Figure: TVR-Ranking overview]

Getting started

1. Install the prerequisites

The Python packages we used are listed below. The most recent versions generally work well.

conda create --name tvr_ranking python=3.11
conda activate tvr_ranking
pip install torch # 2.2.1+cu121 (the PyPI package is named "torch", not "pytorch")
pip install tensorboard
pip install h5py pandas tqdm easydict pyyaml
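
After installation, a quick check that every package is importable can save a failed training run. The snippet below is a minimal sketch, not part of the repository: the helper name `missing_packages` is hypothetical, and the list assumes the import names `torch` and `yaml` for the `torch` and `pyyaml` packages.

```python
import importlib.util

# Import names for the packages installed above.
# Note: pyyaml is imported as "yaml".
REQUIRED = ["torch", "tensorboard", "h5py", "pandas", "tqdm", "easydict", "yaml"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    print("missing packages:", missing if missing else "none")
```

If the list is not empty, install the reported packages inside the `tvr_ranking` environment before continuing.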

2. Download full dataset

The full dataset can be downloaded from Hugging Face TVR-Ranking.
A detailed introduction and the raw annotations are available at Dataset Introduction.

TVR_Ranking/
  -val.json                  
  -test.json                 
  -train_top01.json
  -train_top20.json
  -train_top40.json
  -video_corpus.json
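
To confirm the download is complete, the layout above can be checked with a short script. This is a sketch under assumptions: the helper `missing_files` is hypothetical, and `data/TVR_Ranking` is only an assumed location (it matches the path used when extracting the features), so adjust it to your setup.

```python
from pathlib import Path

# Annotation files expected in the dataset directory, as listed above.
EXPECTED_FILES = [
    "val.json",
    "test.json",
    "train_top01.json",
    "train_top20.json",
    "train_top40.json",
    "video_corpus.json",
]


def missing_files(root):
    """Return the expected annotation files that are absent under `root`."""
    root = Path(root)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    # Assumed dataset location; change it if your data lives elsewhere.
    absent = missing_files("data/TVR_Ranking")
    print("missing files:", absent if absent else "none")
```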

3. Download features

The query BERT features can be downloaded from Hugging Face TVR-Ranking.
The video and subtitle features must be requested from TVR.

# tar -C requires the target directory to exist
mkdir -p data/TVR_Ranking/feature
tar -xf tvr_feature_release.tar.gz -C data/TVR_Ranking/feature

4. Training

# modify the data path first 
sh run_top20.sh

Baseline

(ToDo: running the new version...)
The baseline performance in terms of $NDCG@20$ is shown below. For each query, the top $N$ moments ranked by query-caption similarity form the pseudo training set.

| Model | $N$ | IoU = 0.3, val | IoU = 0.3, test | IoU = 0.5, val | IoU = 0.5, test | IoU = 0.7, val | IoU = 0.7, test |
|---|---|---|---|---|---|---|---|
| XML | 1 | 0.1050 | 0.1047 | 0.0767 | 0.0751 | 0.0287 | 0.0314 |
| XML | 20 | 0.1948 | 0.1964 | 0.1417 | 0.1434 | 0.0519 | 0.0583 |
| XML | 40 | 0.2101 | 0.2110 | 0.1525 | 0.1533 | 0.0613 | 0.0617 |
| CONQUER | 1 | 0.0979 | 0.0830 | 0.0817 | 0.0686 | 0.0547 | 0.0479 |
| CONQUER | 20 | 0.2007 | 0.1935 | 0.1844 | 0.1803 | 0.1391 | 0.1341 |
| CONQUER | 40 | 0.2094 | 0.1943 | 0.1930 | 0.1825 | 0.1481 | 0.1334 |
| ReLoCLNet | 1 | 0.1306 | 0.1299 | 0.1169 | 0.1154 | 0.0738 | 0.0789 |
| ReLoCLNet | 20 | 0.3264 | 0.3214 | 0.3007 | 0.2956 | 0.2074 | 0.2084 |
| ReLoCLNet | 40 | 0.3479 | 0.3473 | 0.3221 | 0.3217 | 0.2218 | 0.2275 |
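
For readers unfamiliar with the metric, the sketch below shows one plausible way to compute NDCG@K at a given IoU threshold. It is not the repository's evaluation code: the function names, the tuple layouts, the linear gain, and the rule that each ground-truth moment is matched at most once are all assumptions made for illustration. The evaluation script in the repository remains the authoritative definition.

```python
import math


def temporal_iou(pred, gt):
    """Intersection over union of two (start, end) intervals."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def dcg(gains):
    """Discounted cumulative gain with a log2 rank discount (linear gain assumed)."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains))


def ndcg_at_k(pred_moments, gt_moments, k=20, iou_threshold=0.5):
    """
    pred_moments: list of (video_id, start, end), ranked best first.
    gt_moments:   list of (video_id, start, end, relevance).
    A prediction earns the relevance of the best unmatched ground-truth
    moment in the same video whose IoU reaches the threshold, else 0.
    """
    used = set()
    gains = []
    for vid, start, end in pred_moments[:k]:
        best_gain, best_idx = 0.0, None
        for idx, (g_vid, g_start, g_end, rel) in enumerate(gt_moments):
            if idx in used or g_vid != vid:
                continue
            iou = temporal_iou((start, end), (g_start, g_end))
            if iou >= iou_threshold and rel > best_gain:
                best_gain, best_idx = rel, idx
        if best_idx is not None:
            used.add(best_idx)
        gains.append(best_gain)
    # Ideal ranking: ground-truth relevances sorted in descending order.
    ideal = sorted((m[3] for m in gt_moments), reverse=True)[:k]
    idcg = dcg(ideal)
    return dcg(gains) / idcg if idcg > 0 else 0.0


if __name__ == "__main__":
    gts = [("v1", 0.0, 10.0, 3), ("v2", 5.0, 15.0, 1)]
    preds = [("v2", 5.0, 15.0), ("v1", 0.0, 10.0)]
    print(ndcg_at_k(preds, gts, k=20, iou_threshold=0.5))
```

Raising the IoU threshold makes it harder for a prediction to earn any gain, which is consistent with the lower scores in the IoU = 0.7 columns of the table.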

5. Inference

[ToDo] All checkpoints can be accessed from Hugging Face TVR-Ranking.

Citation

If you find this project helpful for your research, please cite our work.