Video Moment Retrieval in Practical Setting: A Dataset of Ranked Moments for Imprecise Queries

The benchmark and dataset for the paper "Video Moment Retrieval in Practical Settings: A Dataset of Ranked Moments for Imprecise Queries" is coming soon.

We recommend cloning the code, data, and feature files from the Hugging Face repository at TVR-Ranking.

Getting started

1. Install the requisites

The Python packages we used are listed as follows. Commonly, the most recent versions work well.

conda create --name tvr_ranking python=3.11
conda activate tvr_ranking
pip install pytorch # 2.2.1+cu121
pip install tensorboard 
pip install h5py pandas tqdm easydict pyyaml

2. Download full dataset

For the full dataset, please go down from Hugging Face TVR-Ranking.
The detailed introduction and raw annotations is available at Dataset Introduction.

TVR_Ranking/
  -val.json                  
  -test.json                 
  -train_top01.json
  -train_top20.json
  -train_top40.json
  -video_corpus.json

3. Download features

For the query BERT features, you can download them from Hugging Face TVR-Ranking.
For the video and subtitle features, please request them at TVR.

tar -xf tvr_feature_release.tar.gz -C data/TVR_Ranking/feature

4. Training

# modify the data path first 
sh run_top20.sh

Baseline

(ToDo: running the new version...)
The baseline performance of $NDGC@20$ was shown as follows. Top $N$ moments were comprised of a pseudo training set by the query-caption similarity.

Model	$N$	IoU = 0.3, val	IoU = 0.3, test	IoU = 0.5, val	IoU = 0.5, test	IoU = 0.7, val	IoU = 0.7, test
XML	1	0.1050	0.1047	0.0767	0.0751	0.0287	0.0314
	20	0.1948	0.1964	0.1417	0.1434	0.0519	0.0583
	40	0.2101	0.2110	0.1525	0.1533	0.0613	0.0617
CONQUER	1	0.0979	0.0830	0.0817	0.0686	0.0547	0.0479
	20	0.2007	0.1935	0.1844	0.1803	0.1391	0.1341
	40	0.2094	0.1943	0.1930	0.1825	0.1481	0.1334
ReLoCLNet	1	0.1306	0.1299	0.1169	0.1154	0.0738	0.0789
	20	0.3264	0.3214	0.3007	0.2956	0.2074	0.2084
	40	0.3479	0.3473	0.3221	0.3217	0.2218	0.2275

4. Inferring

[ToDo] The checkpoint can all be accessed from Hugging Face TVR-Ranking.

Citation

If you feel this project helpful to your research, please cite our work.