Video Moment Retrieval in Practical Settings: A Dataset of Ranked Moments for Imprecise Queries
The benchmark and dataset for the paper "Video Moment Retrieval in Practical Settings: A Dataset of Ranked Moments for Imprecise Queries" are coming soon.
We recommend cloning the code, data, and feature files from the Hugging Face repository at TVR-Ranking.
Getting started
1. Install the requirements
The Python packages we used are listed below. In general, the most recent versions should work well.
conda create --name tvr_ranking python=3.11
conda activate tvr_ranking
pip install torch  # we used 2.2.1+cu121
pip install tensorboard
pip install h5py pandas tqdm easydict pyyaml
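After installing, a quick check that every package is importable can save a failed training run later. The snippet below is a minimal sketch, not part of the repository: it only looks up the import names of the packages listed above (PyTorch is imported as `torch`, pyyaml as `yaml`) without importing them, and reports CUDA availability once everything is found.

```python
import importlib.util

# Import names of the packages installed above (PyPI "pyyaml" is imported as "yaml").
REQUIRED_MODULES = ["torch", "tensorboard", "h5py", "pandas", "tqdm", "easydict", "yaml"]


def missing_modules(modules=REQUIRED_MODULES):
    """Return the modules that cannot be found in the current environment.

    importlib.util.find_spec only locates a top-level module; it does not import it.
    """
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        import torch
        print("All packages found; torch", torch.__version__,
              "| CUDA available:", torch.cuda.is_available())
```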
2. Download the full dataset
Please download the full dataset from Hugging Face TVR-Ranking.
A detailed introduction and the raw annotations are available in Dataset Introduction.
TVR_Ranking/
├── val.json
├── test.json
├── train_top01.json
├── train_top20.json
├── train_top40.json
└── video_corpus.json
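To confirm that the download is complete, the sketch below checks that each file from the listing above exists and is non-empty. The default root `data/TVR_Ranking` is an assumption inferred from the feature path used in the next step; adjust it to wherever you placed the files. The file contents are deliberately not parsed, since their schema is described in Dataset Introduction rather than here.

```python
from pathlib import Path

# File names copied from the directory listing above.
# The default root "data/TVR_Ranking" is an assumption; point it at your download.
EXPECTED_FILES = [
    "val.json",
    "test.json",
    "train_top01.json",
    "train_top20.json",
    "train_top40.json",
    "video_corpus.json",
]


def missing_files(root="data/TVR_Ranking"):
    """Return the expected annotation files that are absent or empty under root."""
    root = Path(root)
    return [
        name for name in EXPECTED_FILES
        if not (root / name).is_file() or (root / name).stat().st_size == 0
    ]


if __name__ == "__main__":
    missing = missing_files()
    print("Dataset complete." if not missing else f"Missing or empty: {missing}")
```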
3. Download the features
The query BERT features can be downloaded from Hugging Face TVR-Ranking.
The video and subtitle features must be requested from TVR.
mkdir -p data/TVR_Ranking/feature
tar -xf tvr_feature_release.tar.gz -C data/TVR_Ranking/feature
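Where the `tar` command is unavailable (for example on some Windows setups), the standard-library `tarfile` module can do the same extraction. This is a hedged sketch: the archive name and destination are copied from the command above, and the `filter="data"` safety filter is only passed on Python versions that provide it.

```python
import tarfile
from pathlib import Path


def extract_features(archive="tvr_feature_release.tar.gz",
                     dest="data/TVR_Ranking/feature"):
    """Rough Python equivalent of `mkdir -p DEST && tar -xf ARCHIVE -C DEST`.

    Returns all paths now under DEST, relative to DEST and sorted.
    """
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, mode="r:*") as tar:  # "r:*" auto-detects gzip compression
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons: reject absolute paths, ".." members, device files, etc.
            tar.extractall(path=dest, filter="data")
        else:
            tar.extractall(path=dest)
    return sorted(p.relative_to(dest).as_posix() for p in dest.rglob("*"))
```

Calling `extract_features()` from the repository root mirrors the shell commands; the returned listing is a quick way to see which feature files arrived.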
4. Training
# first, modify the data paths in run_top20.sh
sh run_top20.sh
Baseline
(ToDo: results for the new version are still being run.)
The baseline performance in terms of $NDCG@20$ is shown below.
For each query, the pseudo training set consists of the top $N$ moments ranked by query-caption similarity.
| Model | $N$ | IoU = 0.3, val | IoU = 0.3, test | IoU = 0.5, val | IoU = 0.5, test | IoU = 0.7, val | IoU = 0.7, test |
|---|---|---|---|---|---|---|---|
| XML | 1 | 0.1050 | 0.1047 | 0.0767 | 0.0751 | 0.0287 | 0.0314 |
| | 20 | 0.1948 | 0.1964 | 0.1417 | 0.1434 | 0.0519 | 0.0583 |
| | 40 | 0.2101 | 0.2110 | 0.1525 | 0.1533 | 0.0613 | 0.0617 |
| CONQUER | 1 | 0.0979 | 0.0830 | 0.0817 | 0.0686 | 0.0547 | 0.0479 |
| | 20 | 0.2007 | 0.1935 | 0.1844 | 0.1803 | 0.1391 | 0.1341 |
| | 40 | 0.2094 | 0.1943 | 0.1930 | 0.1825 | 0.1481 | 0.1334 |
| ReLoCLNet | 1 | 0.1306 | 0.1299 | 0.1169 | 0.1154 | 0.0738 | 0.0789 |
| | 20 | 0.3264 | 0.3214 | 0.3007 | 0.2956 | 0.2074 | 0.2084 |
| | 40 | 0.3479 | 0.3473 | 0.3221 | 0.3217 | 0.2218 | 0.2275 |
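To make the metric in the table concrete, here is a minimal sketch of NDCG@K for moment retrieval with a temporal IoU threshold. It assumes linear gains, a log2 rank discount, and a greedy rule in which each predicted moment is credited with the relevance of the highest-relevance, not-yet-matched ground-truth moment in the same video whose IoU with it reaches the threshold. The official evaluation script may differ in these details (gain form, matching rule), and the tuple layouts are hypothetical.

```python
import math
from typing import List, Tuple

# Hypothetical data layout chosen for this sketch, not the repository's format.
Moment = Tuple[str, float, float]             # (video_id, start_sec, end_sec)
Annotated = Tuple[str, float, float, float]   # (video_id, start_sec, end_sec, relevance)


def temporal_iou(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Intersection over union of two time intervals."""
    inter = max(0.0, min(a_end, b_end) - max(a_start, b_start))
    union = (a_end - a_start) + (b_end - b_start) - inter
    return inter / union if union > 0 else 0.0


def ndcg_at_k(predictions: List[Moment], ground_truth: List[Annotated],
              k: int = 20, iou_threshold: float = 0.5) -> float:
    """NDCG@k with linear gains and greedy IoU-based matching (see lead-in)."""
    matched = set()
    dcg = 0.0
    for rank, (vid, start, end) in enumerate(predictions[:k], start=1):
        best_idx, best_rel = None, 0.0
        for idx, (g_vid, g_start, g_end, rel) in enumerate(ground_truth):
            if idx in matched or g_vid != vid:
                continue
            if temporal_iou(start, end, g_start, g_end) >= iou_threshold and rel > best_rel:
                best_idx, best_rel = idx, rel
        if best_idx is not None:
            matched.add(best_idx)  # each ground-truth moment is credited at most once
            dcg += best_rel / math.log2(rank + 1)
    # Ideal DCG: ground-truth relevances sorted in descending order, cut at k.
    ideal = sorted((g[3] for g in ground_truth), reverse=True)[:k]
    idcg = sum(rel / math.log2(rank + 1) for rank, rel in enumerate(ideal, start=1))
    return dcg / idcg if idcg > 0 else 0.0


if __name__ == "__main__":
    gt = [("v1", 0.0, 10.0, 3.0), ("v1", 20.0, 30.0, 1.0)]
    print(ndcg_at_k([("v1", 0.0, 10.0), ("v1", 20.0, 30.0)], gt))  # 1.0
    print(ndcg_at_k([("v1", 20.0, 30.0), ("v1", 0.0, 10.0)], gt))  # below 1.0
```

Raising `iou_threshold` from 0.3 to 0.7 makes matching stricter, which is consistent with the scores in each row decreasing from left to right in the table.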
5. Inference
[ToDo] All checkpoints can be accessed from Hugging Face TVR-Ranking.
Citation
If you find this project helpful for your research, please cite our work.