---
license: cc
datasets:
- axgroup/Ranking_TVR
language:
- en
---

# Video Moment Retrieval in Practical Settings: A Dataset of Ranked Moments for Imprecise Queries

This repository hosts the benchmark and dataset for the paper [Video Moment Retrieval in Practical Settings: A Dataset of Ranked Moments for Imprecise Queries](https://arxiv.org/abs/2407.06597).

We recommend cloning the code, data, and feature files from the Hugging Face repository [TVR-Ranking](https://huggingface.co/axgroup/TVR-Ranking). This repository includes only the code for ReLoCLNet_RVMR; the other baseline models are available at [XML_RVMR](https://huggingface.co/LiangRenjie/XML_RVMR) and [CONQUER_RVMR](https://huggingface.co/LiangRenjie/CONQUER_RVMR).

![TVR_Ranking_overview](./figures/taskComparisonV.png)

## Getting started

### 1. Install the prerequisites

The Python packages we used are listed below. The most recent versions generally work well.

```shell
conda create --name tvr_ranking python=3.11
conda activate tvr_ranking
pip install torch  # the PyPI package is named torch; we used 2.2.1+cu121
pip install tensorboard
pip install h5py pandas tqdm easydict pyyaml
```
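
To confirm that the environment is complete before training, a quick import check can help. The following is a minimal sketch, not part of the released code: the helper `missing_modules` and the list `REQUIRED_MODULES` are hypothetical names introduced here, and the list only mirrors the `pip install` lines above. Note that `pyyaml` is imported as `yaml`.

```python
import importlib.util

# Import names for the packages installed above (an assumption based on the
# pip commands): pyyaml is imported as "yaml", PyTorch as "torch".
REQUIRED_MODULES = ["torch", "tensorboard", "h5py", "pandas", "tqdm", "easydict", "yaml"]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```

The check only looks the modules up and does not import them, so it runs quickly even when PyTorch is installed.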

### 2. Download full dataset

For the full dataset, please download it from Hugging Face [TVR-Ranking](https://huggingface.co/axgroup/TVR-Ranking). \
The detailed introduction and raw annotations are available in [Dataset Introduction](data/TVR_Ranking/readme.md).

```
TVR_Ranking/
  -val.json
  -test.json
  -train_top01.json
  -train_top20.json
  -train_top40.json
  -video_corpus.json
```
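
After downloading, it can be useful to verify that all six files from the listing above are in place. The sketch below is a hypothetical helper, not part of the released code; the directory `data/TVR_Ranking` in the usage lines is an assumption based on the paths used elsewhere in this README, so adjust it to your layout.

```python
from pathlib import Path

# File names taken from the directory listing above.
EXPECTED_FILES = [
    "val.json",
    "test.json",
    "train_top01.json",
    "train_top20.json",
    "train_top40.json",
    "video_corpus.json",
]


def missing_files(root):
    """Return the expected dataset files that are absent under `root`."""
    root = Path(root)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    # Assumed location of the dataset directory; change it if yours differs.
    absent = missing_files("data/TVR_Ranking")
    print("Missing files:", absent if absent else "none")
```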

### 3. Download features

The query BERT features can be downloaded from Hugging Face [TVR-Ranking](https://huggingface.co/axgroup/TVR-Ranking). \
For the video and subtitle features, please request them at [TVR](https://tvr.cs.unc.edu/).

```shell
mkdir -p data/TVR_Ranking/feature  # tar -C requires an existing directory
tar -xf tvr_feature_release.tar.gz -C data/TVR_Ranking/feature
```

### 4. Training

```shell
# Modify the data path first
sh run_top20.sh
```

### 5. Inference

All checkpoints are available from Hugging Face [TVR-Ranking](https://huggingface.co/axgroup/TVR-Ranking).

```shell
sh infer_top20.sh
```

## Experiment Results

### Baseline

The baseline performance in terms of $NDCG@40$ is shown below.
For each query, the pseudo training set consists of the top $N$ moments ranked by query-caption similarity.

| **Model**      | **Train Set Top N** | **IoU=0.3**  |              | **IoU=0.5**  |              | **IoU=0.7**  |              |
|----------------|---------------------|--------------|--------------|--------------|--------------|--------------|--------------|
|                |                     | **Val**      | **Test**     | **Val**      | **Test**     | **Val**      | **Test**     |
| **XML**        | 1                   | 0.1077       | 0.1016       | 0.0775       | 0.0727       | 0.0273       | 0.0294       |
|                | 20                  | 0.2580       | 0.2512       | 0.1874       | 0.1853       | 0.0705       | 0.0753       |
|                | 40                  | 0.2408       | 0.2432       | 0.1740       | 0.1791       | 0.0666       | 0.0720       |
| **CONQUER**    | 1                   | 0.0952       | 0.0835       | 0.0808       | 0.0687       | 0.0526       | 0.0484       |
|                | 20                  | 0.2130       | 0.1995       | 0.1976       | 0.1867       | 0.1527       | 0.1368       |
|                | 40                  | 0.2183       | 0.1968       | 0.2022       | 0.1851       | 0.1524       | 0.1365       |
| **ReLoCLNet**  | 1                   | 0.1533       | 0.1489       | 0.1321       | 0.1304       | 0.0878       | 0.0869       |
|                | 20                  | 0.4039       | 0.4031       | 0.3656       | 0.3648       | 0.2542       | 0.2567       |
|                | 40                  | 0.4725       | 0.4735       | 0.4337       | 0.4337       | 0.3015       | 0.3079       |
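
The IoU thresholds in the column headers refer to the temporal overlap between a predicted moment and an annotated one. As a minimal sketch of the standard temporal IoU definition (the function name `temporal_iou` and the `(start, end)` tuple format are assumptions made for illustration, not the repository's evaluation code):

```python
def temporal_iou(pred, gt):
    """Temporal IoU of two moments, each given as (start, end) in seconds."""
    pred_start, pred_end = pred
    gt_start, gt_end = gt
    # Overlap length is zero when the two spans are disjoint.
    intersection = max(0.0, min(pred_end, gt_end) - max(pred_start, gt_start))
    union = (pred_end - pred_start) + (gt_end - gt_start) - intersection
    return intersection / union if union > 0 else 0.0


if __name__ == "__main__":
    # Two 10-second moments overlapping by 5 seconds: IoU = 5 / 15.
    print(temporal_iou((0.0, 10.0), (5.0, 15.0)))
```

In this example the overlap ratio is one third, so the prediction would count as a match at IoU=0.3 but not at IoU=0.5 or IoU=0.7.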

### ReLoCLNet Performance

| **Model**   | **Train Set Top N** | **IoU=0.3**  |              | **IoU=0.5**  |              | **IoU=0.7**  |              |
|-------------|---------------------|--------------|--------------|--------------|--------------|--------------|--------------|
|             |                     | **Val**      | **Test**     | **Val**      | **Test**     | **Val**      | **Test**     |
| **NDCG@10** |                     |              |              |              |              |              |              |
| ReLoCLNet   | 1                   | 0.1575       | 0.1525       | 0.1358       | 0.1349       | 0.0908       | 0.0916       |
| ReLoCLNet   | 20                  | 0.3751       | 0.3751       | 0.3407       | 0.3397       | 0.2316       | 0.2338       |
| ReLoCLNet   | 40                  | 0.4339       | 0.4353       | 0.3984       | 0.3986       | 0.2693       | 0.2807       |
| **NDCG@20** |                     |              |              |              |              |              |              |
| ReLoCLNet   | 1                   | 0.1504       | 0.1439       | 0.1303       | 0.1269       | 0.0866       | 0.0849       |
| ReLoCLNet   | 20                  | 0.3815       | 0.3792       | 0.3462       | 0.3427       | 0.2381       | 0.2386       |
| ReLoCLNet   | 40                  | 0.4418       | 0.4439       | 0.4060       | 0.4059       | 0.2787       | 0.2877       |
| **NDCG@40** |                     |              |              |              |              |              |              |
| ReLoCLNet   | 1                   | 0.1533       | 0.1489       | 0.1321       | 0.1304       | 0.0878       | 0.0869       |
| ReLoCLNet   | 20                  | 0.4039       | 0.4031       | 0.3656       | 0.3648       | 0.2542       | 0.2567       |
| ReLoCLNet   | 40                  | 0.4725       | 0.4735       | 0.4337       | 0.4337       | 0.3015       | 0.3079       |
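
For readers unfamiliar with the metric, the sketch below shows the textbook NDCG@K computation with linear gain. It is an illustration under stated assumptions, not the evaluation script used to produce the numbers above: the function names are hypothetical, and how a predicted moment receives its relevance score (for example, by matching an annotated moment at the given IoU threshold) is left to the caller.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k relevance scores (linear gain)."""
    return sum(rel / math.log2(rank + 1) for rank, rel in enumerate(relevances[:k], start=1))


def ndcg_at_k(predicted_relevances, annotated_relevances, k):
    """NDCG@k: DCG of the predicted ranking divided by the DCG of the ideal ranking.

    predicted_relevances: relevance of each predicted moment, in ranked order.
    annotated_relevances: relevance scores of all annotated moments for the query.
    """
    ideal = sorted(annotated_relevances, reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    if ideal_dcg == 0:
        return 0.0
    return dcg_at_k(predicted_relevances, k) / ideal_dcg


if __name__ == "__main__":
    # Hypothetical scores: the second prediction matched no annotated moment.
    print(ndcg_at_k([3, 0, 2], [3, 2, 1], k=3))
```

A ranking that returns the annotated moments in descending order of relevance scores 1.0; placing relevant moments lower in the list, or missing them, reduces the score.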

## Citation

If you find this project helpful for your research, please cite our work.

```

```