# Video Moment Retrieval in Practical Settings: A Dataset of Ranked Moments for Imprecise Queries
The benchmark and dataset for the paper "Video Moment Retrieval in Practical Settings: A Dataset of Ranked Moments for Imprecise Queries" are coming soon.
We recommend cloning the code, data, and feature files from the Hugging Face repository at [TVR-Ranking](https://huggingface.co/axgroup/TVR-Ranking).
![TVR_Ranking_overview](./figures/taskComparisonV.png)
## Getting started
### 1. Install the requisites
The Python packages we used are listed below. The most recent versions generally work well.
```shell
conda create --name tvr_ranking python=3.11
conda activate tvr_ranking
pip install torch # 2.2.1+cu121
pip install tensorboard
pip install h5py pandas tqdm easydict pyyaml
```
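After installation, a quick check can confirm that the packages are importable. The following is a minimal sketch, not part of the repository: the helper name `missing_modules` is hypothetical, and the module list assumes the usual import names for the packages above (`torch` for PyTorch, `yaml` for `pyyaml`).

```python
from importlib.util import find_spec

# Import names assumed for the packages listed in the install step.
REQUIRED_MODULES = ["torch", "tensorboard", "h5py", "pandas", "tqdm", "easydict", "yaml"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("Missing modules:", ", ".join(missing) if missing else "none")
```

Running the script inside the `tvr_ranking` environment lists any module that still needs to be installed.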
### 2. Download full dataset
For the full dataset, please download it from Hugging Face [TVR-Ranking](https://huggingface.co/axgroup/TVR-Ranking). \
A detailed introduction and the raw annotations are available at [Dataset Introduction](data/TVR_Ranking/readme.md).
```
TVR_Ranking/
-val.json
-test.json
-train_top01.json
-train_top20.json
-train_top40.json
-video_corpus.json
```
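To inspect an annotation file before training, a small loader such as the one below may help. It is a sketch under assumptions: the file layout is not documented here, so the hypothetical helper `load_annotations` first tries to parse the file as a single JSON document and falls back to JSON Lines (one object per line). The path in the usage note is also an assumption.

```python
import json
from pathlib import Path


def load_annotations(path):
    """Load an annotation file as one JSON document, or as JSON Lines if that fails."""
    text = Path(path).read_text(encoding="utf-8")
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # Fallback (assumption): one JSON object per non-empty line.
        return [json.loads(line) for line in text.splitlines() if line.strip()]
```

For example, `records = load_annotations("data/TVR_Ranking/val.json")` followed by `print(type(records), len(records))` gives a first look at the structure without assuming any field names.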
### 3. Download features
For the query BERT features, you can download them from Hugging Face [TVR-Ranking](https://huggingface.co/axgroup/TVR-Ranking). \
For the video and subtitle features, please request them at [TVR](https://tvr.cs.unc.edu/).
```shell
mkdir -p data/TVR_Ranking/feature  # tar -C requires the target directory to exist
tar -xf tvr_feature_release.tar.gz -C data/TVR_Ranking/feature
```
### 4. Training
```shell
# modify the data path first
sh run_top20.sh
```
## Baseline
(ToDo: running the new version...) \
The baseline performance in terms of $NDCG@20$ is shown below.
The top $N$ moments, ranked by query-caption similarity, make up the pseudo training set.
| Model | $N$ | IoU = 0.3, val | IoU = 0.3, test | IoU = 0.5, val | IoU = 0.5, test | IoU = 0.7, val | IoU = 0.7, test |
|----------------|-----|----------------|-----------------|----------------|-----------------|----------------|-----------------|
| **XML** | 1 | 0.1050 | 0.1047 | 0.0767 | 0.0751 | 0.0287 | 0.0314 |
| | 20 | 0.1948 | 0.1964 | 0.1417 | 0.1434 | 0.0519 | 0.0583 |
| | 40 | 0.2101 | 0.2110 | 0.1525 | 0.1533 | 0.0613 | 0.0617 |
| **CONQUER** | 1 | 0.0979 | 0.0830 | 0.0817 | 0.0686 | 0.0547 | 0.0479 |
| | 20 | 0.2007 | 0.1935 | 0.1844 | 0.1803 | 0.1391 | 0.1341 |
| | 40 | 0.2094 | 0.1943 | 0.1930 | 0.1825 | 0.1481 | 0.1334 |
| **ReLoCLNet** | 1 | 0.1306 | 0.1299 | 0.1169 | 0.1154 | 0.0738 | 0.0789 |
| | 20 | 0.3264 | 0.3214 | 0.3007 | 0.2956 | 0.2074 | 0.2084 |
| | 40 | 0.3479 | 0.3473 | 0.3221 | 0.3217 | 0.2218 | 0.2275 |
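To make the metric in the table concrete, the sketch below shows one plausible way to compute NDCG@K under an IoU threshold. It is illustrative only and not the repository's evaluation code. The function names, the tuple layouts, and the matching rule are all assumptions: here a predicted moment earns the relevance of an unmatched ground-truth moment in the same video when their temporal IoU reaches the threshold.

```python
import math


def temporal_iou(a, b):
    """IoU of two (start, end) segments given in seconds."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def dcg(gains):
    """Discounted cumulative gain with a log2 position discount."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains))


def ndcg_at_k(predictions, ground_truth, k=20, iou_threshold=0.5):
    """predictions: ranked list of (video_id, start, end).
    ground_truth: list of (video_id, start, end, relevance)."""
    used = set()  # each ground-truth moment can be matched only once
    gains = []
    for vid, start, end in predictions[:k]:
        best_idx, best_rel = None, 0.0
        for idx, (g_vid, g_start, g_end, rel) in enumerate(ground_truth):
            if idx in used or g_vid != vid:
                continue
            iou = temporal_iou((start, end), (g_start, g_end))
            if iou >= iou_threshold and rel > best_rel:
                best_idx, best_rel = idx, rel
        if best_idx is not None:
            used.add(best_idx)
        gains.append(best_rel)
    # Ideal ranking: ground-truth relevances sorted from highest to lowest.
    ideal = sorted((g[3] for g in ground_truth), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0
```

Under this sketch, raising `iou_threshold` can only turn matches into misses, which is consistent with the scores in the table dropping from IoU = 0.3 to IoU = 0.7.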
### 5. Inference
[ToDo] The checkpoints can all be accessed from Hugging Face [TVR-Ranking](https://huggingface.co/axgroup/TVR-Ranking).
## Citation
If you find this project helpful to your research, please cite our work.
```
```