## Generation of crops from the real datasets

The instructions below explain how to generate the crops used for pre-training CroCo v2 from the following real-world datasets: ARKitScenes, MegaDepth, 3DStreetView, and IndoorVL.

### Download the metadata of the crops to generate 

First, download the metadata and put them in `./data/`:
```
mkdir -p data
cd data/
wget https://download.europe.naverlabs.com/ComputerVision/CroCo/data/crop_metadata.zip
unzip crop_metadata.zip
rm crop_metadata.zip
cd ..
```
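
If the unzip succeeded, there should be one `crops_release.txt` file per dataset under `./data/crop_metadata/`; these are the paths consumed by the extraction command at the end of this page. The optional Python check below is a minimal sketch (run it from the repository root):
```
# Optional sanity check: verify the metadata file of each dataset is where
# the extraction command below expects it.
from pathlib import Path

for dataset in ["ARKitScenes", "MegaDepth", "3DStreetView", "IndoorVL"]:
    meta = Path("data/crop_metadata") / dataset / "crops_release.txt"
    print(f"{dataset}: {'OK' if meta.is_file() else 'MISSING'} ({meta})")
```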

### Prepare the original datasets 

Second, download the original datasets into `./data/original_datasets/`.
```
mkdir -p data/original_datasets
```

##### ARKitScenes

Download the `raw` dataset from https://github.com/apple/ARKitScenes/blob/main/DATA.md and put it in `./data/original_datasets/ARKitScenes/`.
The resulting file structure should look like:
```
./data/original_datasets/ARKitScenes/
└───Training
    β”œβ”€β”€β”€40753679
    β”‚   β”œβ”€β”€β”€ultrawide
    β”‚   └───...
    β”œβ”€β”€β”€40753686
    └───...
```
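
Optionally, you can sanity-check the layout before extracting crops. This snippet is a minimal sketch that only assumes the `Training/<video_id>/ultrawide` structure shown above (run from the repository root):
```
# Optional sanity check: every video directory under Training/ should
# contain an `ultrawide` folder, per the structure above.
from pathlib import Path

root = Path("data/original_datasets/ARKitScenes/Training")
videos = [d for d in root.iterdir() if d.is_dir()]
missing = [d.name for d in videos if not (d / "ultrawide").is_dir()]
print(f"{len(videos)} video directories, {len(missing)} without `ultrawide`")
```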

##### MegaDepth

Download the `MegaDepth v1 Dataset` from https://www.cs.cornell.edu/projects/megadepth/ and put it in `./data/original_datasets/MegaDepth/`.
The resulting file structure should look like:

```
./data/original_datasets/MegaDepth/
β”œβ”€β”€β”€0000
β”‚   └───images
β”‚       β”œβ”€β”€β”€1000557903_87fa96b8a4_o.jpg
β”‚       └───...
β”œβ”€β”€β”€0001
β”‚   └───...
└───...
```
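
A similar optional check for MegaDepth, assuming only the `<scene>/images/*.jpg` layout shown above (some images may use other extensions; adjust the pattern if needed):
```
# Optional sanity check: count scene directories and JPEG images.
from pathlib import Path

root = Path("data/original_datasets/MegaDepth")
scenes = [d for d in root.iterdir() if d.is_dir()]
n_images = sum(1 for _ in root.glob("*/images/*.jpg"))
print(f"{len(scenes)} scene directories, {n_images} images")
```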

##### 3DStreetView

Download the `3D_Street_View` dataset from https://github.com/amir32002/3D_Street_View and put it in `./data/original_datasets/3DStreetView/`.
The resulting file structure should look like:

``` 
./data/original_datasets/3DStreetView/
β”œβ”€β”€β”€dataset_aligned
β”‚   β”œβ”€β”€β”€0002
β”‚   β”‚   β”œβ”€β”€β”€0000002_0000001_0000002_0000001.jpg
β”‚   β”‚   └───...
β”‚   └───...
└───dataset_unaligned
    β”œβ”€β”€β”€0003
    β”‚   β”œβ”€β”€β”€0000003_0000001_0000002_0000001.jpg
    β”‚   └───...
    └───...
```
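
And a corresponding check for 3DStreetView, based on the two top-level splits shown above:
```
# Optional sanity check: both splits should exist and contain images.
from pathlib import Path

root = Path("data/original_datasets/3DStreetView")
for split in ["dataset_aligned", "dataset_unaligned"]:
    n = sum(1 for _ in (root / split).glob("*/*.jpg"))
    print(f"{split}: {n} images")
```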

##### IndoorVL

Download the `IndoorVL` datasets using [Kapture](https://github.com/naver/kapture).

```
pip install kapture
mkdir -p ./data/original_datasets/IndoorVL
cd ./data/original_datasets/IndoorVL
kapture_download_dataset.py update
kapture_download_dataset.py install "HyundaiDepartmentStore_*"
kapture_download_dataset.py install "GangnamStation_*"
cd -
```
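
After the kapture downloads finish, `./data/original_datasets/IndoorVL/` should contain the installed `HyundaiDepartmentStore_*` and `GangnamStation_*` datasets. A hedged spot check (the internal layout of each dataset is determined by kapture):
```
# Optional sanity check: the install step above should have created
# HyundaiDepartmentStore* and GangnamStation* entries in this folder.
from pathlib import Path

root = Path("data/original_datasets/IndoorVL")
for prefix in ["HyundaiDepartmentStore", "GangnamStation"]:
    entries = sorted(p.name for p in root.glob(f"{prefix}*"))
    print(f"{prefix}: {entries if entries else 'MISSING'}")
```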

### Extract the crops

Now, extract the crops for each dataset:
```
for dataset in ARKitScenes MegaDepth 3DStreetView IndoorVL; do
  python3 datasets/crops/extract_crops_from_images.py \
    --crops ./data/crop_metadata/${dataset}/crops_release.txt \
    --root-dir ./data/original_datasets/${dataset}/ \
    --output-dir ./data/${dataset}_crops/ \
    --imsize 256 --nthread 8 \
    --max-subdir-levels 5 --ideal-number-pairs-in-dir 500
done
```
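
Each `./data/<dataset>_crops/` directory should now contain the generated crops, 256×256 pixels per the `--imsize 256` flag. The spot check below is a sketch that assumes crops are saved as JPEG or PNG files; adjust the patterns if the script emits another format:
```
# Optional spot check: count generated crops and verify that one of them
# is 256x256, matching --imsize above. Assumes JPEG/PNG output.
from pathlib import Path
from PIL import Image  # pip install pillow

for dataset in ["ARKitScenes", "MegaDepth", "3DStreetView", "IndoorVL"]:
    out = Path(f"data/{dataset}_crops")
    crops = [p for pat in ("*.jpg", "*.png") for p in out.rglob(pat)]
    print(f"{dataset}: {len(crops)} crops")
    if crops:
        with Image.open(crops[0]) as im:
            print(f"  first crop size: {im.size}")
```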

##### Note for IndoorVL

Due to legal constraints, we can only release 144,228 of the 1,593,689 IndoorVL pairs used in the paper.
To compensate in terms of the number of pre-training iterations, the pre-training command in this repository uses 125 training epochs, including 12 warm-up epochs, with a cosine learning-rate schedule over 250 epochs, instead of 100, 10 and 200 respectively.
The impact on performance is negligible.
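
For intuition, the rescaling keeps the total number of pre-training iterations roughly constant: iterations are proportional to epochs × number of pairs, so a smaller dataset calls for proportionally more epochs. The back-of-the-envelope below only uses the IndoorVL counts from the note above; the implied total is an inference, not an official number:
```
# Back-of-the-envelope: iterations ~ epochs * n_pairs / batch_size, so
# epochs must grow by the same factor the dataset shrank to keep the
# iteration count constant. The 1.25x factor matches 100 -> 125 epochs
# (and 200 -> 250 for the cosine schedule; 10 -> 12 warm-up is ~1.25x).
indoorvl_paper, indoorvl_released = 1_593_689, 144_228
pairs_lost = indoorvl_paper - indoorvl_released  # 1,449,461

scale = 125 / 100
# If the scaling exactly preserved iterations (and the other datasets are
# unchanged), the released pre-training set would total:
implied_total = pairs_lost / (scale - 1)
print(f"pairs lost: {pairs_lost:,}")
print(f"implied released total: ~{implied_total:,.0f} pairs")
```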