|
## Generation of crops from the real datasets |
|
|
|
The instructions below allow you to generate the crops used for pre-training CroCo v2 from the following real-world datasets: ARKitScenes, MegaDepth, 3DStreetView and IndoorVL.
|
|
|
### Download the metadata of the crops to generate |
|
|
|
First, download the metadata and put it in `./data/`:
|
```
mkdir -p data
cd data/
wget https://download.europe.naverlabs.com/ComputerVision/CroCo/data/crop_metadata.zip
unzip crop_metadata.zip
rm crop_metadata.zip
cd ..
```
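
To sanity-check the download, you can verify that a list of crops exists for each dataset. This is a minimal sketch assuming the archive unpacks to `./data/crop_metadata/<dataset>/crops_release.txt`, the paths used by the extraction command further below:

```
# Check that the crop list exists for each dataset
# (paths assumed from the extraction command further below)
for dataset in ARKitScenes MegaDepth 3DStreetView IndoorVL; do
  f=data/crop_metadata/${dataset}/crops_release.txt
  if [ -f "$f" ]; then
    echo "OK: $f ($(wc -l < "$f") lines)"
  else
    echo "MISSING: $f"
  fi
done
```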
|
|
|
### Prepare the original datasets |
|
|
|
Second, download the original datasets into `./data/original_datasets/`.
|
```
mkdir -p data/original_datasets
```
|
|
|
##### ARKitScenes |
|
|
|
Download the `raw` dataset from https://github.com/apple/ARKitScenes/blob/main/DATA.md and put it in `./data/original_datasets/ARKitScenes/`. |
|
The resulting file structure should look like:

```
./data/original_datasets/ARKitScenes/
└───Training
    └───40753679
    │   │   ultrawide
    │   │   ...
    └───40753686
    │
    ...
```
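
As a quick sanity check (a sketch assuming the layout above), you can flag training scenes that are missing the `ultrawide` assets:

```
# List ARKitScenes training scenes that lack the ultrawide assets
# (directory layout assumed as shown above)
for scene in ./data/original_datasets/ARKitScenes/Training/*/; do
  [ -e "${scene}ultrawide" ] || echo "missing ultrawide: ${scene}"
done
```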
|
|
|
##### MegaDepth |
|
|
|
Download `MegaDepth v1 Dataset` from https://www.cs.cornell.edu/projects/megadepth/ and put it in `./data/original_datasets/MegaDepth/`. |
|
The resulting file structure should look like:

```
./data/original_datasets/MegaDepth/
└───0000
│   └───images
│   │   │   1000557903_87fa96b8a4_o.jpg
│   │   │   ...
│   └─── ...
└───0001
│   │
│   │   ...
└─── ...
```
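
Once the archive is extracted, a quick way to confirm the layout (a sketch assuming the structure above) is to count scene folders and images:

```
# Count MegaDepth scene directories and JPEG images
# (layout assumed as shown above)
root=./data/original_datasets/MegaDepth
echo "scenes: $(find "$root" -mindepth 1 -maxdepth 1 -type d | wc -l)"
echo "images: $(find "$root" -name '*.jpg' | wc -l)"
```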
|
|
|
##### 3DStreetView |
|
|
|
Download `3D_Street_View` dataset from https://github.com/amir32002/3D_Street_View and put it in `./data/original_datasets/3DStreetView/`. |
|
The resulting file structure should look like:

```
./data/original_datasets/3DStreetView/
└───dataset_aligned
│   └───0002
│   │   │   0000002_0000001_0000002_0000001.jpg
│   │   │   ...
│   └─── ...
└───dataset_unaligned
│   └───0003
│   │   │   0000003_0000001_0000002_0000001.jpg
│   │   │   ...
│   └─── ...
```
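
Both the aligned and unaligned splits are used, so it is worth checking that each one actually contains images (a sketch assuming the layout above):

```
# Count images in each split of 3DStreetView
# (layout assumed as shown above)
root=./data/original_datasets/3DStreetView
for split in dataset_aligned dataset_unaligned; do
  echo "$split: $(find "$root/$split" -name '*.jpg' | wc -l) images"
done
```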
|
|
|
##### IndoorVL |
|
|
|
Download the `IndoorVL` datasets using [Kapture](https://github.com/naver/kapture). |
|
|
|
```
pip install kapture
mkdir -p ./data/original_datasets/IndoorVL
cd ./data/original_datasets/IndoorVL
kapture_download_dataset.py update
kapture_download_dataset.py install "HyundaiDepartmentStore_*"
kapture_download_dataset.py install "GangnamStation_*"
cd -
```
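
If you want to see which sub-datasets those wildcard patterns match before installing, kapture's downloader can list the available datasets; note that the `list` subcommand is an assumption about the kapture version you have installed:

```
# List the available IndoorVL datasets matching the patterns above
# (the `list` subcommand is assumed to exist in your kapture version)
cd ./data/original_datasets/IndoorVL
kapture_download_dataset.py list | grep -E "HyundaiDepartmentStore|GangnamStation"
cd -
```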
|
|
|
### Extract the crops |
|
|
|
Now, extract the crops for each of the datasets:
|
```
for dataset in ARKitScenes MegaDepth 3DStreetView IndoorVL;
do
    python3 datasets/crops/extract_crops_from_images.py --crops ./data/crop_metadata/${dataset}/crops_release.txt --root-dir ./data/original_datasets/${dataset}/ --output-dir ./data/${dataset}_crops/ --imsize 256 --nthread 8 --max-subdir-levels 5 --ideal-number-pairs-in-dir 500;
done
```
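
Before launching all four extractions, you may want to test on a single dataset first; this is the same command as in the loop above, written out for ARKitScenes:

```
# Test run on a single dataset with the same parameters as the loop above
python3 datasets/crops/extract_crops_from_images.py \
    --crops ./data/crop_metadata/ARKitScenes/crops_release.txt \
    --root-dir ./data/original_datasets/ARKitScenes/ \
    --output-dir ./data/ARKitScenes_crops/ \
    --imsize 256 --nthread 8 \
    --max-subdir-levels 5 --ideal-number-pairs-in-dir 500
```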
|
|
|
##### Note for IndoorVL |
|
|
|
Due to legal issues, we can only release 144,228 of the 1,593,689 pairs used in the paper.
To compensate for this in terms of the number of pre-training iterations, the pre-training command in this repository uses 125 training epochs, including 12 warm-up epochs, and a cosine learning-rate schedule over 250 epochs, instead of 100, 10 and 200 respectively.
The impact on performance is negligible.
|
|