## Generation of crops from the real datasets
The instructions below explain how to generate the crops used for pre-training CroCo v2 from the following real-world datasets: ARKitScenes, MegaDepth, 3DStreetView and IndoorVL.
### Download the metadata of the crops to generate
First, download the crop metadata and extract it into `./data/`:
```
mkdir -p data
cd data/
wget https://download.europe.naverlabs.com/ComputerVision/CroCo/data/crop_metadata.zip
unzip crop_metadata.zip
rm crop_metadata.zip
cd ..
```
### Prepare the original datasets
Second, download the original datasets into `./data/original_datasets/`:
```
mkdir -p data/original_datasets
```
##### ARKitScenes
Download the `raw` dataset from https://github.com/apple/ARKitScenes/blob/main/DATA.md and put it in `./data/original_datasets/ARKitScenes/`.
The resulting file structure should look like this:
```
./data/original_datasets/ARKitScenes/
└───Training
    └───40753679
    │   │   ultrawide
    │   │   ...
    └───40753686
    │
    ...
```
##### MegaDepth
Download `MegaDepth v1 Dataset` from https://www.cs.cornell.edu/projects/megadepth/ and put it in `./data/original_datasets/MegaDepth/`.
The resulting file structure should look like this:
```
./data/original_datasets/MegaDepth/
└───0000
│   └───images
│   │   │   1000557903_87fa96b8a4_o.jpg
│   │   │   ...
│   └─── ...
└───0001
│   │
│   │   ...
└─── ...
```
##### 3DStreetView
Download `3D_Street_View` dataset from https://github.com/amir32002/3D_Street_View and put it in `./data/original_datasets/3DStreetView/`.
The resulting file structure should look like this:
```
./data/original_datasets/3DStreetView/
└───dataset_aligned
│   └───0002
│   │   │   0000002_0000001_0000002_0000001.jpg
│   │   │   ...
│   └─── ...
└───dataset_unaligned
│   └───0003
│   │   │   0000003_0000001_0000002_0000001.jpg
│   │   │   ...
│   └─── ...
```
##### IndoorVL
Download the `IndoorVL` datasets using [Kapture](https://github.com/naver/kapture).
```
pip install kapture
mkdir -p ./data/original_datasets/IndoorVL
cd ./data/original_datasets/IndoorVL
kapture_download_dataset.py update
kapture_download_dataset.py install "HyundaiDepartmentStore_*"
kapture_download_dataset.py install "GangnamStation_*"
cd -
```
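Before moving on, it can be worth checking that all four datasets and their crop metadata are where the extraction step expects them. The snippet below is a minimal sketch, not part of the CroCo repository: the `preflight` helper is a hypothetical name, and it only assumes the paths used in this README (`./data/crop_metadata/<dataset>/crops_release.txt` and `./data/original_datasets/<dataset>/`). It does not validate the contents of the datasets.

```python
from pathlib import Path

# Dataset names as used in this README.
DATASETS = ["ARKitScenes", "MegaDepth", "3DStreetView", "IndoorVL"]


def preflight(data_dir="./data", datasets=DATASETS):
    """Hypothetical helper: return a list of problems (empty if none found)."""
    data_dir = Path(data_dir)
    problems = []
    for name in datasets:
        # Metadata file referenced by the extraction command.
        crops_file = data_dir / "crop_metadata" / name / "crops_release.txt"
        # Root directory of the original dataset.
        root = data_dir / "original_datasets" / name
        if not crops_file.is_file():
            problems.append(f"missing metadata file: {crops_file}")
        if not root.is_dir() or not any(root.iterdir()):
            problems.append(f"missing or empty dataset directory: {root}")
    return problems
```

Calling `preflight()` from the repository root returns an empty list when every expected file and directory is present; otherwise each entry names one missing path.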
### Extract the crops
Now, extract the crops for each dataset:
```
for dataset in ARKitScenes MegaDepth 3DStreetView IndoorVL
do
  python3 datasets/crops/extract_crops_from_images.py \
    --crops ./data/crop_metadata/${dataset}/crops_release.txt \
    --root-dir ./data/original_datasets/${dataset}/ \
    --output-dir ./data/${dataset}_crops/ \
    --imsize 256 \
    --nthread 8 \
    --max-subdir-levels 5 \
    --ideal-number-pairs-in-dir 500
done
```
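If you prefer to drive the extraction from Python, the loop above can be sketched as follows. This is an illustrative wrapper, not something shipped with the repository: `build_command` and `extract_all` are hypothetical names, and the flags are copied verbatim from the shell command. Unlike the shell loop, this version stops at the first dataset whose extraction fails.

```python
import subprocess

DATASETS = ["ARKitScenes", "MegaDepth", "3DStreetView", "IndoorVL"]


def build_command(dataset, data_dir="./data", imsize=256, nthread=8):
    """Hypothetical helper: build the argument list for one dataset."""
    return [
        "python3", "datasets/crops/extract_crops_from_images.py",
        "--crops", f"{data_dir}/crop_metadata/{dataset}/crops_release.txt",
        "--root-dir", f"{data_dir}/original_datasets/{dataset}/",
        "--output-dir", f"{data_dir}/{dataset}_crops/",
        "--imsize", str(imsize),
        "--nthread", str(nthread),
        "--max-subdir-levels", "5",
        "--ideal-number-pairs-in-dir", "500",
    ]


def extract_all(datasets=DATASETS, dry_run=True):
    """Print each command (dry run) or execute it from the repository root."""
    for dataset in datasets:
        cmd = build_command(dataset)
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True raises CalledProcessError on a non-zero exit code.
            subprocess.run(cmd, check=True)
```

Run `extract_all()` first to inspect the commands, then `extract_all(dry_run=False)` to execute them.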
##### Note for IndoorVL
For legal reasons, we can only release 144,228 of the 1,593,689 pairs used in the paper.
To compensate in terms of the number of pre-training iterations, the pre-training command in this repository uses 125 training epochs, including 12 warm-up epochs, and a cosine learning-rate schedule of 250, instead of 100, 10 and 200 respectively.
The impact on performance is negligible.
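To put these numbers in perspective, the short calculation below derives the released fraction of IndoorVL pairs and the ratio between the repository settings and the paper settings. It uses only the figures quoted in this note; the dictionary keys are illustrative labels, not actual command-line argument names, and the note does not say how the adjusted values were derived.

```python
# Figures quoted in the note above.
released_pairs = 144_228
paper_pairs = 1_593_689

# Fraction of the IndoorVL pairs that could be released.
released_fraction = released_pairs / paper_pairs

# Illustrative labels for the three settings mentioned in the note.
paper = {"epochs": 100, "warmup_epochs": 10, "schedule": 200}
repo = {"epochs": 125, "warmup_epochs": 12, "schedule": 250}

# Ratio of each repository setting to the corresponding paper setting.
ratios = {key: repo[key] / paper[key] for key in paper}

print(f"released fraction of IndoorVL pairs: {released_fraction:.2%}")
for key, ratio in ratios.items():
    print(f"{key}: {paper[key]} -> {repo[key]} (x{ratio:.2f})")
```

About 9% of the IndoorVL pairs are released, and the epoch count and schedule length are both scaled by 1.25, with the warm-up scaled by 1.2.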