
PRAM: Place Recognition Anywhere Model for Efficient Visual Localization

Humans localize themselves efficiently in known environments by first recognizing landmarks defined on certain objects and their spatial relationships, and then verifying the location by aligning detailed structures of recognized objects with those in memory. Inspired by this, we propose the place recognition anywhere model (PRAM) to perform visual localization as efficiently as humans do. PRAM consists of two main components: recognition and registration. First, a self-supervised map-centric landmark definition strategy is adopted, making places in either indoor or outdoor scenes act as unique landmarks. Then, sparse keypoints extracted from images are used as the input to a transformer-based deep neural network for landmark recognition; these keypoints enable PRAM to recognize hundreds of landmarks with high time and memory efficiency. Keypoints along with recognized landmark labels are further used for registration between query images and the 3D landmark map. Unlike previous hierarchical methods, PRAM discards global and local descriptors and reduces storage by over 90%. Since PRAM uses recognition and landmark-wise verification to replace global reference search and exhaustive matching respectively, it runs 2.4 times faster than prior state-of-the-art approaches. Moreover, PRAM opens new directions for visual localization, including multi-modality localization, map-centric feature learning, and hierarchical scene coordinate regression.

Full paper PDF: Place Recognition Anywhere Model for Efficient Visual Localization.
Authors: Fei Xue, Ignas Budvytis, Roberto Cipolla
Website: PRAM for videos, slides, recent updates, and datasets.

Key Features

1. Self-supervised landmark definition on 3D space

No need for segmentation on images
No inconsistent semantic results from multi-view images
Not limited to labels of known objects
Works in any place with known or unknown objects
Landmark-wise 3D map sparsification
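To make the idea of defining landmarks directly on 3D points more concrete, here is a minimal sketch that groups SfM points into landmarks by clustering their coordinates, with no image segmentation involved. Plain k-means is used here only as a stand-in; this README does not state which clustering algorithm PRAM uses, and the function names and the deterministic initialization are assumptions made for illustration.

```python
import math


def assign(points, centers):
    """Label each 3D point with the index of its nearest center."""
    return [
        min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
        for p in points
    ]


def update(points, labels, centers):
    """Move each center to the mean of its members (keep it if empty)."""
    new_centers = []
    for k, center in enumerate(centers):
        members = [p for p, label in zip(points, labels) if label == k]
        if members:
            new_centers.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
        else:
            new_centers.append(center)
    return new_centers


def cluster_landmarks(points, num_landmarks, iters=10):
    """Hypothetical helper: k-means over 3D points, one cluster per landmark."""
    # Deterministic initialization: evenly spaced points from the input list.
    centers = [points[i * len(points) // num_landmarks] for i in range(num_landmarks)]
    for _ in range(iters):
        labels = assign(points, centers)
        centers = update(points, labels, centers)
    return assign(points, centers), centers
```

Each 3D point ends up with a landmark label that is shared by every image observing that point, which is why labels stay consistent across views.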

2. Efficient landmark-wise coarse and fine localization

Recognizes landmarks instead of performing global retrieval
Local landmark-wise matching instead of exhaustive matching
No global descriptors (e.g. NetVLAD)
No reference images and their heavy, repetitive 2D keypoints and descriptors
Automatic inlier/outlier identification
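A rough way to see why landmark-wise matching is cheaper than exhaustive matching is to count candidate correspondences. The sketch below is not PRAM's matcher; it only compares the number of query-to-map pairs considered when every pair is allowed versus when pairs must share a landmark label. The function names and the toy labels are assumptions for illustration.

```python
from collections import Counter


def exhaustive_pairs(query_labels, map_labels):
    """Every query keypoint is compared with every map point."""
    return len(query_labels) * len(map_labels)


def landmark_wise_pairs(query_labels, map_labels):
    """Only pairs that carry the same landmark label are compared."""
    query_counts = Counter(query_labels)
    map_counts = Counter(map_labels)
    return sum(n * map_counts[label] for label, n in query_counts.items())


# Toy example: 4 query keypoints and 6 map points over 3 landmarks.
query_labels = [0, 0, 1, 2]
map_labels = [0, 1, 1, 2, 2, 2]
print(exhaustive_pairs(query_labels, map_labels))     # 24
print(landmark_wise_pairs(query_labels, map_labels))  # 7
```

The gap widens as the number of landmarks grows, since each keypoint is only compared against the map points of its own landmark.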

4. Sparse recognition

Sparse SFD2 keypoints as tokens
No uncertainties of points at boundaries
Flexible to accept multi-modality inputs

5. Relocalization and temporal localization

Per-frame relocalization from scratch
Tracking previous frames for higher efficiency
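The two modes can be combined in a simple loop: try to track from the previous pose, and fall back to relocalization from scratch when there is no previous pose or tracking fails. The sketch below is only an illustration of that control flow; `relocalize` and `track` are hypothetical callables supplied by the caller, not functions from this repository.

```python
def localize_sequence(frames, relocalize, track):
    """Hypothetical driver: track when possible, otherwise relocalize.

    relocalize(frame) -> pose
    track(frame, previous_pose) -> pose, or None if tracking fails
    """
    poses = []
    previous_pose = None
    for frame in frames:
        pose = None
        if previous_pose is not None:
            # Cheaper path: reuse information from the previous frame.
            pose = track(frame, previous_pose)
        if pose is None:
            # First frame or tracking failure: localize from scratch.
            pose = relocalize(frame)
        poses.append(pose)
        previous_pose = pose
    return poses
```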

6. One model one dataset

All 7 subscenes in the 7Scenes dataset share one model
All 12 subscenes in the 12Scenes dataset share one model
All 5 subscenes in the CambridgeLandmarks dataset share one model

7. Robust to long-term changes

Open problems

Adaptive determination of the number of landmarks
Using SAM + open vocabulary to generate semantic map
Multi-modality localization with other tokenized signals (e.g. text, language, GPS, magnetometer)
More effective solutions to 3D sparsification

Preparation

Download the 7Scenes, 12Scenes, CambridgeLandmarks, and Aachen datasets (remove the redundant depth images; otherwise they will be picked up by the SfM process)
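One possible way to remove the depth images is a small script like the sketch below. It assumes depth frames are named like `frame-000000.depth.png`, which is the usual 7Scenes convention; the pattern will likely need adjusting for other datasets. The function names are illustrative, and the default is a dry run so nothing is deleted until the listed files have been checked.

```python
from pathlib import Path


def find_depth_images(dataset_dir, pattern="*.depth.png"):
    """Return every file under dataset_dir whose name matches the pattern."""
    return sorted(Path(dataset_dir).rglob(pattern))


def remove_depth_images(dataset_dir, pattern="*.depth.png", dry_run=True):
    """List matching depth images and delete them unless dry_run is set."""
    depth_files = find_depth_images(dataset_dir, pattern)
    if not dry_run:
        for path in depth_files:
            path.unlink()
    return depth_files
```

Calling `remove_depth_images("/your path to/7Scenes")` first and inspecting the returned list before re-running with `dry_run=False` avoids deleting color images by mistake.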
Environments

2.1 Create a virtual environment

conda env create -f environment.yml
(do not activate pram before pangolin is installed)

2.2 Compile Pangolin for the installed python

git clone --recursive https://github.com/stevenlovegrove/Pangolin.git
cd Pangolin
git checkout v0.8

# Install dependencies
./scripts/install_prerequisites.sh recommended

# Compile with your python
cmake -DPython_EXECUTABLE=/your path to/anaconda3/envs/pram/bin/python3  -B build
cmake --build build -t pypangolin_pip_install

conda activate pram
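After activating the environment, it can be worth confirming that the Pangolin Python bindings are visible to the interpreter before running the online visualization. The check below is a minimal sketch; it assumes the bindings are installed under the module name `pypangolin`, as the `pypangolin_pip_install` target suggests.

```python
import importlib.util


def module_available(name):
    """Return True if the named top-level module can be found by this interpreter."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # Assumed module name; adjust if your Pangolin build installs a different one.
    if module_available("pypangolin"):
        print("pypangolin found")
    else:
        print("pypangolin not found; check the Python_EXECUTABLE path used for cmake")
```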

Run the localization with online visualization

Download the 3D models, pretrained models, and landmarks
Put pretrained models in weights directory
Run the demo (e.g. 7Scenes)

python3 inference.py  --config configs/config_train_7scenes_sfd2.yaml --rec_weight_path weights/7scenes_nc113_birch_segnetvit.199.pth  --landmark_path /your path to/landmarks --online

Train the recognition model (e.g. for 7Scenes)

1. Do SfM with SFD2 including feature extraction (modify the dataset_dir, ref_sfm_dir, output_dir)

./sfm_scripts/reconstruct_7scenes.sh

This step will produce the SfM results together with the extracted keypoints

2. Generate 3D landmarks

python3 -m recognition.recmap --dataset 7Scenes --dataset_dir /your path to/7Scenes --sfm_dir /sfm_path/7Scenes --save_dir /save_path/landmarks

This step will generate 3D landmarks, create virtual reference frames (VRFs), and sparsify the 3D points of each landmark for all scenes in 7Scenes

3. Train the sparse recognition model (one model one dataset)

python3 train.py   --config configs/config_train_7scenes_sfd2.yaml

Remember to modify the paths in 'config_train_7scenes_sfd2.yaml'

Your own dataset

Run COLMAP or hloc to obtain the SfM results
Do reconstruction with SFD2 keypoints, using the SfM results from step 1 as the reference SfM
Do 3D landmark generation, VRF creation, map sparsification, etc. (add DatasetName.yaml to configs/datasets)
Train the recognition model
Do evaluation

Previous works can be found here

BibTeX Citation

If you use any ideas from the paper or code in this repo, please consider citing:

 @article{xue2024pram,
          author    = {Fei Xue and Ignas Budvytis and Roberto Cipolla},
          title     = {PRAM: Place Recognition Anywhere Model for Efficient Visual Localization},
          journal   = {arXiv preprint arXiv:2404.07785},
          year      = {2024}
 }

@inproceedings{xue2023sfd2,
  author    = {Fei Xue and Ignas Budvytis and Roberto Cipolla},
  title     = {SFD2: Semantic-guided Feature Detection and Description},
  booktitle = {CVPR},
  year      = {2023}
}

@inproceedings{xue2022imp,
  author    = {Fei Xue and Ignas Budvytis and Roberto Cipolla},
  title     = {IMP: Iterative Matching and Pose Estimation with Adaptive Pooling},
  booktitle = {CVPR},
  year      = {2023}
}

@inproceedings{xue2022efficient,
  author    = {Fei Xue and Ignas Budvytis and Daniel Olmeda Reino and Roberto Cipolla},
  title     = {Efficient Large-scale Localization by Global Instance Recognition},
  booktitle = {CVPR},
  year      = {2022}
}

Acknowledgements

Part of the code is from previous excellent works, including SuperGlue and hloc. You can find more details in their released repositories if you are interested in their work.