Published in: International Journal of Computer Vision 2/2021

07.10.2020

Image Matching Across Wide Baselines: From Paper to Practice

Authors: Yuhe Jin, Dmytro Mishkin, Anastasiia Mishchuk, Jiri Matas, Pascal Fua, Kwang Moo Yi, Eduard Trulls


Abstract

We introduce a comprehensive benchmark for local features and robust estimation algorithms, focusing on the downstream task—the accuracy of the reconstructed camera pose—as our primary metric. Our pipeline’s modular structure allows easy integration, configuration, and combination of different methods and heuristics. This is demonstrated by embedding dozens of popular algorithms and evaluating them, from seminal works to the cutting edge of machine learning research. We show that with proper settings, classical solutions may still outperform the perceived state of the art. Besides establishing the actual state of the art, the conducted experiments reveal unexpected properties of structure from motion pipelines that can help improve their performance, for both algorithmic and learned methods. Data and code are online (https://github.com/ubc-vision/image-matching-benchmark), providing an easy-to-use and flexible framework for the benchmarking of local features and robust estimation methods, both alongside and against top-performing methods. This work provides a basis for the Image Matching Challenge (https://image-matching-challenge.github.io).
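
To make the pipeline described in the abstract concrete, the following is a minimal, illustrative sketch of the kind of two-view pipeline the benchmark evaluates: local feature extraction, descriptor matching with a ratio test, robust essential-matrix estimation with RANSAC, and an angular error on the recovered pose. It uses OpenCV rather than the benchmark code itself, and the image paths, intrinsic matrix K, and ground-truth rotation R_gt are placeholder assumptions.

import cv2
import numpy as np

# Illustrative sketch only (not the benchmark implementation).
# Image paths and intrinsics below are placeholders.
img1 = cv2.imread("view1.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("view2.jpg", cv2.IMREAD_GRAYSCALE)
K = np.array([[1200.0, 0.0, 640.0],
              [0.0, 1200.0, 480.0],
              [0.0, 0.0, 1.0]])  # assumed pinhole intrinsics

# 1. Local features: detect keypoints and compute descriptors.
sift = cv2.SIFT_create(nfeatures=2000)
kp1, desc1 = sift.detectAndCompute(img1, None)
kp2, desc2 = sift.detectAndCompute(img2, None)

# 2. Nearest-neighbour matching with Lowe's ratio test.
matcher = cv2.BFMatcher(cv2.NORM_L2)
tentative = [m for m, n in matcher.knnMatch(desc1, desc2, k=2)
             if m.distance < 0.8 * n.distance]
pts1 = np.float32([kp1[m.queryIdx].pt for m in tentative])
pts2 = np.float32([kp2[m.trainIdx].pt for m in tentative])

# 3. Robust estimation: essential matrix with RANSAC, then recover the relative pose.
E, inliers = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC,
                                  prob=0.999, threshold=1.0)
_, R_est, t_est, _ = cv2.recoverPose(E, pts1, pts2, K, mask=inliers)

# 4. Downstream metric: angular error of the estimated rotation vs. ground truth.
R_gt = np.eye(3)  # placeholder ground-truth relative rotation
cos_angle = np.clip((np.trace(R_gt.T @ R_est) - 1.0) / 2.0, -1.0, 1.0)
print(f"{len(tentative)} tentative matches, "
      f"rotation error {np.degrees(np.arccos(cos_angle)):.2f} deg")

In the benchmark itself, such per-pair pose errors are aggregated across many image pairs and scenes; this sketch only illustrates the structure of a single two-view evaluation.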

Footnotes
1. In (Barroso-Laguna et al. 2019) the models are converted to TensorFlow—we use the original PyTorch version.

5. Time measured on ‘n1-standard-2’ VMs on Google Cloud Compute: 2 vCPUs with 7.5 GB of RAM and no GPU.
References
Aanaes, H., Dahl, A. L., & Steenstrup-Pedersen, K. (2012). Interesting interest points. International Journal of Computer Vision, 97, 18–35.
Aanaes, H., & Kahl, F. (2002). Estimation of deformable structure and motion. In Vision and modelling of dynamic scenes workshop.
Agarwal, S., Snavely, N., Simon, I., Seitz, S., & Szeliski, R. (2009). Building Rome in one day. In International conference on computer vision.
Alahi, A., Ortiz, R., & Vandergheynst, P. (2012). FREAK: Fast retina keypoint. In Conference on computer vision and pattern recognition.
Alcantarilla, P. F., Nuevo, J., & Bartoli, A. (2013). Fast explicit diffusion for accelerated features in nonlinear scale spaces. In British machine vision conference.
Aldana-Iuit, J., Mishkin, D., Chum, O., & Matas, J. (2019). Saddle: Fast and repeatable features with good coverage. Image and Vision Computing, 97, 3807.
Arandjelovic, R., Gronat, P., Torii, A., Pajdla, T., & Sivic, J. (2016). NetVLAD: CNN architecture for weakly supervised place recognition. In Conference on computer vision and pattern recognition.
Arandjelovic, R., & Zisserman, A. (2012). Three things everyone should know to improve object retrieval. In Conference on computer vision and pattern recognition.
Balntas, V., Lenc, K., Vedaldi, A., & Mikolajczyk, K. (2017). HPatches: A benchmark and evaluation of handcrafted and learned local descriptors. In Conference on computer vision and pattern recognition.
Balntas, V., Li, S., & Prisacariu, V. (2018). RelocNet: Continuous metric learning relocalisation using neural nets. In European conference on computer vision.
Balntas, V., Riba, E., Ponsa, D., & Mikolajczyk, K. (2016). Learning local feature descriptors with triplets and shallow convolutional neural networks. In British machine vision conference.
Barath, D., & Matas, J. (2018). Graph-cut RANSAC. In Conference on computer vision and pattern recognition.
Barath, D., Matas, J., & Noskova, J. (2019). MAGSAC: Marginalizing sample consensus. In Conference on computer vision and pattern recognition.
Barroso-Laguna, A., Riba, E., Ponsa, D., & Mikolajczyk, K. (2019). Key.Net: Keypoint detection by handcrafted and learned CNN filters. In International conference on computer vision.
Baumberg, A. (2000). Reliable feature matching across widely separated views. In Conference on computer vision and pattern recognition.
Bay, H., Tuytelaars, T., & Van Gool, L. (2006). SURF: Speeded up robust features. In European conference on computer vision.
Beaudet, P. R. (1978). Rotationally invariant image operators. In Proceedings of the 4th international joint conference on pattern recognition (pp. 579–583). Kyoto.
Bellavia, F., & Colombo, C. (2020). Is there anything new to say about SIFT matching? International Journal of Computer Vision, 2020, 1–20.
Bian, J.-W., Wu, Y.-H., Zhao, J., Liu, Y., Zhang, L., Cheng, M.-M., & Reid, I. (2019). An evaluation of feature matchers for fundamental matrix estimation. In British machine vision conference.
Brachmann, E., & Rother, C. (2019). Neural-guided RANSAC: Learning where to sample model hypotheses. In International conference on computer vision.
Bradski, G. (2000). The OpenCV library. Dr. Dobb’s Journal of Software Tools, 120, 122–125.
Brown, M., Hua, G., & Winder, S. (2011). Discriminative learning of local image descriptors. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33, 43–57.
Brown, M., & Lowe, D. (2007). Automatic panoramic image stitching using invariant features. International Journal of Computer Vision, 74, 59–73.
Bui, M., Baur, C., Navab, N., Ilic, S., & Albarqouni, S. (2019). Adversarial networks for camera pose regression and refinement. In International conference on computer vision.
Chum, O., & Matas, J. (2005). Matching with PROSAC—progressive sample consensus. In Conference on computer vision and pattern recognition.
Chum, O., Matas, J., & Kittler, J. (2003). Locally optimized RANSAC. In Pattern recognition.
Chum, O., Werner, T., & Matas, J. (2005). Two-view geometry estimation unaffected by a dominant plane. In Conference on computer vision and pattern recognition.
Cui, H., Gao, X., Shen, S., & Hu, Z. (2017). HSfM: Hybrid structure-from-motion. In Conference on computer vision and pattern recognition.
Dang, Z., Yi, K. M., Hu, Y., Wang, F., Fua, P., & Salzmann, M. (2018). Eigendecomposition-free training of deep networks with zero eigenvalue-based losses. In European conference on computer vision.
DeTone, D., Malisiewicz, T., & Rabinovich, A. (2018). SuperPoint: Self-supervised interest point detection and description. In CVPR workshop on deep learning for visual SLAM.
Dong, J., Karianakis, N., Davis, D., Hernandez, J., Balzer, J., & Soatto, S. (2015). Multi-view feature engineering and learning. In Conference on computer vision and pattern recognition.
Dong, J., & Soatto, S. (2015). Domain-size pooling in local descriptors: DSP-SIFT. In Conference on computer vision and pattern recognition.
Dusmanu, M., Rocco, I., Pajdla, T., Pollefeys, M., Sivic, J., Torii, A., & Sattler, T. (2019). D2-Net: A trainable CNN for joint detection and description of local features. In Conference on computer vision and pattern recognition.
Ebel, P., Mishchuk, A., Yi, K. M., Fua, P., & Trulls, E. (2019). Beyond Cartesian representations for local descriptors. In International conference on computer vision.
Fischler, M., & Bolles, R. (1981). Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM, 24(6), 381–395.
Gay, P., Bansal, V., Rubino, C., & Bue, A. D. (2017). Probabilistic structure from motion with objects (PSfMO). In International conference on computer vision.
Geiger, A., Lenz, P., & Urtasun, R. (2012). Are we ready for autonomous driving? The KITTI vision benchmark suite. In Conference on computer vision and pattern recognition.
Hartley, R. (1997). In defense of the eight-point algorithm. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(6), 580–593.
Hartley, R., & Zisserman, A. (2000). Multiple view geometry in computer vision. Cambridge: Cambridge University Press.
Hartley, R. I. (1994). Projective reconstruction and invariants from multiple images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 16(10), 1036–1041.
He, K., Lu, Y., & Sclaroff, S. (2018). Local descriptors optimized for average precision. In Conference on computer vision and pattern recognition.
Heinly, J., Schoenberger, J., Dunn, E., & Frahm, J.-M. (2015). Reconstructing the world in six days. In Conference on computer vision and pattern recognition.
Jacobs, N., Roman, N., & Pless, R. (2007). Consistent temporal variations in many outdoor scenes. In Conference on computer vision and pattern recognition.
Kendall, A., Grimes, M., & Cipolla, R. (2015). PoseNet: A convolutional network for real-time 6-DOF camera relocalization. In International conference on computer vision.
Krishna Murthy, J., Iyer, G., & Paull, L. (2019). gradSLAM: Dense SLAM meets automatic differentiation.
Leutenegger, S., Chli, M., & Siegwart, R. Y. (2011). BRISK: Binary robust invariant scalable keypoints. In International conference on computer vision.
Li, Z., & Snavely, N. (2018). MegaDepth: Learning single-view depth prediction from internet photos. In Conference on computer vision and pattern recognition.
Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2), 91–110.
Luo, Z., Shen, T., Zhou, L., Zhang, J., Yao, Y., Li, S., Fang, T., & Quan, L. (2019). ContextDesc: Local descriptor augmentation with cross-modality context. In Conference on computer vision and pattern recognition.
Luo, Z., Shen, T., Zhou, L., Zhu, S., Zhang, R., Yao, Y., Fang, T., & Quan, L. (2018). GeoDesc: Learning local descriptors by integrating geometry constraints. In European conference on computer vision.
Lynen, S., Zeisl, B., Aiger, D., Bosse, M., Hesch, J., Pollefeys, M., Siegwart, R., & Sattler, T. (2019). Large-scale, real-time visual-inertial localization revisited. Preprint.
Maddern, W., Pascoe, G., Linegar, C., & Newman, P. (2017). 1 year, 1000 km: The Oxford RobotCar dataset. International Journal of Robotics Research, 36(1), 3–15.
Matas, J., Chum, O., Urban, M., & Pajdla, T. (2004). Robust wide-baseline stereo from maximally stable extremal regions. Image and Vision Computing, 22(10), 761–767.
Mikolajczyk, K., & Schmid, C. (2004). A performance evaluation of local descriptors. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(10), 1615–1630.
Mikolajczyk, K., Schmid, C., & Zisserman, A. (2004). Human detection based on a probabilistic assembly of robust part detectors. In European conference on computer vision.
Mishchuk, A., Mishkin, D., Radenovic, F., & Matas, J. (2017). Working hard to know your neighbor’s margins: Local descriptor learning loss. In Advances in neural information processing systems.
Mishkin, D., Matas, J., & Perdoch, M. (2015). MODS: Fast and robust method for two-view matching. Computer Vision and Image Understanding, 141, 81–93.
Mishkin, D., Radenovic, F., & Matas, J. (2018). Repeatability is not enough: Learning affine regions via discriminability. In European conference on computer vision.
Muja, M., & Lowe, D. G. (2009). Fast approximate nearest neighbors with automatic algorithm configuration. In International conference on computer vision.
Mukundan, A., Tolias, G., & Chum, O. (2019). Explicit spatial encoding for deep local descriptors. In Conference on computer vision and pattern recognition.
Mur-Artal, R., Montiel, J., & Tardós, J. (2015). ORB-SLAM: A versatile and accurate monocular SLAM system. IEEE Transactions on Robotics, 31(5), 1147–1163.
Nister, D. (2003). An efficient solution to the five-point relative pose problem. In Conference on computer vision and pattern recognition.
Noh, H., Araujo, A., Sim, J., Weyand, T., & Han, B. (2017). Large-scale image retrieval with attentive deep local features. In International conference on computer vision.
Ono, Y., Trulls, E., Fua, P., & Yi, K. M. (2018). LF-Net: Learning local features from images. In Advances in neural information processing systems.
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., et al. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12, 2825–2830.
Pizer, S. M., Amburn, E. P., Austin, J. D., Cromartie, R., Geselowitz, A., Greer, T., ter Haar Romeny, B., Zimmerman, J. B., & Zuiderveld, K. (1987). Adaptive histogram equalization and its variations. In Computer vision, graphics, and image processing.
Pritchett, P., & Zisserman, A. (1998). Wide baseline stereo matching. In International conference on computer vision (pp. 754–760).
Pultar, M., Mishkin, D., & Matas, J. (2019). Leveraging outdoor webcams for local descriptor learning. In Computer vision winter workshop.
Qi, C., Su, H., Mo, K., & Guibas, L. (2017). PointNet: Deep learning on point sets for 3D classification and segmentation. In Conference on computer vision and pattern recognition.
Radenovic, F., Tolias, G., & Chum, O. (2016). CNN image retrieval learns from BoW: Unsupervised fine-tuning with hard examples. In European conference on computer vision.
Ranftl, R., & Koltun, V. (2018). Deep fundamental matrix estimation. In European conference on computer vision.
Revaud, J., Weinzaepfel, P., De Souza, C., Pion, N., Csurka, G., Cabon, Y., & Humenberger, M. (2019). R2D2: Repeatable and reliable detector and descriptor. Preprint.
Revaud, J., Weinzaepfel, P., de Souza, C. R., Pion, N., Csurka, G., Cabon, Y., & Humenberger, M. (2019). R2D2: Repeatable and reliable detector and descriptor. In Advances in neural information processing systems.
Rosten, E., Porter, R., & Drummond, T. (2010). Faster and better: A machine learning approach to corner detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32, 105–119.
Rublee, E., Rabaud, V., Konolige, K., & Bradski, G. (2011). ORB: An efficient alternative to SIFT or SURF. In International conference on computer vision.
Sarlin, P., DeTone, D., Malisiewicz, T., & Rabinovich, A. (2020). SuperGlue: Learning feature matching with graph neural networks. In Conference on computer vision and pattern recognition.
Sattler, T., Leibe, B., & Kobbelt, L. (2012). Improving image-based localization by active correspondence search. In European conference on computer vision.
Sattler, T., Maddern, W., Toft, C., Torii, A., Hammarstrand, L., Stenborg, E., Safari, D., Okutomi, M., Pollefeys, M., Sivic, J., Kahl, F., & Pajdla, T. (2018). Benchmarking 6DOF outdoor visual localization in changing conditions. In Conference on computer vision and pattern recognition.
Sattler, T., Weyand, T., Leibe, B., & Kobbelt, L. (2012). Image retrieval for image-based localization revisited. In British machine vision conference.
Sattler, T., Zhou, Q., Pollefeys, M., & Leal-Taixe, L. (2019). Understanding the limitations of CNN-based absolute camera pose regression. In Conference on computer vision and pattern recognition.
Savinov, N., Seki, A., Ladicky, L., Sattler, T., & Pollefeys, M. (2017). Quad-networks: Unsupervised learning to rank for interest point detection. In Conference on computer vision and pattern recognition.
Schönberger, J., & Frahm, J. (2016). Structure-from-motion revisited. In Conference on computer vision and pattern recognition.
Schönberger, J., Hardmeier, H., Sattler, T., & Pollefeys, M. (2017). Comparative evaluation of hand-crafted and learned local features. In Conference on computer vision and pattern recognition.
Schönberger, J., Zheng, E., Pollefeys, M., & Frahm, J. (2016). Pixelwise view selection for unstructured multi-view stereo. In European conference on computer vision.
Shi, Y., Zhu, J., Fang, Y., Lien, K., & Gu, J. (2019). Self-supervised learning of depth and ego-motion with differentiable bundle adjustment. Preprint.
Simo-Serra, E., Trulls, E., Ferraz, L., Kokkinos, I., Fua, P., & Moreno-Noguer, F. (2015). Discriminative learning of deep convolutional feature point descriptors. In International conference on computer vision.
Strecha, C., Hansen, W., Van Gool, L., Fua, P., & Thoennessen, U. (2008). On benchmarking camera calibration and multi-view stereo for high resolution imagery. In Conference on computer vision and pattern recognition.
Sturm, J., Engelhard, N., Endres, F., Burgard, W., & Cremers, D. (2012). A benchmark for the evaluation of RGB-D SLAM systems. In International conference on intelligent robots and systems.
Sun, W., Jiang, W., Trulls, E., Tagliasacchi, A., & Yi, K. M. (2020). ACNe: Attentive context normalization for robust permutation-equivariant learning. In Conference on computer vision and pattern recognition.
Taira, H., Okutomi, M., Sattler, T., Cimpoi, M., Pollefeys, M., Sivic, J., et al. (2019). InLoc: Indoor visual localization with dense matching and view synthesis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39, 1744–1756.
Tang, C., & Tan, P. (2019). BA-Net: Dense bundle adjustment network. In International conference on learning representations.
Tateno, K., Tombari, F., Laina, I., & Navab, N. (2017). CNN-SLAM: Real-time dense monocular SLAM with learned depth prediction. In Conference on computer vision and pattern recognition.
Thomee, B., Shamma, D., Friedland, G., Elizalde, B., Ni, K., Poland, D., et al. (2016). YFCC100M: The new data in multimedia research. Communications of the ACM, 59, 64–73.
Tian, Y., Fan, B., & Wu, F. (2017). L2-Net: Deep learning of discriminative patch descriptor in Euclidean space. In Conference on computer vision and pattern recognition.
Tian, Y., Yu, X., Fan, B., Wu, F., Heijnen, H., & Balntas, V. (2019). SOSNet: Second order similarity regularization for local descriptor learning. In Conference on computer vision and pattern recognition.
Tolias, G., Avrithis, Y., & Jégou, H. (2016). Image search with selective match kernels: Aggregation across single and multiple images. International Journal of Computer Vision, 116(3), 247–261.
Torr, P., & Zisserman, A. (2000). MLESAC: A new robust estimator with application to estimating image geometry. Computer Vision and Image Understanding, 78, 138–156.
Triggs, B., Mclauchlan, P., Hartley, R., & Fitzgibbon, A. (2000). Bundle adjustment—A modern synthesis. In Vision algorithms: Theory and practice (pp. 298–372).
Vedaldi, A., & Fulkerson, B. (2010). VLFeat: An open and portable library of computer vision algorithms. In Proceedings of the 18th ACM international conference on multimedia, MM’10 (pp. 1469–1472).
Verdie, Y., Yi, K. M., Fua, P., & Lepetit, V. (2015). TILDE: A temporally invariant learned detector. In Conference on computer vision and pattern recognition.
Vijayanarasimhan, S., Ricco, S., Schmid, C., Sukthankar, R., & Fragkiadaki, K. (2017). SfM-Net: Learning of structure and motion from video. Preprint.
Wei, X., Zhang, Y., Gong, Y., & Zheng, N. (2018). Kernelized subspace pooling for deep local descriptors. In Conference on computer vision and pattern recognition.
Wei, X., Zhang, Y., Li, Z., Fu, Y., & Xue, X. (2020). DeepSFM: Structure from motion via deep bundle adjustment. In European conference on computer vision.
Wu, C. (2013). Towards linear-time incremental structure from motion. In International conference on 3D vision.
Yi, K. M., Trulls, E., Lepetit, V., & Fua, P. (2016). LIFT: Learned invariant feature transform. In European conference on computer vision.
Yi, K. M., Trulls, E., Ono, Y., Lepetit, V., Salzmann, M., & Fua, P. (2018). Learning to find good correspondences. In Conference on computer vision and pattern recognition.
Yoo, A. B., Jette, M. A., & Grondona, M. (2003). SLURM: Simple Linux utility for resource management. In Workshop on job scheduling strategies for parallel processing (pp. 44–60). Berlin: Springer.
Zagoruyko, S., & Komodakis, N. (2015). Learning to compare image patches via convolutional neural networks. In Conference on computer vision and pattern recognition.
Zhang, J., Sun, D., Luo, Z., Yao, A., Zhou, L., Shen, T., et al. (2019). Learning two-view correspondences and geometry using order-aware network. In International conference on computer vision.
Zhang, J., Sun, D., Luo, Z., Yao, A., Zhou, L., Shen, T., Chen, Y., Quan, L., & Liao, H. (2019). Learning two-view correspondences and geometry using order-aware network. In International conference on computer vision.
Zhang, X., Yu, F. X., Karaman, S., & Chang, S.-F. (2017). Learning discriminative and transformation covariant local feature detectors. In Conference on computer vision and pattern recognition.
Zhao, C., Cao, Z., Li, C., Li, X., & Yang, J. (2019). NM-Net: Mining reliable neighbors for robust feature correspondences. In Conference on computer vision and pattern recognition.
Zhou, Q., Sattler, T., Pollefeys, M., & Leal-Taixe, L. (2020). To learn or not to learn: Visual localization from essential matrices. In International conference on robotics and automation.
Zhu, S., Zhang, R., Zhou, L., Shen, T., Fang, T., Tan, P., & Quan, L. (2018). Very large-scale global SfM by distributed motion averaging. In Conference on computer vision and pattern recognition.
Zitnick, C., & Ramnath, K. (2011). Edge foci interest points. In International conference on computer vision.
Metadata
Title
Image Matching Across Wide Baselines: From Paper to Practice
Authors
Yuhe Jin
Dmytro Mishkin
Anastasiia Mishchuk
Jiri Matas
Pascal Fua
Kwang Moo Yi
Eduard Trulls
Publication date
07.10.2020
Publisher
Springer US
Published in
International Journal of Computer Vision / Issue 2/2021
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI
https://doi.org/10.1007/s11263-020-01385-0
