BoCNF: efficient image matching with Bag of ConvNet features for scalable and robust visual place recognition

Authors: Yi Hou, Hong Zhang, Shilin Zhou

Published in: Autonomous Robots | Issue 6/2018 | 16-11-2017


Abstract

Recent advances in visual place recognition (VPR) have exploited ConvNet features to improve recognition accuracy under significant environmental and viewpoint changes. However, matching images efficiently with high-dimensional ConvNet features remains an open problem. In this paper, we tackle the problem of matching efficiency using ConvNet features for VPR, where the task is to recognize a given place accurately and quickly in large-scale, challenging environments. The paper makes two contributions. First, we propose an efficient solution to VPR, based on the well-known bag-of-words (BoW) framework, to speed up image matching with ConvNet features. Second, to alleviate the problem of perceptual aliasing in BoW, we adopt a coarse-to-fine approach: in the coarse stage, we search for the top-K candidate images via BoW; in the fine stage, we identify the best match among the candidates using a hash-based voting scheme. We conduct extensive experiments on six popular VPR datasets to validate the effectiveness of our method. Experimental results show that, in terms of recognition accuracy, our method is comparable to linear search and outperforms other methods such as FABMAP and SeqSLAM by a significant margin. In terms of efficiency, our method achieves a significant speed-up over linear search, with an average matching time as low as 23.5 ms per query on a dataset of 21K images.
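
To make the two-stage pipeline concrete, below is a minimal sketch in plain NumPy. It is an illustration under stated assumptions, not the authors' implementation: the descriptor dimensionality, the visual vocabulary (random centroids rather than words trained on ConvNet features), and the random-hyperplane hashing that stands in for the paper's hash-based voting scheme are all hypothetical choices.

    # bocnf_sketch.py -- coarse-to-fine matching in the spirit of the abstract.
    # All sizes and the random "vocabulary"/hash projections are illustrative
    # assumptions, not the components used in the paper.
    import numpy as np

    rng = np.random.default_rng(0)
    DIM, VOCAB_SIZE, HASH_BITS = 128, 200, 16

    vocab = rng.standard_normal((VOCAB_SIZE, DIM))       # stand-in visual vocabulary
    hyperplanes = rng.standard_normal((HASH_BITS, DIM))  # LSH-style projections

    def bow_vector(descs):
        """Coarse stage: quantize local descriptors to nearest visual words
        and build an L2-normalized bag-of-words histogram."""
        dists = ((descs[:, None, :] - vocab[None, :, :]) ** 2).sum(-1)
        hist = np.bincount(dists.argmin(axis=1), minlength=VOCAB_SIZE).astype(float)
        norm = np.linalg.norm(hist)
        return hist / norm if norm > 0 else hist

    def hash_codes(descs):
        """Fine stage: pack random-hyperplane sign bits into integer bucket ids."""
        bits = (descs @ hyperplanes.T > 0).astype(np.int64)
        return bits @ (1 << np.arange(HASH_BITS))

    def match(query_descs, db_descs_list, top_k=5):
        """Retrieve top-K candidates by BoW similarity, then pick the candidate
        whose descriptors share the most hash buckets with the query
        (a simple stand-in for hash-based voting)."""
        q_bow = bow_vector(query_descs)
        sims = np.array([q_bow @ bow_vector(d) for d in db_descs_list])
        candidates = np.argsort(-sims)[:top_k]                      # coarse search
        q_buckets = set(hash_codes(query_descs).tolist())
        votes = [len(q_buckets & set(hash_codes(db_descs_list[i]).tolist()))
                 for i in candidates]                               # fine voting
        return int(candidates[int(np.argmax(votes))])

    # Toy usage: 20 database "images" of 50 local descriptors each; the query
    # is a lightly perturbed copy of image 7, so match() should return 7.
    db = [rng.standard_normal((50, DIM)) for _ in range(20)]
    query = db[7] + 0.05 * rng.standard_normal((50, DIM))
    print(match(query, db))

The coarse stage above scans every BoW vector for clarity of exposition; a practical system built on the BoW framework would serve this stage from an inverted index, so that only database images sharing visual words with the query are scored.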

Literature
Arandjelovic, R., & Zisserman, A. (2013). All about VLAD. In IEEE conference on computer vision and pattern recognition (CVPR) (pp. 1578–1585).
Babenko, A., & Lempitsky, V. (2015). Aggregating deep convolutional features for image retrieval. In IEEE international conference on computer vision (ICCV).
Bay, H., Tuytelaars, T., & Van Gool, L. (2006). SURF: Speeded up robust features. In European conference on computer vision (ECCV) (Vol. 3951, pp. 404–417).
Chen, Z., Lam, O., Jacobson, A., & Milford, M. (2014). Convolutional neural network-based place recognition. In Australasian conference on robotics and automation (ACRA) (pp. 2–4).
Cheng, M.-M., Zhang, Z., Lin, W.-Y., & Torr, P. (2014). BING: Binarized normed gradients for objectness estimation at 300fps. In IEEE conference on computer vision and pattern recognition (CVPR) (pp. 3286–3293).
Cummins, M., & Newman, P. (2011). Appearance-only SLAM at large scale with FAB-MAP 2.0. The International Journal of Robotics Research, 30(9), 1100–1123.
Dalal, N., & Triggs, B. (2005). Histograms of oriented gradients for human detection. In IEEE conference on computer vision and pattern recognition (CVPR) (pp. 886–893).
Gionis, A., Indyk, P., & Motwani, R. (1999). Similarity search in high dimensions via hashing. In International conference on very large data bases (VLDB), San Francisco, CA (pp. 518–529).
Glover, A., Maddern, W., Milford, M., & Wyeth, G. (2010). FAB-MAP + RatSLAM: Appearance-based SLAM for multiple times of day. In IEEE international conference on robotics and automation (ICRA) (pp. 3507–3512).
Glover, A., Maddern, W., Warren, M., Reid, S., Milford, M., & Wyeth, G. (2012). OpenFABMAP: An open source toolbox for appearance-based loop closure detection. In IEEE international conference on robotics and automation (ICRA) (pp. 4730–4735).
Hosang, J., Benenson, R., Dollár, P., & Schiele, B. (2016). What makes for effective detection proposals? IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(4), 814–830.
Hou, Y., Zhang, H., & Zhou, S. (2015). Convolutional neural network-based image representation for visual loop closure detection. In IEEE international conference on information and automation (ICIA) (pp. 2238–2245).
Hou, Y., Zhang, H., Zhou, S., & Zou, H. (2017). Efficient ConvNet feature extraction with multiple RoI pooling for landmark-based visual localization of autonomous vehicles. Mobile Information Systems, 2017 (in press).
Jégou, H., Douze, M., & Schmid, C. (2008). Hamming embedding and weak geometric consistency for large scale image search. In European conference on computer vision (ECCV) (pp. 304–317).
Jégou, H., Douze, M., Schmid, C., & Pérez, P. (2010). Aggregating local descriptors into a compact image representation. In IEEE conference on computer vision and pattern recognition (CVPR) (pp. 3304–3311).
Kalantidis, Y., Mellina, C., & Osindero, S. (2015). Cross-dimensional weighting for aggregated deep convolutional features. In European conference on computer vision (ECCV) (pp. 685–701).
Kosecka, J., & Li, F. (2004). Vision based topological Markov localization. In IEEE international conference on robotics and automation (ICRA) (Vol. 2, pp. 1481–1486).
Krizhevsky, A., Sutskever, I., & Hinton, G. (2012). ImageNet classification with deep convolutional neural networks. In Advances in neural information processing systems (NIPS) (pp. 1097–1105).
Li, F., & Kosecka, J. (2006). Probabilistic location recognition using reduced feature set. In IEEE international conference on robotics and automation (ICRA) (pp. 3405–3410).
Liu, Y., & Zhang, H. (2012). Visual loop closure detection with a compact image descriptor. In IEEE/RSJ international conference on intelligent robots and systems (IROS) (pp. 1051–1056).
Liu, Y., & Zhang, H. (2013). Towards improving the efficiency of sequence-based SLAM. In IEEE international conference on mechatronics and automation (ICMA) (pp. 1261–1266).
Liu, Y., Feng, R., & Zhang, H. (2015). Keypoint matching by outlier pruning with consensus constraint. In IEEE international conference on robotics and automation (ICRA) (pp. 5481–5486).
Lowe, D. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60, 91–110.
Lowry, S., Sünderhauf, N., Newman, P., Leonard, J., Cox, D., Corke, P., et al. (2016). Visual place recognition: A survey. IEEE Transactions on Robotics, 32(1), 1–19.
Milford, M. (2013). Vision-based place recognition: How low can you go? The International Journal of Robotics Research, 32(7), 766–789.
Milford, M., & Wyeth, G. (2012). SeqSLAM: Visual route-based navigation for sunny summer days and stormy winter nights. In IEEE international conference on robotics and automation (ICRA) (pp. 1643–1649).
Naseer, T., Spinello, L., Burgard, W., & Stachniss, C. (2014). Robust visual robot localization across seasons using network flows. In AAAI conference on artificial intelligence.
Neubert, P., & Protzel, P. (2015). Local region detector + CNN based landmarks for practical place recognition in changing environments. In European conference on mobile robots (ECMR) (pp. 1–6).
Oliva, A., & Torralba, A. (2001). Modeling the shape of the scene: A holistic representation of the spatial envelope. International Journal of Computer Vision, 42(3), 145–175.
Pepperell, E., Corke, P., & Milford, M. (2014). All-environment visual place recognition with SMART. In IEEE international conference on robotics and automation (ICRA) (pp. 1612–1618).
Perronnin, F., & Dance, C. (2007). Fisher kernels on visual vocabularies for image categorization. In IEEE conference on computer vision and pattern recognition (CVPR) (pp. 1–8).
Perronnin, F., Sánchez, J., & Mensink, T. (2010). Improving the Fisher kernel for large-scale image classification. In European conference on computer vision (ECCV) (pp. 143–156).
Philbin, J., Chum, O., Isard, M., Sivic, J., & Zisserman, A. (2007). Object retrieval with large vocabularies and fast spatial matching. In IEEE conference on computer vision and pattern recognition (CVPR) (pp. 1–8).
Singh, G., & Kosecka, J. (2010). Visual loop closing using gist descriptors in Manhattan world. In IEEE international conference on robotics and automation (ICRA) omnidirectional robot vision workshop.
Sivic, J., & Zisserman, A. (2003). Video Google: A text retrieval approach to object matching in videos. In IEEE international conference on computer vision (ICCV) (pp. 1470–1477).
Sünderhauf, N., & Protzel, P. (2011). BRIEF-Gist—Closing the loop by simple means. In IEEE/RSJ international conference on intelligent robots and systems (IROS) (pp. 1234–1241).
Sünderhauf, N., Dayoub, F., Shirazi, S., Upcroft, B., & Milford, M. (2015a). On the performance of ConvNet features for place recognition. In IEEE/RSJ international conference on intelligent robots and systems (IROS).
Sünderhauf, N., Neubert, P., & Protzel, P. (2013). Are we there yet? Challenging SeqSLAM on a 3000 km journey across all four seasons. In IEEE international conference on robotics and automation (ICRA) workshop on long-term autonomy.
Sünderhauf, N., Shirazi, S., Jacobson, A., Dayoub, F., Pepperell, E., Upcroft, B., & Milford, M. (2015b). Place recognition with ConvNet landmarks: Viewpoint-robust, condition-robust, training-free. In Robotics: Science and systems (RSS), Rome.
Zhang, H. (2011). BoRF: Loop-closure detection with scale invariant visual features. In IEEE international conference on robotics and automation (ICRA) (pp. 3125–3130).
Zhang, H., Han, F., & Wang, H. (2016). Robust multimodal sequence-based loop closure detection via structured sparsity. In Robotics: Science and systems (RSS).
Zheng, L., Yang, Y., & Tian, Q. (2016). SIFT meets CNN: A decade survey of instance retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence (early access).
Zitnick, C. L., & Dollár, P. (2014). Edge boxes: Locating object proposals from edges. In European conference on computer vision (ECCV) (pp. 391–405).
Metadata
Title
BoCNF: efficient image matching with Bag of ConvNet features for scalable and robust visual place recognition
Authors
Yi Hou
Hong Zhang
Shilin Zhou
Publication date
16-11-2017
Publisher
Springer US
Published in
Autonomous Robots / Issue 6/2018
Print ISSN: 0929-5593
Electronic ISSN: 1573-7527
DOI
https://doi.org/10.1007/s10514-017-9684-3
