Published in: International Journal of Computer Vision 3/2016

01-07-2016

Learning and Calibrating Per-Location Classifiers for Visual Place Recognition

Authors: Petr Gronát, Josef Sivic, Guillaume Obozinski, Tomas Pajdla


Abstract

The aim of this work is to localize a query photograph by finding other images depicting the same place in a large geotagged image database. This is a challenging task due to changes in viewpoint, imaging conditions and the large size of the image database. The contribution of this work is two-fold. First, we cast the place recognition problem as a classification task and use the available geotags to train a classifier for each location in the database in a similar manner to per-exemplar SVMs in object recognition. Second, as only one or a few positive training examples are available for each location, we propose two methods to calibrate all the per-location SVM classifiers without the need for additional positive training data. The first method relies on p-values from statistical hypothesis testing and uses only the available negative training data. The second method performs an affine calibration by appropriately normalizing the learnt classifier hyperplane and does not need any additional labelled training data. We test the proposed place recognition method with the bag-of-visual-words and Fisher vector image representations suitable for large scale indexing. Experiments are performed on three datasets: 25,000 and 55,000 geotagged street view images of Pittsburgh, and the 24/7 Tokyo benchmark containing 76,000 images with varying illumination conditions. The results show improved place recognition accuracy of the learnt image representation over direct matching of raw image descriptors.


Footnotes
1
The notion most commonly used in statistics is in fact the p-value. The p-value associated with a score \(s\) is the quantity \(\alpha (s)\) defined by \(\alpha (s)=1-F_0(s)\); so the more significant the score is, the closer to 1 the cdf value is, and the closer to 0 the p-value is. To keep the presentation simple, we avoid the formulation in terms of p-values and speak only of the probabilistic calibrated values obtained from the cdf \(F_0\).
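The relation \(\alpha (s)=1-F_0(s)\) can be illustrated with a minimal sketch. It assumes that \(F_0\) is approximated by the empirical cdf of classifier scores on negative examples; the function names `make_cdf_calibrator` and `p_value` and the toy scores are hypothetical and are not taken from the paper.

```python
from bisect import bisect_right


def make_cdf_calibrator(negative_scores):
    """Return a function mapping a raw score s to an empirical estimate of F_0(s).

    Assumption: F_0 is estimated as the fraction of negative scores <= s.
    """
    sorted_neg = sorted(negative_scores)
    n = len(sorted_neg)
    if n == 0:
        raise ValueError("need at least one negative score")

    def calibrate(score):
        # bisect_right counts how many negative scores are <= score
        return bisect_right(sorted_neg, score) / n

    return calibrate


def p_value(calibrate, score):
    """alpha(s) = 1 - F_0(s), as defined in the footnote."""
    return 1.0 - calibrate(score)


# Toy negative scores for one hypothetical per-location classifier
negatives = [-2.0, -1.5, -1.0, -0.5, 0.0]
calibrate = make_cdf_calibrator(negatives)

cdf_value = calibrate(-0.75)        # 3 of the 5 negatives are <= -0.75
alpha = p_value(calibrate, -0.75)   # complement of the cdf value
```

A score larger than every negative score gets a cdf value of 1 and a p-value of 0, which matches the footnote: the more significant the score, the closer the calibrated value is to 1.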
 
2
When the calibration by re-normalization method is used, \(\mathbf {w}_j\) contains the re-normalized weights and the bias \(b_j\) is zero. However, to cover both calibration methods, we include the bias term in the derivations in this section.
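As a rough illustration of how the weights \(\mathbf {w}_j\) and bias \(b_j\) enter the score, the sketch below ranks locations by a linear score \(\mathbf {w}_j^\top \mathbf {q} + b_j\) for a query descriptor \(\mathbf {q}\). The linear form of the score, the helper names `linear_score` and `rank_locations`, and the toy vectors are assumptions made for illustration; the re-normalization itself is not shown, only the resulting zero bias.

```python
def linear_score(w, b, q):
    """Score of query descriptor q under one classifier: w . q + b."""
    if len(w) != len(q):
        raise ValueError("w and q must have the same dimension")
    return sum(wi * qi for wi, qi in zip(w, q)) + b


def rank_locations(classifiers, q):
    """Return location ids sorted by decreasing score.

    classifiers: dict mapping a location id j to a pair (w_j, b_j).
    """
    scores = {j: linear_score(w, b, q) for j, (w, b) in classifiers.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy classifiers; the bias is 0.0 to mimic the re-normalized case
classifiers = {
    "loc_a": ([1.0, 0.0], 0.0),
    "loc_b": ([0.0, 1.0], 0.0),
}
query = [0.2, 0.9]

ranking = rank_locations(classifiers, query)
```

Keeping the bias as an explicit argument mirrors the footnote: the same scoring code covers both calibration methods, with \(b_j = 0\) as a special case.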
 
Metadata
Title
Learning and Calibrating Per-Location Classifiers for Visual Place Recognition
Authors
Petr Gronát
Josef Sivic
Guillaume Obozinski
Tomas Pajdla
Publication date
01-07-2016
Publisher
Springer US
Published in
International Journal of Computer Vision / Issue 3/2016
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI
https://doi.org/10.1007/s11263-015-0878-x
