Published in: International Journal of Computer Vision 2/2015

01.01.2015

An Elastic Deformation Field Model for Object Detection and Tracking

Authors: Marco Pedersoli, Radu Timofte, Tinne Tuytelaars, Luc Van Gool


Abstract

Deformable Parts Models (DPM) are the current state of the art for object detection. Nevertheless, they seem sub-optimal in how they represent deformations: object deformations are often continuous and not confined to big parts. We therefore propose to replace the DPM star model, which is based on big parts, with a deformation field. This consists of a grid of small parts connected by pairwise constraints, which can better handle continuous deformations. A naive application of this model to object detection would be a bounded sliding-window approach: for each possible location in the image, the best part configuration within a limited bound around that location is found. This is computationally very expensive. Instead, we propose a different inference procedure, in which an iterative image-level search finds the best object hypothesis. We show that this approach is faster than bounded sliding windows yet produces comparable accuracy. Experiments further show that the deformation field better approximates real object deformations and therefore, for certain classes, yields better detection accuracy than state-of-the-art DPM. Finally, the same approach is adapted to model-free tracking, where it also improves accuracy.
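
To make the idea of a grid of small parts tied together by pairwise constraints more concrete, the following is a minimal Python sketch, not the authors' formulation. It assumes each part picks one displacement from a small candidate set, that the score is the sum of per-part appearance scores minus a quadratic penalty on displacement differences between 4-connected neighbours, and that inference is a greedy part-by-part update (iterated conditional modes). The names `field_score`, `icm`, `offsets`, and `stiffness` are hypothetical; footnote 2 indicates the paper itself relies on alpha expansion, for which the greedy update here is only a simpler stand-in.

```python
import numpy as np


def field_score(unary, labels, offsets, stiffness=1.0):
    """Score of one part configuration on an H x W grid (illustrative only).

    unary:   (H, W, K) appearance score of part (i, j) under candidate k
    labels:  (H, W) index of the displacement chosen for each part
    offsets: (K, 2) candidate displacements (dy, dx)
    """
    rows, cols = np.indices(labels.shape)
    appearance = unary[rows, cols, labels].sum()
    d = offsets[labels].astype(float)  # (H, W, 2) chosen displacements
    # Assumed pairwise term: squared difference between neighbouring displacements.
    penalty = ((d[1:, :] - d[:-1, :]) ** 2).sum() + ((d[:, 1:] - d[:, :-1]) ** 2).sum()
    return appearance - stiffness * penalty


def icm(unary, offsets, stiffness=1.0, sweeps=10):
    """Greedy local search: re-optimise one part at a time, neighbours fixed."""
    h, w, k = unary.shape
    labels = unary.argmax(axis=2)  # start from the best appearance per part
    for _ in range(sweeps):
        changed = False
        for i in range(h):
            for j in range(w):
                vals = unary[i, j].astype(float)  # astype returns a copy
                for ni, nj in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
                    if 0 <= ni < h and 0 <= nj < w:
                        diff = offsets - offsets[labels[ni, nj]]  # (K, 2)
                        vals -= stiffness * (diff ** 2).sum(axis=1)
                best = int(np.argmax(vals))
                # Only move on a strict improvement, so the score never drops.
                if vals[best] > vals[labels[i, j]]:
                    labels[i, j] = best
                    changed = True
        if not changed:
            break
    return labels
```

Because each update changes only the terms that involve the part being moved, the total score from `field_score` cannot decrease from one sweep to the next. The result is a local optimum that depends on the initialisation, which is one reason stronger move-making methods are generally preferred for models of this kind.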


Footnotes
1
In this work, when referring to DPM we mean the specific implementation of Felzenszwalb et al. (2010).

2
Typically, when using alpha expansion the solution will not be exact. However, we found experimentally that the algorithm still works well, which indirectly suggests that the solution found is generally very close to the exact one. See Fig. 4 in the experimental results.

3
\(S(\mathbf {l},\mathbf {x},\mathbf {w})\) is the maximization defined in Eq. (9), where we make explicit the dependency on the image \(\mathbf {x}\) and \(\mathbf {w}\).

References
Alahari, K., Kohli, P., & Torr, P. H. S. (2010). Dynamic hybrid algorithms for MAP inference in discrete MRFs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32, 1846–1857.
Andriluka, M., Roth, S., & Schiele, B. (2009). Pictorial structures revisited: People detection and articulated pose estimation. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1014–1021).
Babenko, B., Yang, M. H., & Belongie, S. (2011). Robust object tracking with online multiple instance learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(8), 1619–1632.
Batra, D., Yadollahpour, P., Guzman, A., & Shakhnarovich, G. (2012). Diverse M-best solutions in Markov random fields. In Proceedings of the European conference on computer vision.
Bergtholdt, M., Kappes, J., Schmidt, S., & Schnörr, C. (2010). A study of parts-based object class detection using complete graphs. International Journal of Computer Vision, 87(1–2), 93–117.
Bourdev, L. D., Maji, S., Brox, T., & Malik, J. (2010). Detecting people using mutually consistent poselet activations. In Proceedings of the European conference on computer vision (pp. 168–181).
Boykov, Y., Veksler, O., & Zabih, R. (1998). Markov random fields with efficient approximations. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 648–656).
Boykov, Y., Veksler, O., & Zabih, R. (2001). Fast approximate energy minimization via graph cuts. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(11), 1222–1239.
Crandall, D., Felzenszwalb, P., & Huttenlocher, D. (2005). Spatial priors for part-based recognition using statistical models. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 10–17).
Dalal, N., & Triggs, B. (2005). Histograms of oriented gradients for human detection. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 886–893).
Desai, C., Ramanan, D., & Fowlkes, C. (2009). Discriminative models for multi-class layout. In Proceedings of the IEEE international conference on computer vision.
Duchenne, O., Joulin, A., & Ponce, J. (2011). A graph-matching kernel for object categorization. p. 1056. Barcelona, Spain.
Everingham, M., Zisserman, A., Williams, C., & Van Gool, L. (2007). The PASCAL visual object classes challenge 2007 (VOC2007) results.
Felzenszwalb, P. F., Girshick, R., & McAllester, D. (2010). Cascade object detection with deformable part models. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2241–2248).
Felzenszwalb, P. F., Girshick, R. B., McAllester, D. A., & Ramanan, D. (2010). Object detection with discriminatively trained part-based models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(9), 1627–1645.
Felzenszwalb, P. F., & Huttenlocher, D. P. (2004). Distance transforms of sampled functions. Technical report.
Felzenszwalb, P. F., McAllester, D., & Ramanan, D. (2008). A discriminatively trained, multiscale, deformable part model. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1–8).
Fergus, R., Perona, P., & Zisserman, A. (2003). Object class recognition by unsupervised scale-invariant learning. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 264–271).
Glocker, B., Komodakis, N., Tziritas, G., Navab, N., & Paragios, N. (2008). Dense image registration through MRFs and efficient linear programming. Medical Image Analysis, 12(6), 731–741.
Hoiem, D., Rother, C., & Winn, J. M. (2008). 3D layout CRF for multi-view object class recognition and segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition.
Horn, B. K. P., & Schunck, B. G. (1981). Determining optical flow. Artificial Intelligence, 17, 185–203.
Kalal, Z., Matas, J., & Mikolajczyk, K. (2010). P-N learning: Bootstrapping binary classifiers by structural constraints. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 49–56).
Kapoor, A., & Winn, J. (2006). Located hidden random fields: Learning discriminative parts for object detection. In Proceedings of the European conference on computer vision (pp. 302–315).
Kohli, P., & Torr, P. H. S. (2007). Dynamic graph cuts for efficient inference in Markov random fields. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(12), 2079–2088.
Komodakis, N., Tziritas, G., & Paragios, N. (2008). Performance vs computational efficiency for optimizing single and dynamic MRFs: Setting the state of the art with primal-dual strategies. Computer Vision and Image Understanding, 112(1), 14–29.
Lades, M., Vorbruggen, J. C., Buhmann, J., Lange, J., Malsburg, Cvd, Wurtz, R. P., et al. (1993). Distortion invariant object recognition in the dynamic link architecture. IEEE Transactions on Computers, 42(3), 300–311.
Ladicky, L., Sturgess, P., Alahari, K., Russell, C., & Torr, P. (2010). Where, what and how many? Combining object detectors and CRFs. In Proceedings of the European conference on computer vision (pp. 424–437).
Ladicky, L., Torr, P. H. S., & Zisserman, A. (2012). Latent SVMs for human detection with a locally affine deformation field. In Proceedings of the British machine vision conference (pp. 10.1–10.11).
Lucas, B. D., & Kanade, T. (1981). An iterative image registration technique with an application to stereo vision. In Proceedings of the international joint conference on artificial intelligence (pp. 674–679).
Pedersoli, M., Vedaldi, A., & Gonzàlez, J. (2011). A coarse-to-fine approach for fast deformable object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1353–1360).
Quattoni, A., Collins, M., & Darrell, T. (2004). Conditional random fields for object recognition. In L. K. Saul, Y. Weiss & L. Bottou (Eds.), Advances in neural information processing systems (pp. 1097–1104). MIT Press.
Shalev-Shwartz, S., Singer, Y., Srebro, N., & Cotter, A. (2011). Pegasos: Primal estimated sub-gradient solver for SVM. Mathematical Programming, 127(1), 3–30.
Vedaldi, A., & Zisserman, A. (2009). Structured output regression for detection with partial occlusion. In Y. Bengio, D. Schuurmans, J. D. Lafferty, C. K. I. Williams & A. Culotta (Eds.), Advances in neural information processing systems (pp. 1928–1936).
Vedaldi, A., & Zisserman, A. (2012). Sparse kernel approximations for efficient classification and detection. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2320–2327).
Wang, Y., Tran, D., & Liao, Z. (2011). Learning hierarchical poselets for human parsing. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1705–1712).
Yang, Y., & Ramanan, D. (2012). Articulated human detection with flexible mixtures-of-parts. IEEE Transactions on Pattern Analysis and Machine Intelligence 99(PrePrints), 1.
Yuille, A., Rangarajan, A., & Yuille, A. L. (2002). The concave-convex procedure (CCCP). In Advances in neural information processing systems (pp. 1033–1040).
Zhang, L., & van der Maaten, L. (2013). Structure preserving object tracking. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1–8).
Zhu, L., Chen, Y., Yuille, A., & Freeman, W. (2010). Latent hierarchical structural learning for object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1–8).
Metadata
Title
An Elastic Deformation Field Model for Object Detection and Tracking
Authors
Marco Pedersoli
Radu Timofte
Tinne Tuytelaars
Luc Van Gool
Publication date
01.01.2015
Publisher
Springer US
Published in
International Journal of Computer Vision / Issue 2/2015
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI
https://doi.org/10.1007/s11263-014-0736-2
