International Journal of Computer Vision

Published: 1 October 2016

Learning Mutual Visibility Relationship for Pedestrian Detection with a Deep Model

by Wanli Ouyang, Xingyu Zeng, Xiaogang Wang

Published in: International Journal of Computer Vision | Issue 1/2016

Abstract

Detecting pedestrians in cluttered scenes is a challenging problem in computer vision. The difficulty increases when several pedestrians overlap in an image and occlude each other. We observe, however, that the occlusion/visibility statuses of overlapping pedestrians provide a useful mutual relationship for visibility estimation: estimating the visibility of one pedestrian facilitates estimating the visibility of another. In this paper, we propose a mutual visibility deep model that jointly estimates the visibility statuses of overlapping pedestrians. The deep model learns the visibility relationship among pedestrians in order to recognize co-existing pedestrians, and the evidence of co-existing pedestrians is then used to improve single-pedestrian detection results. Compared with existing image-based pedestrian detection approaches, our approach has the lowest average miss rate on the Caltech-Train dataset and the ETH dataset. Experimental results show that the mutual visibility deep model effectively improves pedestrian detection, yielding 6–15% improvements on multiple benchmark datasets.

'Lena' and 'Richard' are used as placeholder names in this paper.

http://www.cvg.rdg.ac.uk/PETS2009/a.html.

http://www.ee.cuhk.edu.hk/~xgwang/2DBNped.html.

Title: Learning Mutual Visibility Relationship for Pedestrian Detection with a Deep Model
Authors: Wanli Ouyang, Xingyu Zeng, Xiaogang Wang
Publication date: 1 October 2016
Publisher: Springer US
Published in: International Journal of Computer Vision / Issue 1/2016
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI: https://doi.org/10.1007/s11263-016-0890-9
