International Journal of Computer Vision

Publication date: 08-04-2021

Guided Attention in CNNs for Occluded Pedestrian Detection and Re-identification

Authors: Shanshan Zhang, Di Chen, Jian Yang, Bernt Schiele

Published in: International Journal of Computer Vision | Issue 6/2021


Abstract

Pedestrian detection and re-identification have progressed significantly in the last few years. However, occluded people remain notoriously hard to detect and recognize, because their appearance varies substantially across a wide range of occlusion patterns. In this paper, we propose a simple and compact CNN-based method for occlusion handling. We start by interpreting the CNN channel features of a pedestrian detector, and we find that different channels respond to different body parts. Since each occlusion pattern can be formulated as a specific combination of body parts, these findings motivate us to employ an attention mechanism across channels to represent various occlusion patterns in a single model. We therefore propose an attention network with self or external guidance as an add-on to the baseline CNN method. We also propose an attention-guided self-paced learning method to balance the optimization across different occlusion levels. Our method shows significant improvements over the baseline methods for both pedestrian detection and re-identification. For pedestrian detection, we improve on the baseline Faster R-CNN detector by 8 pp on the heavy occlusion subset of CityPersons, and on Caltech we outperform the state-of-the-art method by 5 pp. For pedestrian re-identification, our method surpasses the baseline and achieves state-of-the-art performance on multiple re-identification benchmarks.
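
As a rough illustration of attention across channels, the following NumPy sketch re-weights the channels of a feature map with a vector predicted from the features themselves (a squeeze-and-excitation-style form of self guidance). It is not the authors' implementation: the function name `channel_attention`, the two-layer weighting network, the reduction ratio `r`, and the random weights are all assumptions made for illustration only.

```python
import numpy as np


def channel_attention(features, w1, b1, w2, b2):
    """Re-weight the channels of a (C, H, W) feature map.

    Hypothetical sketch: the attention vector is predicted from the
    features themselves, then applied channel by channel.
    """
    # Squeeze: global average pooling yields one descriptor per channel.
    squeezed = features.mean(axis=(1, 2))                 # shape (C,)
    # Excite: a small two-layer network predicts one weight per channel.
    hidden = np.maximum(0.0, w1 @ squeezed + b1)          # ReLU, shape (C // r,)
    weights = 1.0 / (1.0 + np.exp(-(w2 @ hidden + b2)))   # sigmoid, shape (C,)
    # Scale: channels tied to occluded parts can be suppressed (weight near 0),
    # channels tied to visible parts can be kept (weight near 1).
    return features * weights[:, None, None], weights


# Toy input with assumed sizes: 8 channels, 7x7 spatial grid, reduction ratio 2.
rng = np.random.default_rng(0)
C, H, W, r = 8, 7, 7, 2
features = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C)) * 0.1
b1 = np.zeros(C // r)
w2 = rng.standard_normal((C, C // r)) * 0.1
b2 = np.zeros(C)

attended, weights = channel_attention(features, w1, b1, w2, b2)
```

With external guidance, the same scaling step would apply, but `weights` would be predicted from an additional signal rather than from `features` alone. The abstract does not specify that signal, so it is left out of the sketch.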



Title: Guided Attention in CNNs for Occluded Pedestrian Detection and Re-identification
Authors: Shanshan Zhang, Di Chen, Jian Yang, Bernt Schiele
Publication date: 08-04-2021
Publisher: Springer US
Published in: International Journal of Computer Vision / Issue 6/2021
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI: https://doi.org/10.1007/s11263-021-01461-z
