Published in: International Journal of Computer Vision 6-7/2019

28.11.2018

Learning Discriminative Aggregation Network for Video-Based Face Recognition and Person Re-identification

Authors: Yongming Rao, Jiwen Lu, Jie Zhou


Abstract

In this paper, we propose a discriminative aggregation network for video-based face recognition and person re-identification, which aims to integrate information from video frames into an effective and efficient feature representation. Unlike existing video aggregation methods, our method aggregates raw video frames directly instead of features obtained by complex processing. By combining the ideas of metric learning and adversarial learning, we learn an aggregation network that generates images more discriminative than the raw input frames. Our framework reduces the number of image frames to be processed per video and significantly speeds up the recognition procedure. Furthermore, low-quality frames containing misleading information are filtered and denoised during the aggregation procedure, which makes our method more robust and discriminative. Experimental results on several widely used datasets show that our method can generate discriminative images from video clips and improve both the speed and the accuracy of video-based face recognition and person re-identification.
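The core idea above can be sketched in a few lines. This is a minimal, illustrative stand-in and not the paper's implementation: the learned aggregation network is replaced by a quality-weighted frame average, and the metric-learning objective by a standard triplet loss; the function names `aggregate_frames` and `triplet_loss` are our own assumptions for illustration.

```python
import numpy as np

def aggregate_frames(frames, weights=None):
    """Fuse a stack of T video frames into one synthesized image.

    Toy stand-in for the learned aggregation network: here the fusion
    is a quality-weighted average, whereas the paper learns the fusion
    end-to-end with metric and adversarial losses. Giving a low-quality
    frame a small weight mimics the filtering/denoising behavior.

    frames:  array of shape (T, H, W) (or (T, H, W, C)).
    weights: optional per-frame quality scores; uniform if None.
    """
    frames = np.asarray(frames, dtype=np.float64)
    if weights is None:
        weights = np.ones(len(frames))
    weights = np.asarray(weights, dtype=np.float64)
    weights = weights / weights.sum()  # normalize to a convex combination
    # Weighted sum over the frame axis -> a single aggregated image.
    return np.tensordot(weights, frames, axes=1)

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Metric-learning objective that makes aggregated images
    discriminative: pull same-identity aggregates together, push
    different identities at least `margin` further apart."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

# Usage: three 2x2 "frames"; the third is treated as low quality.
frames = np.stack([np.full((2, 2), v) for v in (1.0, 2.0, 3.0)])
agg_uniform = aggregate_frames(frames)                    # plain average
agg_filtered = aggregate_frames(frames, weights=[1, 1, 0])  # drop bad frame
```

In the paper this fusion is a trained network and the weights are implicit in its parameters; only the role of each loss term is reproduced here.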


Metadata
Title
Learning Discriminative Aggregation Network for Video-Based Face Recognition and Person Re-identification
Authors
Yongming Rao
Jiwen Lu
Jie Zhou
Publication date
28.11.2018
Publisher
Springer US
Published in
International Journal of Computer Vision / Issue 6-7/2019
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI
https://doi.org/10.1007/s11263-018-1135-x
