Published in: International Journal of Computer Vision 6-7/2019

28.11.2018

Learning Discriminative Aggregation Network for Video-Based Face Recognition and Person Re-identification

Authors: Yongming Rao, Jiwen Lu, Jie Zhou


Abstract

In this paper, we propose a discriminative aggregation network for video-based face recognition and person re-identification, which aims to integrate information from video frames into an effective and efficient feature representation. Unlike existing video aggregation methods, our method aggregates raw video frames directly instead of features obtained by complex processing. By combining the ideas of metric learning and adversarial learning, we learn an aggregation network that generates images more discriminative than the raw input frames. Our framework reduces the number of image frames to be processed per video and significantly speeds up the recognition procedure. Furthermore, low-quality frames containing misleading information are filtered and denoised during the aggregation procedure, which makes our method more robust and discriminative. Experimental results on several widely used datasets show that our method can generate discriminative images from video clips and improve both the speed and the accuracy of video-based face recognition and person re-identification.
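The core idea above can be sketched in a few lines. This is a minimal, illustrative stand-in and not the paper's implementation: the learned aggregation network is replaced by a quality-weighted frame average, and the metric-learning objective by a standard triplet loss; the function names `aggregate_frames` and `triplet_loss` are our own assumptions for illustration.

```python
import numpy as np

def aggregate_frames(frames, weights=None):
    """Fuse a stack of T video frames into one synthesized image.

    Toy stand-in for the learned aggregation network: here the fusion
    is a quality-weighted average, whereas the paper learns the fusion
    end-to-end with metric and adversarial losses. Giving a low-quality
    frame a small weight mimics the filtering/denoising behavior.

    frames:  array of shape (T, H, W) (or (T, H, W, C)).
    weights: optional per-frame quality scores; uniform if None.
    """
    frames = np.asarray(frames, dtype=np.float64)
    if weights is None:
        weights = np.ones(len(frames))
    weights = np.asarray(weights, dtype=np.float64)
    weights = weights / weights.sum()  # normalize to a convex combination
    # Weighted sum over the frame axis -> a single aggregated image.
    return np.tensordot(weights, frames, axes=1)

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Metric-learning objective that makes aggregated images
    discriminative: pull same-identity aggregates together, push
    different identities at least `margin` further apart."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

# Usage: three 2x2 "frames"; the third is treated as low quality.
frames = np.stack([np.full((2, 2), v) for v in (1.0, 2.0, 3.0)])
agg_uniform = aggregate_frames(frames)                    # plain average
agg_filtered = aggregate_frames(frames, weights=[1, 1, 0])  # drop bad frame
```

In the paper this fusion is a trained network and the weights are implicit in its parameters; only the role of each loss term is reproduced here.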


Metadata
Title
Learning Discriminative Aggregation Network for Video-Based Face Recognition and Person Re-identification
Authors
Yongming Rao
Jiwen Lu
Jie Zhou
Publication date
28.11.2018
Publisher
Springer US
Published in
International Journal of Computer Vision / Issue 6-7/2019
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI
https://doi.org/10.1007/s11263-018-1135-x
