2019 | OriginalPaper | Book chapter

Video-Based Convolutional Attention for Person Re-Identification

Authors: Marco Zamprogno, Marco Passon, Niki Martinel, Giuseppe Serra, Giuseppe Lancioni, Christian Micheloni, Carlo Tasso, Gian Luca Foresti

Published in: Image Analysis and Processing – ICIAP 2019

Publisher: Springer International Publishing

Abstract

In this paper we consider the problem of video-based person re-identification, the task of associating videos of the same person captured by different, non-overlapping cameras. We propose a Siamese framework in which the video frames of the person to re-identify and those of the candidate are processed by two identical networks that produce a similarity score. We introduce an attention mechanism to capture the relevant information both at the frame level (spatial information) and at the video level (temporal information, given by the importance of a specific frame within the sequence). One novelty of our approach is the joint, concurrent processing of the frame and video levels, which yields a very simple architecture. Despite this simplicity, our approach achieves better performance than the state of the art on the challenging iLIDS-VID dataset.
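
The abstract gives no architectural details, so the following is only a minimal sketch of the general idea, not the authors' network. It assumes each frame is already described by a list of region feature vectors, uses plain dot-product attention with hypothetical query vectors (`spatial_query`, `temporal_query`), and applies the two attention levels one after the other, whereas the chapter describes processing them jointly. Cosine similarity stands in for the unspecified similarity score.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def attend(vectors, query):
    """Attention pooling: score each vector against the query,
    normalise the scores with softmax, return the weighted sum."""
    weights = softmax([dot(vec, query) for vec in vectors])
    dim = len(vectors[0])
    pooled = [sum(w * vec[i] for vec, w in zip(vectors, weights))
              for i in range(dim)]
    return pooled, weights

def embed_video(video, spatial_query, temporal_query):
    """video: list of frames; each frame: list of region feature vectors.
    Spatial attention pools regions within a frame (assumed form);
    temporal attention then pools frames within the video (assumed form)."""
    frame_vectors = [attend(frame, spatial_query)[0] for frame in video]
    return attend(frame_vectors, temporal_query)

def similarity(video_a, video_b, spatial_query, temporal_query):
    """Siamese comparison: both videos go through the same function with
    the same parameters; cosine similarity is an illustrative choice."""
    a, _ = embed_video(video_a, spatial_query, temporal_query)
    b, _ = embed_video(video_b, spatial_query, temporal_query)
    norm = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))
    return dot(a, b) / norm if norm > 0 else 0.0
```

In this sketch the temporal attention weights returned by `embed_video` sum to one, so they can be read as the relative importance of each frame within the sequence. Sharing the same queries across both branches is what makes the comparison Siamese.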

Metadata
Title
Video-Based Convolutional Attention for Person Re-Identification
Authors
Marco Zamprogno
Marco Passon
Niki Martinel
Giuseppe Serra
Giuseppe Lancioni
Christian Micheloni
Carlo Tasso
Gian Luca Foresti
Copyright year
2019
DOI
https://doi.org/10.1007/978-3-030-30642-7_1
