International Journal of Computer Vision

09-07-2023

Attribute-Image Person Re-identification via Modal-Consistent Metric Learning

Authors: Jianqing Zhu, Liu Liu, Yibing Zhan, Xiaobin Zhu, Huanqiang Zeng, Dacheng Tao

Published in: International Journal of Computer Vision | Issue 11/2023

Abstract

Attribute-image person re-identification (AIPR) is a cross-modal retrieval task that searches for images of persons who match a given list of attributes. Because of the large modal gap between attributes and images, current AIPR methods generally depend on cross-modal feature alignment, but they pay too little attention to similarity metric jitter across modal configurations (i.e., attribute probe vs. image gallery, image probe vs. attribute gallery, image probe vs. image gallery, and attribute probe vs. attribute gallery). In this paper, we propose a modal-consistent metric learning (MCML) method that stably measures comprehensive similarities between attributes and images. MCML differs from previous methods in two significant ways. First, MCML provides a complete multi-modal triplet (CMMT) loss function that pulls the farthest positive pair as close together as possible while pushing the nearest negative pair as far apart as possible, regardless of the modalities of the samples involved. Second, MCML develops a modal-consistent matching regularization (MCMR) that reduces the diversity of matching matrices and guides consistent matching behavior across modal configurations. MCML thus integrates the CMMT loss function and MCMR and requires no complex cross-modal feature alignment. Theoretically, we provide a generalization bound, derived via on-average stability, that establishes the stability of the MCML model. Experimentally, extensive results on the PETA and Market-1501 datasets show that the proposed MCML outperforms state-of-the-art approaches.
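The hard-mining idea behind a modality-independent triplet loss can be illustrated with a minimal sketch. This is not the authors' CMMT implementation: it assumes a per-anchor (batch-hard) formulation, Euclidean distance, and a margin of 0.3, none of which are stated in the abstract. The function name `hardest_triplet_loss` and the toy embeddings are hypothetical. Passing image embeddings alone gives a single-modal variant; pooling image and attribute embeddings into one set lets the hardest pairs be mined regardless of modality.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def hardest_triplet_loss(embeddings, labels, margin=0.3):
    """Batch-hard triplet loss over one pooled set of embeddings.

    Assumption: for each anchor, use the farthest same-identity sample and
    the nearest different-identity sample, ignoring the modality of each.
    The margin value is a hypothetical choice for illustration.
    """
    losses = []
    for i, (anchor, label) in enumerate(zip(embeddings, labels)):
        pos = [euclidean(anchor, e)
               for j, (e, l) in enumerate(zip(embeddings, labels))
               if l == label and j != i]
        neg = [euclidean(anchor, e)
               for e, l in zip(embeddings, labels)
               if l != label]
        if not pos or not neg:
            continue  # anchor has no valid positive or negative partner
        losses.append(max(0.0, max(pos) - min(neg) + margin))
    return sum(losses) / len(losses) if losses else 0.0


# Toy 2-D embeddings (hypothetical); identities are integer labels.
image_emb = [[0.0, 0.0], [1.0, 0.0], [1.5, 0.0]]
image_ids = [0, 0, 1]
attr_emb = [[0.0, 1.0], [1.5, 1.0]]
attr_ids = [0, 1]

# Single-modal: only images enter the loss.
single_modal = hardest_triplet_loss(image_emb, image_ids)

# Pooled: images and attributes are mined together, independent of modality.
pooled = hardest_triplet_loss(image_emb + attr_emb, image_ids + attr_ids)

print(single_modal, pooled)
```

In the pooled call, an image anchor may pick an attribute embedding as its hardest positive or negative (and vice versa), which is the sense in which the mining is modality-independent in this sketch.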

We use \(i \in [m]\) to denote that \(i\) is drawn from \([m] = \{1, 2, \ldots, m\}\). The same notation applies to \(l_i \in [c]\).

The single-modal HMT loss function applies the HMT loss to images only, whereas the cross-modal HMT loss function applies it to both images and attributes.

Title: Attribute-Image Person Re-identification via Modal-Consistent Metric Learning
Authors: Jianqing Zhu, Liu Liu, Yibing Zhan, Xiaobin Zhu, Huanqiang Zeng, Dacheng Tao
Publication date: 09-07-2023
Publisher: Springer US
Published in: International Journal of Computer Vision / Issue 11/2023
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI: https://doi.org/10.1007/s11263-023-01841-7
