research-article

A Discriminatively Learned CNN Embedding for Person Reidentification

Published: 13 December 2017

Abstract

In this article, we revisit two popular convolutional neural network models in person re-identification (re-ID): verification and identification models. The two models have their respective advantages and limitations owing to their different loss functions. Here, we shed light on how to combine the two models to learn more discriminative pedestrian descriptors. Specifically, we propose a Siamese network that simultaneously computes the identification loss and the verification loss. Given a pair of training images, the network predicts the identities of the two input images and whether they belong to the same identity. Our network learns a discriminative embedding and a similarity measurement at the same time, thus making full use of the re-ID annotations. Our method can be easily applied to different pretrained networks. Albeit simple, the learned embedding improves the state-of-the-art performance on two public person re-ID benchmarks. Further, we show that our architecture can also be applied to image retrieval. The code is available at https://github.com/layumi/2016_person_re-ID.
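To make the combined objective concrete, the following is a minimal sketch in plain Python, not the authors' released implementation. It assumes that both images pass through a shared backbone that yields embeddings `emb_a` and `emb_b`, that a single identity classifier (`w_id`) is shared by both branches, and that the same/different classifier (`w_ver`) operates on the element-wise squared difference of the two embeddings. The function names, the bias-free linear layers, the squared-difference comparison, and the unweighted sum of the three loss terms are all assumptions made for illustration; the abstract does not specify them.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Negative log-probability assigned to the target class."""
    return -math.log(softmax(logits)[target])


def linear(weights, x):
    """Bias-free linear layer; `weights` is a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def joint_loss(emb_a, emb_b, id_a, id_b, w_id, w_ver):
    """Two identification losses plus one verification loss.

    emb_a, emb_b: embeddings of the image pair from the shared backbone.
    w_id:  hypothetical identity classifier weights (num_ids x dim).
    w_ver: hypothetical same/different classifier weights (2 x dim),
           where class 1 means "same identity".
    """
    # Identification: predict the identity of each image separately.
    loss_id = (cross_entropy(linear(w_id, emb_a), id_a)
               + cross_entropy(linear(w_id, emb_b), id_b))
    # Verification (assumed form): classify the squared difference
    # of the two embeddings as "different" (0) or "same" (1).
    diff_sq = [(a - b) ** 2 for a, b in zip(emb_a, emb_b)]
    same = 1 if id_a == id_b else 0
    loss_ver = cross_entropy(linear(w_ver, diff_sq), same)
    # Assumption: the three terms are summed with equal weight.
    return loss_id + loss_ver
```

With all classifier weights set to zero, every prediction is uniform, so for three identities the loss reduces to 2·ln 3 + ln 2. This gives a quick sanity check before plugging in a real backbone and training loop.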


      • Published in

        ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 14, Issue 1
        February 2018
        287 pages
        ISSN: 1551-6857
        EISSN: 1551-6865
        DOI: 10.1145/3173554

        Copyright © 2017 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 13 December 2017
        • Revised: 1 October 2017
        • Accepted: 1 October 2017
        • Received: 1 July 2017


        Qualifiers

        • research-article
        • Research
        • Refereed
