research-article

A Discriminatively Learned CNN Embedding for Person Reidentification

Authors:
Zhedong Zheng, University of Technology Sydney, Australia
Liang Zheng, University of Technology Sydney, Australia
Yi Yang, University of Technology Sydney and State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences, Zhong Guan Cun, Beijing, China
ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 14, Issue 1, Article No. 13, pp. 1–20. https://doi.org/10.1145/3159171

Published: 13 December 2017

Abstract

In this article, we revisit two popular types of convolutional neural network used in person re-identification (re-ID): verification models and identification models. The two models have their respective advantages and limitations because they use different loss functions. Here, we shed light on how to combine them to learn more discriminative pedestrian descriptors. Specifically, we propose a Siamese network that simultaneously computes the identification loss and the verification loss. Given a pair of training images, the network predicts the identities of the two input images and whether they belong to the same identity. Our network learns a discriminative embedding and a similarity measurement at the same time, thus making full use of the re-ID annotations. Our method can be easily applied to different pretrained networks. Albeit simple, the learned embedding improves the state-of-the-art performance on two public person re-ID benchmarks. Further, we show that our architecture can also be applied to image retrieval. The code is available at https://github.com/layumi/2016_person_re-ID.
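As a rough illustration of the joint objective described in the abstract, the sketch below computes two identification losses (one per image) and one verification loss (same identity or not) for a pair of embeddings, then sums them. It is not the authors' implementation: the helper names (`linear`, `joint_loss`), the weight matrices `w_id` and `w_ver`, the use of the element-wise squared difference as input to the verification classifier, and the unweighted sum of the losses are all assumptions made for this example.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Negative log-probability assigned to the target class."""
    return -math.log(softmax(logits)[target])


def linear(weights, features):
    """Bias-free linear layer: one weight row per output class."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]


def joint_loss(emb_a, emb_b, id_a, id_b, w_id, w_ver):
    """Return (total, identification, verification) losses for one pair.

    emb_a, emb_b: embeddings produced by the shared (Siamese) backbone.
    w_id:  hypothetical identity classifier weights (num_ids x dim).
    w_ver: hypothetical same/different classifier weights (2 x dim).
    """
    # Identification: predict the identity of each image separately.
    loss_id = (cross_entropy(linear(w_id, emb_a), id_a)
               + cross_entropy(linear(w_id, emb_b), id_b))
    # Verification: classify the pair from the element-wise squared
    # difference of the embeddings (an assumption of this sketch).
    diff_sq = [(a - b) ** 2 for a, b in zip(emb_a, emb_b)]
    same = 1 if id_a == id_b else 0
    loss_ver = cross_entropy(linear(w_ver, diff_sq), same)
    # Unweighted sum; the relative weighting is also an assumption.
    return loss_id + loss_ver, loss_id, loss_ver


# Toy pair: two images of different identities (0 and 1), 2-D embeddings.
emb_a = [1.0, 0.0]
emb_b = [0.0, 1.0]
w_id = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]  # three identities
w_ver = [[0.0, 0.0], [0.0, 0.0]]             # untrained verifier
total, loss_id, loss_ver = joint_loss(emb_a, emb_b, 0, 1, w_id, w_ver)
```

In a real network both losses would be backpropagated through the shared backbone, so the identity labels and the pairwise same/different labels both shape the embedding.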

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision representations
        Image representations
      2. Computer vision tasks
        Visual content-based indexing and retrieval

Published in
ACM Transactions on Multimedia Computing, Communications, and Applications Volume 14, Issue 1
February 2018
287 pages
ISSN: 1551-6857
EISSN: 1551-6865
DOI: 10.1145/3173554
Editor: Alberto Del Bimbo, University of Firenze, Italy
Copyright © 2017 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher: Association for Computing Machinery, New York, NY, United States
Publication History
- Published: 13 December 2017
- Revised: 1 October 2017
- Accepted: 1 October 2017
- Received: 1 July 2017
Author Tags
Person reidentification
convolutional neural networks
Qualifiers
- research-article
- Research
- Refereed
Article Metrics
- Total Citations: 475
- Total Downloads: 2,178
- Downloads (Last 12 months): 174
- Downloads (Last 6 weeks): 14
