Abstract
In this article, we revisit two popular convolutional neural network models in person re-identification (re-ID): the verification model and the identification model. Because they are trained with different loss functions, the two models have their respective advantages and limitations. Here, we shed light on how to combine them to learn more discriminative pedestrian descriptors. Specifically, we propose a Siamese network that simultaneously computes the identification loss and the verification loss. Given a pair of training images, the network predicts the identities of the two input images and whether they belong to the same identity. Our network learns a discriminative embedding and a similarity measurement at the same time, thus making full use of the re-ID annotations. Our method can be easily applied to different pretrained networks. Albeit simple, the learned embedding improves the state-of-the-art performance on two public person re-ID benchmarks. Further, we show that our architecture can also be applied to image retrieval. The code is available at https://github.com/layumi/2016_person_re-ID.
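As a rough illustration of the joint objective described above, the sketch below sums two identification losses (one per image in the pair) and one verification loss (same identity or not). It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the loss weights `w_id` and `w_verif`, and the convention that the verification head outputs two logits are all hypothetical, and the logits are passed in directly instead of being produced by a network.

```python
import math

def softmax_cross_entropy(logits, target):
    """Cross-entropy of a softmax over `logits` against integer class `target`."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

def joint_loss(id_logits_a, id_logits_b, verif_logits, id_a, id_b,
               w_id=1.0, w_verif=1.0):
    """Identification loss for both images plus verification loss for the pair.

    Assumptions (not from the abstract): verif_logits has two entries,
    index 0 = "different identity" and index 1 = "same identity";
    w_id and w_verif are hypothetical weights.
    """
    same = 1 if id_a == id_b else 0
    loss_id = (softmax_cross_entropy(id_logits_a, id_a)
               + softmax_cross_entropy(id_logits_b, id_b))
    loss_verif = softmax_cross_entropy(verif_logits, same)
    return w_id * loss_id + w_verif * loss_verif

# Toy pair: three identities, both images labelled identity 2.
loss = joint_loss([0.1, 0.2, 2.0], [0.0, 0.3, 1.5], [-1.0, 1.0], id_a=2, id_b=2)
print(round(loss, 4))
```

In a real Siamese setup, the two identity logit vectors would come from a shared-weight backbone applied to each image, and the verification logits from a comparison of the two embeddings; how that comparison is done is left open here.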