Editor's Choice Article: A survey of approaches and trends in person re-identification☆
Graphical abstract
The figure illustrates the role of person re-identification in a typical surveillance scenario. An area monitored by multiple cameras is depicted by the top view of a building floor plan and the relative placement of the cameras with respect to the building. Colored dots depict different people, and the numbers beside the dots are the IDs assigned to them. As a person moves from one camera's field of view (FOV) into another camera's FOV, re-identification is required to establish correspondence between disconnected tracks to accomplish multiple camera tracking. This paper explores the problem of person re-identification and discusses the current solutions. Open issues and challenges of the problem are highlighted with a discussion on potential directions for further research.
Introduction
Large networks of cameras are increasingly deployed in public places like airports, railway stations, college campuses and office buildings. These cameras typically span large geospatial areas and have non-overlapping fields of view (FOVs) to provide enhanced coverage. Such networks provide huge amounts of video data, which is either manually monitored by law enforcement officers or utilized after the fact for forensic purposes. Human monitoring of these videos is error-prone, time consuming and expensive, thereby severely reducing the effectiveness of surveillance. Automated analysis of large amounts of video data can not only process the data faster but also significantly improve the quality of surveillance [1]. Video analysis can enable long term activity and behavior characterization of people in a scene. Such analysis is required for high-level surveillance tasks like suspicious activity detection or undesirable event prediction for timely alerts to security personnel, making surveillance more proactive [2].
Understanding a surveillance scene through computer vision requires the ability to track people across multiple cameras and to perform crowd movement analysis and activity detection. Tracking people across multiple cameras is essential for wide area scene analytics, and person re-identification is a fundamental aspect of multi-camera tracking. Re-identification (Re-ID) is defined as the process of establishing correspondence between images of a person taken from different cameras. It is used to determine whether instances captured by different cameras belong to the same person, that is, to assign a stable ID to different instances of the person. Fig. 1 shows an example of a surveillance area monitored by multiple cameras with non-overlapping FOVs. The figure shows the top view of a building floor plan and the relative placement of the cameras with respect to the building. Colored dots depict different people, and the numbers beside the dots are the IDs assigned to them. The dotted lines with arrows represent the directions in which certain people move through the camera network.
As a person moves from one camera's FOV into another camera's FOV, Re-ID is used to establish correspondence between disconnected tracks to accomplish tracking across the multiple cameras. Thus, single camera tracking along with Re-ID across cameras allows for the reconstruction of the trajectory of a person across the larger scene. Person Re-ID is a non-trivial task, but it is critical in improving the semantic coherence of analysis. Re-ID is relevant for surveillance applications with a single camera as well, for example, to determine whether a person visits a particular location multiple times or whether the same person or a different person picks up an unattended package or bag. Beyond surveillance, it has applications in robotics, multimedia, and more popular utilities like automated photo tagging or photo browsing [3].
Person Re-ID as a task is quite simple to understand. As humans, we do it all the time without much effort. Our eyes and brains are trained to detect, localize, identify and later re-identify objects and people in the real world. Re-ID implies that a person who has been previously seen is identified in their next appearance using a unique descriptor of the person. Humans are able to extract such a descriptor based on the person's face, height and build, clothing, hair color, hair style, walking pattern, etc. A person's face is the most distinctive and reliable feature that humans use to identify people. Automating person Re-ID without human intervention, on the other hand, is quite difficult.
Person Re-ID: task and its challenges
In general, person Re-ID is difficult to automate for a number of reasons, which we will discuss later in this section, but the main challenge to Re-ID comes from the variation in a person's appearance across different cameras. Fig. 2 shows images of a person taken by different cameras on the same and different days, highlighting the variations in appearance. The top row illustrates the changes in appearance of a person across different cameras. It is also interesting to note that the
Person Re-ID scenarios
In the previous section, we presented the general definition of person Re-ID and discussed the implementation pipeline and associated challenges. However, the Re-ID problem can be split into two scenarios: open set Re-ID and closed set Re-ID. A Re-ID system is similar to a recognition system, which comprises a gallery set (the set of known people) and a probe (an unknown person) on which the recognition has to be performed. Fig. 4 depicts the Re-ID system setup as a recognition system.
Let the
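The difference between the two scenarios can be illustrated with a minimal sketch. It is not the formulation used in this paper: the toy feature vectors, the Euclidean distance, the function names and the `threshold` value are all assumptions chosen for illustration. In the closed set case the probe is assumed to be present in the gallery, so the nearest gallery ID is always returned; in the open set case the probe may be an unseen person, so a match is rejected when even the nearest gallery entry is too far away.

```python
import math


def distance(a, b):
    """Euclidean distance between two feature vectors (an assumed choice)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def closed_set_match(probe, gallery):
    """Closed set: the probe is assumed to be in the gallery, so return the nearest ID."""
    return min(gallery, key=lambda pid: distance(probe, gallery[pid]))


def open_set_match(probe, gallery, threshold):
    """Open set: return the nearest ID only if it is close enough, otherwise None."""
    best = closed_set_match(probe, gallery)
    if distance(probe, gallery[best]) <= threshold:
        return best
    return None  # the probe is treated as a person not present in the gallery


# Hypothetical gallery of known people with toy 2-D descriptors.
gallery = {"id_1": [1.0, 0.0], "id_2": [0.0, 1.0]}
known_probe = [0.9, 0.1]    # resembles id_1
unknown_probe = [5.0, 4.0]  # resembles nobody in the gallery

closed_known = closed_set_match(known_probe, gallery)
closed_unknown = closed_set_match(unknown_probe, gallery)
open_known = open_set_match(known_probe, gallery, threshold=0.5)
open_unknown = open_set_match(unknown_probe, gallery, threshold=0.5)
```

Under these assumptions, the closed set matcher assigns a gallery ID even to the unseen person, whereas the open set matcher returns `None` for that probe. How such a threshold should be chosen is outside the scope of this sketch.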
Current work in person Re-ID
Re-ID has been a topic of intense research in the past five years [13], [14], [15], [16], [17]. In almost all of this research, Re-ID has been treated as a retrieval or recognition problem. Given an image or multiple images of an unknown person (probe) and a gallery set that consists of a number of known people, the objective is to produce a ranked list of all the people in the gallery based on their visual similarity with the unknown person. The expectation is that the highest
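The retrieval view of Re-ID, in which a probe is compared against a gallery and the gallery is returned as a ranked list, can be sketched as follows. This is only an illustrative example: the three-dimensional descriptors, the gallery IDs and the use of Euclidean distance as the similarity measure are assumptions and do not correspond to any specific method surveyed here.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two feature vectors (assumed similarity measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_gallery(probe, gallery):
    """Return gallery IDs ordered from most to least similar to the probe."""
    return sorted(gallery, key=lambda pid: euclidean(probe, gallery[pid]))


# Hypothetical gallery: one toy appearance descriptor per known person.
gallery = {
    "id_1": [0.9, 0.1, 0.0],
    "id_2": [0.1, 0.8, 0.1],
    "id_3": [0.2, 0.2, 0.6],
}
probe = [0.8, 0.2, 0.0]  # descriptor of the unknown person

ranking = rank_gallery(probe, gallery)
print(ranking)  # ['id_1', 'id_3', 'id_2']
```

In practice, the descriptor and the distance function are where Re-ID methods differ; the ranking step itself stays the same.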
Public datasets and evaluation metrics
The visual characteristics of a person vary drastically across cameras, introducing variability in illumination, poses, view angles, scales and camera resolutions. Factors like occlusions, cluttered background and articulated bodies further add to visual variabilities. Thus, in order to develop robust Re-ID techniques it is important to acquire data that captures these factors effectively. Along with high quality data emulating real world conditions, there is also a need to compare and contrast
Open issues in person Re-ID
As is evident, most of the work on person Re-ID leverages clothing appearance based features designed for short-period Re-ID and is evaluated in closed set Re-ID scenarios. The issue of long-period Re-ID is entirely unexplored and open set Re-ID is not completely tackled.
Conclusion
In this paper we have presented the problem of person re-identification, challenging issues and an overview of current research in the computer vision community. We have considered two types of Re-ID tasks: closed set Re-ID and open set Re-ID. We have categorized the methods used and discussed their characteristics and limitations. In addition, we have provided descriptions of the available Re-ID datasets and their pros and cons. A brief discussion of popular Re-ID evaluation techniques is
Acknowledgments
This work was supported in part by the US Department of Justice 2009-MU-MU-K004. Any opinions, findings, conclusions or recommendations expressed in this paper are those of the authors and do not necessarily reflect the views of our sponsors.
References (117)
- et al. Symmetry-driven accumulation of local features for human characterization and re-identification. Comput. Vis. Image Underst. (2013)
- et al. Modeling inter-camera space-time and appearance relationships for tracking across non-overlapping views. Comput. Vis. Image Underst. (2008)
- et al. Person re-identification in crowd. Pattern Recogn. Lett. (2012)
- Intelligent multi-camera video surveillance: a review. Pattern Recogn. Lett. (2013)
- et al. Part-based spatio-temporal model for multi-person re-identification. Pattern Recogn. Lett. (2012)
- et al. Boosted human re-identification using Riemannian manifolds. Image Vision Comput. (2012)
- et al. Multiple-shot person re-identification by chromatic and epitomic analyses. Pattern Recogn. Lett. (2012)
- et al. Fast person re-identification based on dissimilarity representations. Pattern Recogn. Lett. (2012)
- et al. An intelligent video framework for homeland protection
- et al. Smart surveillance: applications, technologies and implications
- Finding people in repeated shots of the same scene
- Histograms of oriented gradients for human detection
- Pictorial structures for object recognition. Int. J. Comput. Vis.
- A discriminatively trained, multiscale, deformable part model
- Poselets: body part detectors trained using 3D human pose annotations
- People-tracking-by-detection and people-detection-by-tracking
- A boosted particle filter: multitarget detection and tracking
- You'll never walk alone: modeling social behavior for multi-target tracking
- Detection and tracking of multiple, partially occluded humans by Bayesian combination of edgelet based part detectors. Int. J. Comput. Vis.
- Object tracking: a survey. ACM Comput. Surv.
- Person re-identification based on global color context
- Multiple-shot person re-identification by hpe signature
- Person reidentification using spatiotemporal appearance
- Shape and appearance context modeling
- Bridging the gaps between cameras
- Simultaneous calibration and tracking with a network of non-overlapping sensors
- Multi-camera activity correlation analysis
- Time-delayed correlation analysis for multi-camera activity understanding. Int. J. Comput. Vis.
- Relations between two sets of variates
- Incremental activity modeling in multiple disjoint cameras. IEEE Trans. Pattern. Anal. Mach. Intell.
- Topology estimation for thousand-camera surveillance networks
- Vip: vision tool for comparing images of people, vision interface
- Vise: visual search engine using multiple networked cameras
- Person tracking and reidentification: introducing panoramic appearance map (pam) for feature representation. Mach. Vis. Appl.
- Principal axis-based correspondence between multiple cameras for people tracking. IEEE Trans. Pattern. Anal. Mach. Intell.
- Multi-view people surveillance using 3D information
- Person re-identification using spatial covariance regions of human body parts
- Bicov: a novel image representation for person re-identification and face verification
- Custom pictorial structures for re-identification
- Viewpoint invariant pedestrian recognition with an ensemble of localized features
- Learning discriminative appearance-based models using partial least squares
- Associating groups of people
- Towards person identification and re-identification with attributes
- Learning to match appearances by correlations in a covariance metric space
- Unsupervised salience learning for person re-identification
- Pedestrian recognition with a learned metric
- Reidentification by relative distance comparison. IEEE Trans. Pattern. Anal. Mach. Intell.
- Person re-identification by support vector ranking
- Person re-identification by efficient impostor-based metric learning
☆ Editor's Choice Articles are invited and handled by a select rotating 12-member Editorial Board committee. This paper has been recommended for acceptance by Xiaogang Wang.