
2014 | Book

Person Re-Identification

Edited by: Shaogang Gong, Marco Cristani, Shuicheng Yan, Chen Change Loy

Publisher: Springer London

Book series: Advances in Pattern Recognition


About this book

The first book of its kind dedicated to the challenge of person re-identification, this text provides an in-depth, multidisciplinary discussion of recent developments and state-of-the-art methods. Features: introduces examples of robust feature representations, reviews salient feature weighting and selection mechanisms and examines the benefits of semantic attributes; describes how to segregate meaningful body parts from background clutter; examines the use of 3D depth images and contextual constraints derived from the visual appearance of a group; reviews approaches to feature transfer function and distance metric learning and discusses potential solutions to issues of data scalability and identity inference; investigates the limitations of existing benchmark datasets, presents strategies for camera topology inference and describes techniques for improving post-rank search efficiency; explores the design rationale and implementation considerations of building a practical re-identification system.

Table of contents

Frontmatter
Chapter 1. The Re-identification Challenge
Abstract
For making sense of the vast quantity of visual data generated by the rapid expansion of large-scale distributed multi-camera systems, automated person re-identification is essential. However, it poses a significant challenge to computer vision systems. Fundamentally, person re-identification requires solving two difficult problems, ‘finding needles in haystacks’ and ‘connecting the dots’, by identifying instances and associating the whereabouts of targeted people travelling across large distributed space–time locations in often crowded environments. This capability would enable the discovery of, and reasoning about, individual-specific long-term structured activities and behaviours. Whilst solving the person re-identification problem is inherently challenging, it also promises enormous potential for a wide range of practical applications, ranging from security and surveillance to retail and health care. As a result, the field has drawn growing and wide interest from academic researchers and industrial developers. This chapter introduces the re-identification problem, highlights the difficulties in building person re-identification systems, and presents an overview of recent progress and the state-of-the-art approaches to solving some of the fundamental challenges in person re-identification, benefiting from research in computer vision, pattern recognition and machine learning, and drawing insights from video analytics system design considerations for engineering practical solutions. It also provides an introduction to the contributing chapters of this book. The chapter ends by posing some open questions for the re-identification challenge arising from emerging and future applications.
Shaogang Gong, Marco Cristani, Chen Change Loy, Timothy M. Hospedales

Features and Representations

Frontmatter
Chapter 2. Discriminative Image Descriptors for Person Re-identification
Abstract
This chapter looks at person re-identification from a computer vision point of view, by proposing two new image descriptors designed for matching person bounding boxes in images. Indeed, one key issue of person re-identification is the ability to measure the similarity between two person-centered image regions, making it possible to predict whether these regions represent the same person despite changes in illumination, viewpoint, background clutter, occlusion, and image quality/resolution. Re-identification methods hence rely heavily on the signatures or descriptors used for representing and comparing the regions. The first proposed descriptor is a combination of Biologically Inspired Features (BIF) and covariance descriptors, while the second builds on the recent advances of Fisher Vectors. These two image descriptors are validated through experiments on two different person re-identification benchmarks (VIPeR and ETHZ), achieving state-of-the-art performance on both datasets.
Bingpeng Ma, Yu Su, Frédéric Jurie
Chapter 3. SDALF: Modeling Human Appearance with Symmetry-Driven Accumulation of Local Features
Abstract
In video surveillance, person re-identification (re-id) is probably the open challenge when dealing with a camera network with non-overlapping fields of view. Re-id allows the association of different instances of the same person across different locations and time. A large number of approaches have emerged in the last 5 years, often proposing novel visual features specifically designed to highlight the most discriminant aspects of people, which are invariant to pose, scale and illumination. In this chapter, we follow this line, presenting a strategy with three key characteristics that differentiate it from the state of the art: (1) a symmetry-driven method to automatically segment salient body parts, (2) an accumulation of features making the descriptor more robust to appearance variations, and (3) a person re-identification procedure cast as an image retrieval problem, which can be easily embedded into a multi-person tracking scenario as the observation model.
Loris Bazzani, Marco Cristani, Vittorio Murino
Chapter 4. Re-identification by Covariance Descriptors
Abstract
This chapter addresses the problem of appearance matching using the covariance descriptor. We tackle the extremely challenging case in which the same nonrigid object has to be matched across disjoint camera views. Covariance statistics averaged over a Riemannian manifold are fundamental for designing appearance models invariant to camera changes. We discuss different ways of extracting an object appearance by incorporating various training strategies. Appearance matching is enhanced either by discriminative analysis using images from a single camera or by selecting distinctive features in a covariance metric space employing data from two cameras. By selecting only essential features for a specific class of objects (e.g., humans) without defining an a priori feature vector for extracting covariance, we remove redundancy from the covariance descriptor and ensure low computational cost. Using a feature selection technique instead of learning on a manifold, we avoid the over-fitting problem. The proposed models have been successfully applied to the person re-identification task in which a human appearance has to be matched across nonoverlapping cameras. We carry out detailed experiments of the suggested strategies, demonstrating their pros and cons w.r.t. recognition rate and suitability to video analytics systems.
Sławomir Bąk, François Brémond
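As a rough illustration of the covariance descriptor that Chapter 4 builds on, the sketch below computes the sample covariance matrix of per-pixel feature vectors over an image region. The feature layout (x, y, intensity) and the helper names `pixel_features` and `region_covariance` are assumptions made for this example only; the chapter selects its features by learning and compares descriptors on a Riemannian manifold, neither of which is shown here.

```python
def pixel_features(patch):
    """Map a 2D grayscale patch (list of rows) to per-pixel feature vectors.

    The (x, y, intensity) layout is an illustrative assumption.
    """
    return [
        (float(x), float(y), float(value))
        for y, row in enumerate(patch)
        for x, value in enumerate(row)
    ]


def region_covariance(features):
    """Return the sample covariance matrix of equal-length feature vectors."""
    n = len(features)
    if n < 2:
        raise ValueError("need at least two feature vectors")
    d = len(features[0])
    # Per-dimension mean over all pixels in the region.
    mean = [sum(f[k] for f in features) / n for k in range(d)]
    cov = [[0.0] * d for _ in range(d)]
    for f in features:
        centered = [f[k] - mean[k] for k in range(d)]
        # Accumulate the outer product of the centered vector with itself.
        for i in range(d):
            for j in range(d):
                cov[i][j] += centered[i] * centered[j]
    # Unbiased estimate: divide by n - 1.
    return [[c / (n - 1) for c in row] for row in cov]


if __name__ == "__main__":
    patch = [[0, 2], [0, 2]]  # toy 2x2 region, brighter on the right
    for row in region_covariance(pixel_features(patch)):
        print(row)
```

The descriptor size depends only on the number of features per pixel, not on the region size, which is one reason covariance descriptors are convenient for comparing regions of different resolution.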
Chapter 5. Attributes-Based Re-identification
Abstract
Automated person re-identification using only visual information from public-space CCTV video is challenging for many reasons, such as poor resolution or challenges involved in dealing with camera calibration. More critically still, the majority of clothing worn in public spaces tends to be non-discriminative and therefore of limited disambiguation value. Most re-identification techniques developed so far have relied on low-level visual-feature matching approaches that aim to return matching gallery detections earlier in the ranked list of results. However, for many applications an initial probe image may not be available, or a low-level feature representation may not be sufficiently invariant to viewing condition changes as well as being discriminative for re-identification. In this chapter, we show how mid-level “semantic attributes” can be computed for person description. We further show how this attribute-based description can be used in synergy with low-level feature descriptions to improve re-identification accuracy when an attribute-centric distance measure is employed. Moreover, we discuss a “zero-shot” scenario in which a visual probe is unavailable but re-identification can still be performed with user-provided semantic attribute description.
Ryan Layne, Timothy M. Hospedales, Shaogang Gong
Chapter 6. Person Re-identification by Attribute-Assisted Clothes Appearance
Abstract
Person re-identification across nonoverlapping camera views is a challenging computer vision task. Due to the often low video quality and high camera position, it is difficult to get clear human faces. Therefore, clothes appearance is the main cue to re-identify a person. It is difficult to represent clothes appearance using low-level features due to its nonrigidity, but daily clothes have many characteristics in common. Based on this observation, we study person re-identification by embedding middle-level clothes attributes into the classifier via a latent support vector machine framework. We also collect a large-scale person re-identification dataset, and the effectiveness of the proposed method is demonstrated on this dataset under open-set experimental settings.
Annan Li, Luoqi Liu, Shuicheng Yan
Chapter 7. Person Re-identification by Articulated Appearance Matching
Abstract
Re-identification of pedestrians in video-surveillance settings can be effectively approached by treating each human figure as an articulated body, whose pose is estimated through the framework of Pictorial Structures (PS). In this way, we can focus selectively on similarities between the appearance of body parts to recognize a previously seen individual. In fact, this strategy resembles what humans employ to solve the same task in the absence of facial details or other reliable biometric information. Based on these insights, we show how to perform single image re-identification by matching signatures coming from articulated appearances, and how to strengthen this process in multi-shot re-identification by using Custom Pictorial Structures (CPS) to produce improved body localizations and appearance signatures. Moreover, we provide a complete and detailed breakdown-analysis of the system that surrounds these core procedures, with several novel arrangements devised for efficiency and flexibility. Finally, we test our approach on several public benchmarks, obtaining convincing results.
Dong Seon Cheng, Marco Cristani
Chapter 8. One-Shot Person Re-identification with a Consumer Depth Camera
Abstract
In this chapter, we propose a comparison between two techniques for one-shot person re-identification from soft biometric cues. One is based upon a descriptor composed of features provided by a skeleton estimation algorithm; the other compares body shapes in terms of whole point clouds. This second approach relies on a novel technique we propose to warp the subject’s point cloud to a standard pose, which makes it possible to disregard the different poses a person can assume. This technique is also used for composing 3D models which are then used at testing time for matching unseen point clouds. We test the proposed approaches on an existing RGB-D re-identification dataset and on the newly built BIWI RGBD-ID dataset. This dataset provides sequences of RGB, depth, and skeleton data for 50 people in two different scenarios and it has been made publicly available to foster advancement in this new research branch.
Matteo Munaro, Andrea Fossati, Alberto Basso, Emanuele Menegatti, Luc Van Gool
Chapter 9. Group Association: Assisting Re-identification by Visual Context
Abstract
In a crowded public space, people often walk in groups, either with people they know or with strangers. Associating a group of people over space and time can assist in understanding an individual’s behaviours as it provides vital visual context for matching individuals within the group. This seems to be an ‘easier’ task compared with person re-identification due to the availability of more and richer visual content in associating a group; however, solving this problem turns out to be rather challenging because a group of people can be highly non-rigid with changing relative position of people within the group and severe self-occlusions. In this work, the problem of matching/associating groups of people over large space and time gaps captured in multiple non-overlapping camera views is addressed. Specifically, a novel people group representation and a group matching algorithm are proposed. The former addresses changes in the relative positions of people in a group and the latter uses the proposed group descriptors for measuring the similarity between two candidate images. Based on group matching, we further formulate a method for matching individual persons using the group description as visual context. These methods are validated using the 2008 i-LIDS Multiple-Camera Tracking Scenario (MCTS) dataset on multiple camera views from a busy airport arrival hall.
Wei-Shi Zheng, Shaogang Gong, Tao Xiang
Chapter 10. Evaluating Feature Importance for Re-identification
Abstract
Person re-identification methods seek robust person matching through combining feature types. Often, these features are assigned implicitly with a single vector of global weights, which are assumed to be universally and equally good for matching all individuals, independent of their different appearances. In this study, we present a comprehensive comparison and evaluation of up-to-date imagery features for person re-identification. We show that certain features play more important roles than others for different people. To that end, we introduce an unsupervised approach to learning a bottom-up measurement of feature importance. This is achieved through first automatically grouping individuals with similar appearance characteristics into different prototypes/clusters. Different features extracted from different individuals are then automatically weighted adaptively driven by their inherent appearance characteristics defined by the associated prototype. We show comparative evaluation on the re-identification effectiveness of the proposed prototype-sensitive feature importance-based method as compared to two generic weight-based global feature importance methods. We conclude by showing that their combination is able to yield more accurate person re-identification.
Chunxiao Liu, Shaogang Gong, Chen Change Loy, Xinggang Lin

Matching and Distance Metric

Frontmatter
Chapter 11. Learning Appearance Transfer for Person Re-identification
Abstract
In this chapter we review methods that model the transfer a person’s appearance undergoes when passing between two cameras with non-overlapping fields of view. While many recent studies deal with re-identifying a person at any new location and search for universal signatures and metrics, here we focus on solutions for the natural setup of surveillance systems in which the cameras are specific and stationary, solutions which exploit the limited transfer domain associated with a specific camera pair. We compare the performance of explicit transfer modeling, implicit transfer modeling, and camera-invariant methods. Although explicit transfer modeling is advantageous over implicit transfer modeling when the inter-camera training data are poor, implicit camera transfer, which can model multi-valued mappings and better utilize negative training data, is advantageous when a larger training set is available. While camera-invariant methods have the advantage of not relying on specific inter-camera training data, they are outperformed by both camera-transfer approaches when sufficient training data are available. We therefore conclude that camera-specific information is very informative for improving re-identification in sites with static non-overlapping cameras and that it should still be considered even with the improvement of camera-invariant methods.
Tamar Avraham, Michael Lindenbaum
Chapter 12. Mahalanobis Distance Learning for Person Re-identification
Abstract
Recently, Mahalanobis metric learning has gained a considerable interest for single-shot person re-identification. The main idea is to build on an existing image representation and to learn a metric that reflects the visual camera-to-camera transitions, allowing for a more powerful classification. The goal of this chapter is twofold. We first review the main ideas of Mahalanobis metric learning in general and then give a detailed study on different approaches for the task of single-shot person re-identification, also comparing to the state of the art. In particular, for our experiments, we used Linear Discriminant Metric Learning (LDML), Information Theoretic Metric Learning (ITML), Large Margin Nearest Neighbor (LMNN), Large Margin Nearest Neighbor with Rejection (LMNN-R), Efficient Impostor-based Metric Learning (EIML), and KISSME. For our evaluations we used four different publicly available datasets (i.e., VIPeR, ETHZ, PRID 2011, and CAVIAR4REID). Additionally, we generated the new, more realistic PRID 450S dataset, where we also provide detailed segmentations. For the latter one, we also evaluated the influence of using well-segmented foreground and background regions. Finally, the corresponding results are presented and discussed.
Peter M. Roth, Martin Hirzer, Martin Köstinger, Csaba Beleznai, Horst Bischof
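To make the quantity learned in Chapter 12 concrete, the following minimal sketch evaluates a squared Mahalanobis distance d_M(x, y) = (x - y)^T M (x - y) for a given matrix M. The function name `mahalanobis_sq` and the toy matrices are assumptions for illustration; in practice M would be a positive semidefinite matrix produced by one of the metric learning methods listed above, and the learning step itself is not shown.

```python
def mahalanobis_sq(x, y, M):
    """Squared Mahalanobis distance (x - y)^T M (x - y).

    x, y: sequences of floats with equal length d.
    M:    d x d matrix as a list of rows, assumed positive semidefinite.
    """
    if len(x) != len(y) or len(M) != len(x):
        raise ValueError("dimension mismatch")
    diff = [a - b for a, b in zip(x, y)]
    # Matrix-vector product M @ diff.
    m_diff = [sum(m_ij * d_j for m_ij, d_j in zip(row, diff)) for row in M]
    # Inner product diff . (M @ diff).
    return sum(d_i * md_i for d_i, md_i in zip(diff, m_diff))


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    weighted = [[2.0, 0.0], [0.0, 0.5]]  # hypothetical learned metric
    print(mahalanobis_sq([1.0, 2.0], [4.0, 6.0], identity))  # squared Euclidean
    print(mahalanobis_sq([1.0, 2.0], [4.0, 6.0], weighted))
```

With M set to the identity the expression reduces to the squared Euclidean distance, so a learned M can be read as a reweighting and rotation of the feature space.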
Chapter 13. Dictionary-Based Domain Adaptation Methods for the Re-identification of Faces
Abstract
Re-identification refers to the problem of recognizing a person at a different location after one has been captured by a camera at a previous location. We discuss re-identification of faces using the domain adaptation approach, which tackles the problem where data in the target domain (different location) are drawn from a different distribution from that of the source domain (previous location), due to different view points, illumination conditions, resolutions, etc. In particular, we discuss the adaptation of dictionary-based methods for re-identification of faces. We first present a domain adaptive dictionary learning (DADL) framework for the task of transforming a dictionary learned from one visual domain to the other, while maintaining a domain-invariant sparse representation of a signal. Domain dictionaries are modeled by a linear or nonlinear parametric function. The dictionary function parameters and domain-invariant sparse codes are then jointly learned by solving an optimization problem. We then discuss an unsupervised domain adaptive dictionary learning (UDADL) method where labeled data are only available in the source domain. We propose to interpolate subspaces through dictionary learning to link the source and target domains. These subspaces are able to capture the intrinsic domain shift and form a shared feature representation for cross-domain identification.
Qiang Qiu, Jie Ni, Rama Chellappa
Chapter 14. From Re-identification to Identity Inference: Labeling Consistency by Local Similarity Constraints
Abstract
In this chapter, we introduce the problem of identity inference as a generalization of person re-identification. It is most appropriate to distinguish identity inference from re-identification in situations where a large number of observations must be identified without knowing a priori that groups of test images represent the same individual. The standard single- and multishot person re-identification common in the literature are special cases of our formulation. We present an approach to solving identity inference by modeling it as a labeling problem in a Conditional Random Field (CRF). The CRF model ensures that the final labeling gives similar labels to detections that are similar in feature space. Experimental results are given on the ETHZ, i-LIDS and CAVIAR datasets. Our approach yields state-of-the-art performance for multishot re-identification, and our results on the more general identity inference problem demonstrate that we are able to infer the identity of very many examples even with very few labeled images in the gallery.
Svebor Karaman, Giuseppe Lisanti, Andrew D. Bagdanov, Alberto Del Bimbo
Chapter 15. Re-identification for Improved People Tracking
Abstract
Re-identification is usually defined as the problem of deciding whether a person currently in the field of view of a camera has been seen earlier either by that camera or another. However, a different version of the problem arises even when people are seen by multiple cameras with overlapping fields of view. Current tracking algorithms can easily get confused when people come close to each other and merge trajectory fragments into trajectories that include erroneous identity switches. Preventing this means re-identifying people across trajectory fragments. In this chapter, we show that this can be done very effectively by formulating the problem as a minimum-cost maximum-flow linear program. This version of the re-identification problem can be solved in real-time and produces trajectories without identity switches. We demonstrate the power of our approach both in single- and multicamera setups to track pedestrians, soccer players, and basketball players.
François Fleuret, Horesh Ben Shitrit, Pascal Fua

Evaluation and Application

Frontmatter
Chapter 16. Benchmarking for Person Re-identification
Abstract
The evaluation of computer vision and pattern recognition systems is usually a burdensome and time-consuming activity. In this chapter all the benchmarks publicly available for re-identification will be reviewed and compared, starting from the ancestors VIPeR and Caviar to the most recent datasets for 3D modeling such as SARC3d (with calibrated cameras) and RGBD-ID (with range sensors). Specific requirements and constraints are highlighted and reported for each of the described collections. In addition, details on the metrics that are mostly used to test and evaluate the re-identification systems are provided.
Roberto Vezzani, Rita Cucchiara
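The abstract of Chapter 16 does not name the metrics it covers. One measure widely used in re-identification evaluation is the cumulative matching characteristic (CMC) curve, and the sketch below shows how it can be computed from the rank at which each probe's true match appears in the gallery ranking. The function name `cmc_curve` and the input format are assumptions for this example, not taken from the chapter.

```python
def cmc_curve(true_match_ranks, max_rank):
    """Cumulative matching characteristic up to max_rank.

    true_match_ranks: for each probe, the 1-based position of the correct
                      gallery identity in the ranked result list.
    Returns a list whose entry k-1 is the fraction of probes whose true
    match appears within the top k results.
    """
    n = len(true_match_ranks)
    if n == 0:
        raise ValueError("need at least one probe")
    return [
        sum(1 for r in true_match_ranks if r <= k) / n
        for k in range(1, max_rank + 1)
    ]


if __name__ == "__main__":
    ranks = [1, 3, 2, 1]  # toy results for four probes
    print(cmc_curve(ranks, max_rank=3))
```

The first entry of the returned list is the rank-1 recognition rate, which is the figure most often quoted when comparing methods on a benchmark.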
Chapter 17. Person Re-identification: System Design and Evaluation Overview
Abstract
Person re-identification has important applications in video surveillance. It is particularly challenging because observed pedestrians undergo significant variations across camera views, and there are a large number of pedestrians to be distinguished given small pedestrian images from surveillance videos. This chapter discusses different approaches to improving the key components of a person re-identification system, including feature design, feature learning, and metric learning, as well as their strengths and weaknesses. It provides an overview of various person re-identification systems and their evaluation on benchmark datasets. Multiple benchmark datasets for person re-identification are summarized and discussed. The performance of some state-of-the-art person re-identification approaches on benchmark datasets is compared and analyzed. It also discusses a few future research directions on improving benchmark datasets, evaluation methodology, and system design.
Xiaogang Wang, Rui Zhao
Chapter 18. People Search with Textual Queries About Clothing Appearance Attributes
Abstract
Person re-identification consists of searching for an individual of interest in video sequences acquired by a camera network, using an image of that individual as a query. Here we consider a related task, named people search with textual queries, which consists of searching for images of individuals that match a textual description of clothing appearance, given by a Boolean combination of predefined attributes. People search can be useful in applications like forensic video analysis, where the query can be obtained from an eyewitness report. We propose a general method for implementing people search as an extension of any given re-identification system that uses any multiple part-multiple component appearance descriptor. In our method, the same descriptor of the re-identification system at hand is used, and attributes are chosen by taking into account the information it provides. The original descriptor is then transformed into a dissimilarity one. Attribute detectors are finally constructed as supervised classifiers, using dissimilarity descriptors as the input feature vectors. We experimentally evaluate our method on a benchmark re-identification dataset.
Riccardo Satta, Federico Pala, Giorgio Fumera, Fabio Roli
Chapter 19. Large-Scale Camera Topology Mapping: Application to Re-identification
Abstract
In this chapter we describe the problem of camera network topology mapping, which is a critical precursor to person re-identification in large camera networks. After surveying previous approaches to this problem we describe “exclusion”, a practical, robust method for deriving a topology estimate that scales to thousands of cameras. We then consider re-identification within such networks by modelling and matching target appearance. By combining a simple appearance model with the topology estimate generated by exclusion, person re-identification can be accomplished within far larger scale networks than would be possible using appearance matching alone.
Anthony Dick, Anton van den Hengel, Henry Detmold
Chapter 20. Scalable Multi-camera Tracking in a Metropolis
Abstract
The majority of work in person re-identification is focused primarily on the matching process at an algorithmic level, from identifying reliable features to formulating effective classifiers and distance metrics in order to improve matching scores on established ‘closed-world’ benchmark datasets of limited scope and size. Very little work has explored the pragmatic and ultimately challenging question of how to engineer working systems that best leverage the strengths and tolerate the weaknesses of the current state of the art in re-identification techniques, and which are capable of scaling to ‘open-world’ operational requirements in a large urban environment. In this work, we present the design rationale, implementational considerations and quantitative evaluation of a retrospective forensic tool known as Multi-Camera Tracking (MCT). The MCT system was developed for re-identifying and back-tracking individuals within huge quantities of open-world CCTV video data sourced from a large distributed multi-camera network encompassing different public transport hubs in a metropolis. There are three key characteristics of MCT, associativity, capacity and accessibility, that underpin its scalability to spatially large, temporally diverse, highly crowded and topologically complex urban environments with transport links. We discuss a multitude of functional features that in combination address these characteristics. We consider computer vision techniques and machine learning algorithms, including relative feature ranking for inter-camera matching, global (crowd-level) and local (person-specific) space–time profiling, attribute re-ranking and machine-guided data mining using a ‘man-in-the-loop’ interactive paradigm. We also discuss implementational considerations designed to facilitate linear scalability to an arbitrary number of cameras by employing a distributed computing architecture. We conduct quantitative trials to illustrate the potential of the MCT system and its performance characteristics in coping with very large-scale open-world multi-camera data covering crowded transport hubs in a metropolis.
Yogesh Raja, Shaogang Gong
Backmatter
Metadata
Title
Person Re-Identification
Edited by
Shaogang Gong
Marco Cristani
Shuicheng Yan
Chen Change Loy
Copyright year
2014
Publisher
Springer London
Electronic ISBN
978-1-4471-6296-4
Print ISBN
978-1-4471-6295-7
DOI
https://doi.org/10.1007/978-1-4471-6296-4