
Pattern Recognition Letters

Volume 33, Issue 14, 15 October 2012, Pages 1828-1837

Person re-identification in crowd

https://doi.org/10.1016/j.patrec.2012.02.014

Abstract

Person re-identification aims to recognize the same person viewed by disjoint cameras at different time instants and locations. In this paper, after an extensive review of state-of-the-art approaches, we propose a re-identification method that takes into account the appearance of people, the spatial location of cameras and the potential paths a person can choose to follow. This choice is modeled with a set of areas of interest (landmarks) that constrain the propagation of people trajectories in the non-observed regions between the fields-of-view of the cameras. We represent people with a selective patch around their upper body, which remains effective in crowded scenes where occlusions are frequent. We demonstrate the proposed method in a challenging scenario from London Gatwick airport and compare it to well-known person re-identification methods, highlighting their strengths and limitations. Finally, we show by means of Cumulative Matching Characteristic curves that the best performance is achieved when the modeling of people movements in non-observed regions is combined with appearance methods, with an average improvement of 6% over using appearance alone and 15% over using motion alone for the association of people across cameras.

Highlights

► Person re-identification is challenging in crowds and non-overlapping camera networks.
► Motion models are used for re-identification.
► The top part of a person is the most representative in crowded scenarios.
► Combining motion and appearance models improves re-identification.

Introduction

The surveillance of wide areas such as airports and train stations requires the deployment of networks of cameras whose fields-of-view (FOVs) may be disjoint. Disjoint cameras make person re-identification a challenging problem, because changes in pose, scale and illumination modify the perceived appearance of a person across cameras (Fig. 1). Moreover, in a crowd, the full body is often not visible due to occlusions. Finally, people exiting the FOV of a camera may enter different regions of the FOV of the next camera, so the time needed to travel across cameras and the area of reappearance are variable and difficult to model.

We can identify four main phases in person re-identification, namely multi-person detection, feature extraction, cross-camera calibration, and person association (Fig. 2). The first phase, multi-person detection, extracts image regions corresponding to people (Enzweiler and Gavrila, 2009), based on a trained classifier, a motion detector or a combination of both. The second phase extracts features from the detected people. Appearance features include color, texture and shape, which can be used separately or combined (Gray and Tao, 2008). These features can be extracted from a single snapshot of the target (Zheng et al., 2011) or, when intra-camera tracking information is available, after grouping features over time (Berdugo et al., 2010). The third phase, cross-camera calibration, establishes the color and spatio–temporal relationships across cameras and makes it possible to account for the variability of observations of the same person across different FOVs. Spatio–temporal calibration methods encapsulate information about the camera deployment, the spatial relationship between cameras, the entry/exit points in the scene, and the traveling time across cameras (Javed et al., 2008). Finally, the association phase matches different instances of the same person across cameras using the information extracted in the previous phases.
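As an illustration of the fourth phase only, the sketch below ranks gallery candidates for a probe person by the distance between their appearance descriptors. It is a minimal example under our own assumptions (generic descriptors and a Euclidean distance), not the association rule of any specific method discussed here.

# Minimal sketch of the association phase (phase 4), assuming appearance
# descriptors have already been extracted in phase 2. Illustrative only;
# the descriptors and the distance are placeholders, not a method's exact choices.
import numpy as np

def rank_candidates(probe_desc, gallery_descs):
    """Return gallery indices sorted from best to worst match.

    probe_desc    : (d,) appearance descriptor of the probe person
    gallery_descs : (n, d) descriptors of the n candidates in the other camera
    """
    dists = np.linalg.norm(gallery_descs - probe_desc, axis=1)  # Euclidean distance
    return np.argsort(dists)                                    # best candidate first

# Example: 5 gallery candidates with 16-dimensional descriptors.
rng = np.random.default_rng(0)
gallery = rng.random((5, 16))
probe = gallery[2] + 0.01 * rng.random(16)   # a noisy copy of candidate 2
print(rank_candidates(probe, gallery))       # candidate 2 should rank first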

Existing person re-identification methods are validated on snapshot-based or video-based datasets. VIPeR (Farenzena et al., 2010, Prosser et al., 2010) and i-LIDS-static (Farenzena et al., 2010, Bak et al., 2010, Prosser et al., 2010, Zheng et al., 2011) are the most common snapshot-based datasets used to validate appearance-based methods, and mostly contain people whose full body is visible. VIPeR consists of 632 images taken from two outdoor views, while i-LIDS-static contains from 44 (Bak et al., 2010) to 479 (Zheng et al., 2011) image pairs of people taken from four cameras at London Gatwick airport. A video-based dataset is the Terrascope dataset (Jeong and Jaynes, 2008), which consists of nine indoor cameras observing eight people walking and acting in an office environment. Javed et al. (2008) presented a video-based dataset with three sequences composed of up to three cameras from indoor and outdoor scenarios with large illumination changes and up to four fully visible people. Finally, a more challenging dataset in terms of occlusions consists of three outdoor cameras where up to ten people walk alone or in small groups (Kuo et al., 2010).

In this paper, we present a unifying overall structure and an in-depth survey of state-of-the-art person re-identification methods that allow us to identify the major common features and drawbacks of existing approaches. Unlike previous method-based surveys (Doretto et al., 2011), our survey has a phase-based organization. Based on the outcome of this survey, we propose a method that (i) integrates simple knowledge of the site under surveillance, (ii) models people movements in non-observed regions using landmark points (regions of interest) in the scene, and (iii) can cope with crowded scenes. The association method uses distances based on appearance, location, and their combination. Appearance features are extracted from a selected area of the upper body (the most visible in a crowd), and candidate locations are generated using landmark points and people motion. We compare the most representative state-of-the-art person re-identification methods and our proposed method on the London Gatwick airport dataset (iLIDS, 2008) using Cumulative Matching Characteristic (CMC) curves (Gray and Tao, 2008, Prosser et al., 2010, Zheng et al., 2011).
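For reference, a CMC curve reports, for each rank k, the fraction of probe people whose correct match appears among the top k ranked gallery candidates. The sketch below is our own minimal illustration of this computation, assuming one true match per probe and a precomputed probe–gallery distance matrix; it is not code from the paper.

# Minimal sketch of a Cumulative Matching Characteristic (CMC) curve,
# assuming one true match per probe; dist[i, j] is the distance between
# probe i and gallery item j, and gt[i] is the gallery index of the true match.
import numpy as np

def cmc_curve(dist, gt):
    n_probes, n_gallery = dist.shape
    ranks = np.argsort(dist, axis=1)                 # best candidate first
    # position (0-based) of the true match in each probe's ranking
    match_rank = np.array([np.where(ranks[i] == gt[i])[0][0]
                           for i in range(n_probes)])
    # CMC value at rank k = fraction of probes matched within the top k candidates
    return np.array([(match_rank < k).mean() for k in range(1, n_gallery + 1)])

# Toy example: 3 probes, 4 gallery identities.
dist = np.array([[0.1, 0.9, 0.8,  0.7],
                 [0.6, 0.2, 0.5,  0.9],
                 [0.4, 0.3, 0.35, 0.1]])
gt = np.array([0, 1, 2])       # true gallery index of each probe
print(cmc_curve(dist, gt))     # approximately [0.667, 0.667, 1.0, 1.0]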

The paper is organized as follows. Section 2 presents a comprehensive survey of person re-identification methods, based on their four main phases. Section 3 presents our framework for re-identification that uses landmark-based spatio–temporal modeling and an upper body representation. In Section 4, we validate the proposed approach and compare it with the most common state-of-the-art methods. Finally, Section 5 discusses the results and draws conclusions.


Person re-identification: a survey

In this section, we discuss person re-identification methods presented in the literature, based on the phase-based classification introduced in the previous section. The methods are summarized in Table 1. In the following, we consider the first phase (multi-person detection) to have already been solved.

Overview

The proposed person re-identification method deals with crowded scenes and scenarios with challenging non-observed regions where spatio–temporal calibration is not straightforward. The method extracts appearance features from the upper part of the body and models movements in non-observed regions. This modeling is performed using a map of the site under surveillance, on which candidate positions for people reappearance are generated. Then, the association phase integrates information from
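The following sketch illustrates, under our own simplifying assumptions, the kind of association described in this overview: each probe carries a set of candidate reappearance positions propagated through landmark points, and detections in the next camera are scored by a weighted combination of an appearance distance and the distance to the closest candidate position. The weights, the raw (unnormalized) distances and the data layout are placeholders, not the exact formulation of the proposed method.

# Illustrative sketch, not the paper's implementation: rank people observed
# in a second camera by combining an appearance distance with a spatial
# distance to candidate reappearance positions generated from landmarks.
# In practice the two distances should be normalized before being combined.
import numpy as np

def combined_distance(probe_app, probe_candidates, gal_app, gal_pos,
                      w_app=0.5, w_spatial=0.5):
    """probe_app        : (d,) appearance descriptor of the probe
       probe_candidates : (m, 2) candidate reappearance positions on the map
       gal_app          : (d,) appearance descriptor of a gallery detection
       gal_pos          : (2,) map position where the gallery detection appears
    """
    d_app = np.linalg.norm(probe_app - gal_app)
    d_spatial = np.min(np.linalg.norm(probe_candidates - gal_pos, axis=1))
    return w_app * d_app + w_spatial * d_spatial

# Toy example: two gallery detections; the second one is close to a candidate
# position and has a similar appearance, so it should obtain a lower distance.
probe_app = np.array([0.2, 0.8, 0.5])
candidates = np.array([[10.0, 4.0], [12.0, 6.0]])     # landmark-propagated positions
gallery = [(np.array([0.9, 0.1, 0.3]), np.array([2.0, 1.0])),
           (np.array([0.25, 0.75, 0.5]), np.array([11.5, 5.5]))]
scores = [combined_distance(probe_app, candidates, a, p) for a, p in gallery]
print(np.argsort(scores))   # expected order: detection 1 first, then 0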

Experimental setup

In this section, we evaluate the most representative re-identification algorithms and compare them with the proposed LBM approach. Results are provided for methods based on appearance features only, spatio–temporal features only, and a combination of them. Moreover, we validate our choice to extract appearance features from a vertical stripe of the upper body (Fig. 4).
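To make the upper-body representation concrete, the sketch below crops a vertical stripe covering the head and upper body from a detection's bounding box and describes it with a color histogram. The stripe proportions and the histogram parameters are our own assumptions for illustration, not the exact settings validated in this section.

# Illustrative sketch: extract a color histogram from a vertical stripe of
# the upper body of a detected person. The stripe proportions (top 50% of the
# box height, central 40% of its width) and the 8x8x8 RGB histogram are
# assumptions for illustration, not the paper's exact settings.
import numpy as np

def upper_body_histogram(image, bbox, top_frac=0.5, width_frac=0.4, bins=8):
    """image : (H, W, 3) uint8 RGB frame
       bbox  : (x, y, w, h) bounding box of the detected person
    """
    x, y, w, h = bbox
    stripe_w = int(w * width_frac)
    x0 = x + (w - stripe_w) // 2                  # horizontally centred stripe
    patch = image[y:y + int(h * top_frac), x0:x0 + stripe_w]
    hist, _ = np.histogramdd(patch.reshape(-1, 3),
                             bins=(bins, bins, bins),
                             range=((0, 256), (0, 256), (0, 256)))
    hist = hist.flatten()
    return hist / (hist.sum() + 1e-12)            # normalized descriptor

# Example on a synthetic frame with one detection.
frame = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)
descriptor = upper_body_histogram(frame, bbox=(300, 100, 60, 160))
print(descriptor.shape)   # (512,)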

The experiments are run on the i-LIDS dataset from the London Gatwick airport (iLIDS, 2008) where similarly to previous works (

Conclusions

We proposed and validated a person re-identification method based on modeling people movements in non-observed regions using a site map and regions of interest where people are likely to transit. The two main novel contributions of the proposed method are the integration of appearance information for association with spatio–temporal modeling of people movements in non-observed regions, and the extraction of features from a vertical stripe partially covering the upper body and the head, thus

Acknowledgements

Fahad Tahir was supported by the Erasmus Mundus Joint Doctorate in Interactive and Cognitive Environments, which is funded by the Education, Audiovisual & Culture Executive Agency (FPA n° 2010–0012).

References (26)

  • Javed, O., et al., 2008. Modeling inter-camera space-time and appearance relationships for tracking across non-overlapping views. Computer Vision and Image Understanding.
  • Teixeira, L.F., et al., 2009. Video object matching across multiple independent views using local descriptors and adaptive learning. Pattern Recognition Letters.
  • Bak, S., Corvee, E., Bremond, F., Thonnat, M., 2010. Person re-identification using haar-based and dcd-based signature....
  • Bauml, M., Bernardin, K., Fischer, M., Ekenel, H.K., 2010. Multi-pose face recognition for person retrieval in camera...
  • Bauml, M., Stiefelhagen, R., 2011. Evaluation of local features for person re-identification in image sequences. In:...
  • Berdugo, G., Soceanu, O., Moshe, Y., Rudoy, D., Dvir, I., 2010. Object re-identification in real world scenarios across...
  • Cheng, Y., Zhou, W., Wang, Y., Zhao, C., Zhang, S., 2009. Multi-camera-based object handoff using decision-level...
  • Doretto, G., et al., 2011. Appearance-based person reidentification in camera networks: problem overview and current approaches. Journal of Ambient Intelligence and Humanized Computing.
  • Enzweiler, M., Gavrila, D.M., 2009. Monocular pedestrian detection: survey and experiments. IEEE Transactions on Pattern Analysis and Machine Intelligence.
  • Farenzena, M., Bazzani, L., Perina, A., Murino, V., Cristani, M., 2010. Person re-identification by symmetry-driven...
  • Gheissari, N., Sebastian, T.B., Tu, P.H., Rittscher, J., Hartley, R., 2006. Person reidentification using...
  • Gray, D., Tao, H., 2008. Viewpoint invariant pedestrian recognition with an ensemble of localized features. In: Proc....
  • Hamdoun, O., Moutarde, F., Stanciulescu, B., Steux, B., 2008. Person re-identification in multi-camera system by...