Review Article (Editor's Choice Article)
Comparison of human and computer performance across face recognition experiments

https://doi.org/10.1016/j.imavis.2013.12.002

Highlights

  • We review experiments comparing humans and machines in NIST face recognition tests.

  • We introduce the cross-modal performance analysis (CMPA) framework.

  • We apply the CMPA framework to the human–machine experiments in the NIST tests.

  • We propose a challenge problem to develop algorithms with human-level performance.

Abstract

Since 2005, human and computer performance has been systematically compared as part of face recognition competitions, with results being reported for both still and video imagery. The key results from these competitions are reviewed. To analyze performance across studies, the cross-modal performance analysis (CMPA) framework is introduced. The CMPA framework is applied to experiments that were part of a face recognition competition. The analysis shows that for matching frontal faces in still images, algorithms are consistently superior to humans. For video and difficult still face pairs, humans are superior. Finally, based on the CMPA framework and a face performance index, we outline a challenge problem for developing algorithms that are superior to humans for the general face recognition problem.

Introduction

Overall, humans are the most accurate face recognition systems. People recognize faces as part of social interactions, at a distance, in still and video imagery, and under a wide variety of poses, expressions, and illuminations. A holy grail in automatic face recognition is developing an algorithm whose performance is equivalent to that of humans; this is equivalent to solving the general face recognition problem. While the problem is easy to state (accuracy equivalent to humans), it is not obvious how to determine whether an algorithm's recognition accuracy is better than a human's. One of the key challenges is establishing a measurable goal line and knowing when the goal line is crossed.

Since 2005, human and computer performance has been systematically compared as part of face recognition competitions conducted by the National Institute of Standards and Technology (NIST) [1], [2], [3], [4]. The comparisons provided an assessment of accuracy for both humans and machines for each competition. However, there has not been a systematic analysis of these results across the competitions.

To analyze the results across experiments, we introduce the cross-modal performance analysis (CMPA) framework, which is demonstrated on the NIST competitions. CMPA was adapted from techniques in neuroscience that were developed to compare output from different sensing modalities of brain activity; e.g., functional magnetic resonance imaging (fMRI) and human perceptual judgments [5], [6]. These techniques can measure concordance between experimental data and computational models. In our study, the modalities compared are human and algorithm performance. In the psychology and neuroscience literature, face recognition algorithms can be referred to as computational models. The computational model can be designed to optimize performance or to model the human face recognition processes. The framework is sufficiently general that it provides a goal line for determining when machine performance reaches human levels.
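To make the idea of cross-modal concordance concrete, the sketch below rank-correlates per-pair human similarity ratings with per-pair algorithm match scores, in the spirit of the representational similarity techniques cited above [5], [6]. The data, the variable names, and the choice of Spearman correlation are illustrative assumptions; the CMPA analyses in this paper are not restricted to this particular statistic.

```python
# Illustrative concordance measure between human and algorithm judgments
# on the same face pairs (assumed toy data; not the exact CMPA computation).
import numpy as np
from scipy.stats import spearmanr

# One similarity score per face pair from each "modality".
human_ratings = np.array([4.2, 1.1, 3.8, 2.0, 4.9, 1.5])           # e.g., 1 = different, 5 = same person
algorithm_scores = np.array([0.81, 0.12, 0.64, 0.33, 0.95, 0.22])  # e.g., match scores in [0, 1]

# Rank correlation ignores the different scales of the two modalities.
rho, pval = spearmanr(human_ratings, algorithm_scores)
print(f"Spearman rho = {rho:.2f} (p = {pval:.3f})")
```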

On frontal faces in high-quality still images, our analysis shows that machine performance is superior to that of humans. For these images, machines represent a person's identity primarily by encoding information extracted from the face; information from the body, hair, and head is generally ignored. For video and extremely difficult-to-recognize face pairs, experiments show that humans take advantage of all available identity cues when recognizing people [7], [8]. CMPA quantifies the potential for improving machine performance if all possible identity information is encoded by algorithms.

Comparing machine and human performance started with independent experiments in NIST competitions. The synthesis of the results across experiments gives a greater understanding of the relative strengths of machines and humans. The CMPA framework provides a goal line for determining if algorithm and human performance is comparable on the general face recognition problem.

Section snippets

Review of human and machine comparisons

We examine the relative performance of humans and machines for both still and video imagery. This review section presents the key details and conclusions for each study, selected to lay the groundwork for the cross-experiment analysis in Section 3. The summary includes an overview of the images in the experiment, how the images were selected for measuring human performance, the key receiver operating characteristics (ROCs) comparing machines and humans, and
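The comparisons reviewed in this section are reported as ROCs. As a minimal sketch of how such a curve and an operating point can be computed from raw verification scores (assumed toy data and function names; not the exact protocol of the NIST evaluations), consider:

```python
# Minimal ROC sketch for a face verification experiment (illustrative only).
import numpy as np

def roc_points(match_scores, nonmatch_scores):
    """False accept rate and verification rate at every score threshold."""
    thresholds = np.unique(np.concatenate([match_scores, nonmatch_scores]))
    far = np.array([(nonmatch_scores >= t).mean() for t in thresholds])
    vr = np.array([(match_scores >= t).mean() for t in thresholds])
    return far, vr

def verification_rate_at_far(match_scores, nonmatch_scores, target_far=0.001):
    """Highest verification rate at a false accept rate not exceeding the target."""
    far, vr = roc_points(match_scores, nonmatch_scores)
    feasible = far <= target_far
    return vr[feasible].max() if feasible.any() else 0.0

# Toy scores standing in for one algorithm; human data would be pooled the same way.
rng = np.random.default_rng(0)
match_scores = rng.normal(2.0, 1.0, 2000)      # same-person pairs
nonmatch_scores = rng.normal(0.0, 1.0, 2000)   # different-person pairs
print("VR at FAR = 0.001:", round(verification_rate_at_far(match_scores, nonmatch_scores), 3))
```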

Cross experiment comparison

The next step is to analyze the experiments reviewed in the previous section as a group. The analysis is performed using the cross-modal performance analysis (CMPA) framework, which we introduce here.

Insights from structural comparisons

The analysis in Section 3 directly compared human and machine performance. There is more to learn from the interplay of machines and humans than relative performance comparisons alone can provide. We examine this interplay in the context of three topics. The first is the other-race effect, where algorithms have contributed to understanding the human face processing system and human face processing has contributed to understanding machine performance. Second, it has been possible to

Future directions

The cross-modal performance analysis framework was designed to compare human and machine performance across a series of experiments. Although this framework is useful in its own right, we apply this technique to establish goals for advancing face recognition technology.

Over the last two decades, phenomenal progress has been made in automated face recognition from frontal images taken in mobile studio or mugshot environments. Results from the MBE 2010 report a false reject rate of 27 in 10,000

Acknowledgments

PJP was supported by the Federal Bureau of Investigation, and AJO was supported by the Department of Defense. The identification of any commercial product or trade name does not imply endorsement or recommendation by NIST or the University of Texas at Dallas.

References (45)

  • A.J. O'Toole et al.

    Comparing face recognition algorithms to humans on challenging tasks

    ACM Trans. Appl. Percept.

    (2012)
  • J.J. DiCarlo

Untangling object recognition: which neuronal population codes can explain human object recognition performance? In: Neural Computation: Population Coding of High-Level Representations

    (2011)
  • N. Kriegeskorte et al.

    Representational similarity analysis—connecting the branches of systems neuroscience

    Front. Syst. Neurosci.

    (2008)
  • A. Rice et al.

    Unaware person recognition from the body when face identification fails

    Psychol. Sci.

    (2013)
  • P.J. Phillips

    Improving face recognition technology

    IEEE Comput.

    (2011)
  • P.J. Phillips et al.

    Overview of the face recognition grand challenge

  • P.J. Phillips et al.

    An introduction to the good, the bad, and the ugly face recognition challenge problem

  • J.R. Beveridge et al.

    The CSU face identification evaluation system

    Mach. Vis. Appl.

    (2005)
  • H. Moon et al.

    Computational and performance aspects of PCA-based face-recognition algorithms

    Perception

    (2001)
  • M. Turk et al.

    Eigenfaces for recognition

    J. Cogn. Neurosci.

    (1991)
  • C. Liu

    Capitalize on dimensionality increasing techniques for improving face recognition performance

    IEEE Trans. PAMI

    (2006)
  • M. Husken et al.

    Strategies and benefits of fusion of 2D and 3D face recognition


Editor's Choice Articles are invited and handled by a select, rotating 12-member Editorial Board committee. This paper has been recommended for acceptance by Ioannis A. Kakadiaris.
