About this book

This timely text/reference presents a broad overview of advanced deep learning architectures for learning effective feature representations for perceptual and biometrics-related tasks. The text showcases cutting-edge research on the use of convolutional neural networks (CNNs) in face, iris, fingerprint, and vascular biometric systems, as well as in surveillance systems that use soft biometrics. Issues of biometrics security are also examined.

Topics and features: addresses the application of deep learning to enhance the performance of biometric identification across a wide range of biometric modalities; revisits deep learning for face biometrics, offering insights from neuroimaging, and provides a comparison with popular CNN-based architectures for face recognition; examines deep learning for state-of-the-art latent fingerprint and finger-vein recognition, as well as iris recognition; discusses deep learning for soft biometrics, including approaches for gesture-based identification, gender classification, and tattoo recognition; investigates deep learning for biometrics security, covering biometric template protection methods and liveness detection to protect against fake biometric samples; presents contributions from a global selection of pre-eminent experts in the field representing academia, industry, and government laboratories.

Providing both an accessible introduction to the practical applications of deep learning in biometrics and comprehensive coverage of the entire spectrum of biometric modalities, this authoritative volume will be of great interest to all researchers, practitioners, and students involved in related areas of computer vision, pattern recognition, and machine learning.

Table of Contents


Deep Learning for Face Biometrics


Chapter 1. The Functional Neuroanatomy of Face Processing: Insights from Neuroimaging and Implications for Deep Learning

Face perception is critical for normal social functioning and is mediated by a cortical network of regions in the ventral visual stream. Comparing current deep neural network architectures for biometrics with the neural architecture of the human brain is necessary for developing artificial systems with human abilities. Neuroimaging research has advanced our understanding of the functional architecture of the human ventral face network. Here, we describe recent neuroimaging findings in three domains: (1) the macro- and microscopic anatomical features of the ventral face network in the human brain, (2) the characteristics of its white matter connections, and (3) the basic computations performed by population receptive fields within the face-selective regions that compose this network. We then consider how these empirical findings can inform the development of accurate deep neural networks for face recognition, and how they shed light on the computational benefits of specific neural implementational features.
Kalanit Grill-Spector, Kendrick Kay, Kevin S. Weiner
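
The abstract refers to the basic computations performed by population receptive fields (pRFs). A widely used model, assumed here rather than taken from the chapter, treats a pRF as an isotropic 2D Gaussian over the visual field: the predicted response is the stimulus aperture weighted by that Gaussian and summed, optionally passed through a compressive exponent. The grid, the pRF size, the exponent 0.5, and the function names are illustrative choices only.

```python
import math

def gaussian_prf(x0, y0, sigma, grid):
    """Isotropic 2D Gaussian pRF centred at (x0, y0), evaluated at each grid point."""
    return [math.exp(-((x - x0) ** 2 + (y - y0) ** 2) / (2 * sigma ** 2)) for x, y in grid]

def prf_response(stimulus, weights, exponent=1.0):
    """Weighted sum of a binary stimulus aperture, passed through a power-law
    nonlinearity; an exponent below 1 gives compressive spatial summation."""
    drive = sum(s * w for s, w in zip(stimulus, weights))
    return drive ** exponent

# Hypothetical 21 x 21 grid of visual-field positions spanning -10..10 degrees
grid = [(x, y) for y in range(-10, 11) for x in range(-10, 11)]
weights = gaussian_prf(x0=0.0, y0=0.0, sigma=2.0, grid=grid)

# Two binary apertures: a small central square and a band in the far right periphery
central = [1 if abs(x) <= 2 and abs(y) <= 2 else 0 for x, y in grid]
peripheral = [1 if x >= 6 else 0 for x, y in grid]

print(prf_response(central, weights, exponent=0.5))
print(prf_response(peripheral, weights, exponent=0.5))
```

With the pRF centred on fixation, the central aperture drives a much larger response than the peripheral band, which is the kind of spatial selectivity such models are fitted to capture.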

Chapter 2. Real-Time Face Identification via Multi-convolutional Neural Network and Boosted Hashing Forest

The family of real-time face representations is obtained via a Convolutional Network with Hashing Forest (CNHF). We first train a CNN, then transform it into a multiple-convolution architecture, and finally learn the output hashing transform with a new Boosted Hashing Forest (BHF) technique. BHF generalizes the Boosted Similarity Sensitive Coding (SSC) approach to hashing learning, jointly optimizing face verification and identification. CNHF is trained on the CASIA-WebFace dataset and evaluated on the LFW dataset. Coding the output of a single CNN yields 97% on LFW. For Hamming embedding, we obtain a 200-bit (25-byte) CBHF code with 96.3% and a 2,000-bit code with 98.14% on LFW. CNHF with 2,000 × 7-bit hashing trees achieves 93% rank-1 on LFW, compared with 89.9% rank-1 for the basic CNN. CNHF generates templates at 40+ fps on a Core i7 CPU and 120+ fps on a GeForce GTX 650 GPU.
Yury Vizilter, Vladimir Gorbatsevich, Andrey Vorotnikov, Nikita Kostromov
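
The Hamming embedding in this abstract stores each face as a compact binary code. A minimal sketch, assuming codes are given as lists of 0/1 bits, shows why a 200-bit code occupies 25 bytes and how two templates are compared with XOR and a bit count. The codes here are random stand-ins, not BHF outputs, and the helper names are hypothetical.

```python
import random

CODE_BITS = 200  # 200 bits pack into 25 bytes, matching the template size quoted above

def pack_bits(bits):
    """Pack a list of 0/1 values into bytes, most significant bit first."""
    assert len(bits) % 8 == 0
    out = bytearray()
    for i in range(0, len(bits), 8):
        byte = 0
        for b in bits[i:i + 8]:
            byte = (byte << 1) | b
        out.append(byte)
    return bytes(out)

def hamming(a, b):
    """Number of differing bits between two equal-length packed codes."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

rng = random.Random(0)
enrolled_bits = [rng.randint(0, 1) for _ in range(CODE_BITS)]

# Simulated genuine probe: the same code with 10 distinct bits flipped
probe_bits = list(enrolled_bits)
for i in rng.sample(range(CODE_BITS), 10):
    probe_bits[i] ^= 1
impostor_bits = [rng.randint(0, 1) for _ in range(CODE_BITS)]

enrolled = pack_bits(enrolled_bits)
print(len(enrolled))                                # 25
print(hamming(enrolled, pack_bits(probe_bits)))     # 10
print(hamming(enrolled, pack_bits(impostor_bits)))  # near 100 for unrelated random codes
```

Matching then reduces to thresholding the Hamming distance; the threshold would have to be tuned on validation data and is not specified here.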

Chapter 3. CMS-RCNN: Contextual Multi-Scale Region-Based CNN for Unconstrained Face Detection

Robust face detection in the wild is one of the key components supporting various face-related problems, e.g., unconstrained face recognition, periocular recognition, facial landmarking and pose estimation, facial expression recognition, and 3D facial model construction. Although face detection has been studied intensively for decades and has many commercial applications, it still struggles in some real-world scenarios because of numerous challenges, e.g., heavy facial occlusion, extremely low resolution, strong illumination, exceptional pose variation, and image or video compression artifacts. In this chapter, we present a face detection approach named Contextual Multi-Scale Region-based Convolutional Neural Network (CMS-RCNN) to address these problems robustly. Like other region-based CNNs, our network consists of a region proposal component and a region-of-interest (RoI) detection component. Unlike those networks, however, it makes two main contributions that play a significant role in achieving state-of-the-art face detection performance. First, multi-scale information is grouped in both region proposal and RoI detection to handle tiny face regions. Second, inspired by the human visual system, the network performs explicit body contextual reasoning. The approach is benchmarked on two recent challenging face detection databases: the WIDER FACE dataset, which contains a high degree of variability, and the Face Detection Dataset and Benchmark (FDDB). The experimental results show that our approach, trained on the WIDER FACE dataset, outperforms strong baselines on WIDER FACE by a large margin and consistently achieves competitive results on FDDB against recent state-of-the-art face detection methods.
Chenchen Zhu, Yutong Zheng, Khoa Luu, Marios Savvides
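
The abstract says multi-scale information is grouped in both region proposal and RoI detection, but does not say how. One common approach in region-based detectors, assumed here, is to L2-normalize the RoI-pooled features from several convolutional layers before concatenating them, so that large-magnitude shallow features do not swamp deeper ones. The layer names, feature values, and scale factor below are hypothetical.

```python
import math

def l2_normalize(vec, eps=1e-12):
    """Scale a feature vector to unit L2 norm (eps avoids division by zero)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / (norm + eps) for v in vec]

def group_multiscale(features_by_layer, scale=1.0):
    """Normalize each layer's pooled RoI feature, concatenate the results,
    and multiply by a scale factor (learnable in a real network)."""
    fused = []
    for feat in features_by_layer:
        fused.extend(l2_normalize(feat))
    return [scale * v for v in fused]

# Hypothetical pooled features for one RoI from three layers of very different magnitude
conv3 = [120.0, 80.0, 40.0]
conv4 = [3.0, 4.0]
conv5 = [0.01, 0.02, 0.02]

fused = group_multiscale([conv3, conv4, conv5], scale=10.0)
print(len(fused))  # 8
print(fused[3:5])  # conv4 part, now roughly [6.0, 8.0]
```

After normalization each layer contributes a block of the same norm, so the deepest layer's tiny activations are no longer negligible in the fused descriptor.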

Deep Learning for Fingerprint, Fingervein and Iris Recognition


Chapter 4. Latent Fingerprint Image Segmentation Using Deep Neural Network

We present a deep artificial neural network (DANN) model that learns latent fingerprint image patches using a stack of restricted Boltzmann machines (RBMs) and uses it to segment latent fingerprint images. Artificial neural networks (ANNs) are biologically inspired architectures that produce hierarchies of maps through learned weights or filters. Latent fingerprints are fingerprint impressions unintentionally left on surfaces at a crime scene. To identify or exclude suspects, latent fingerprint examiners analyze latent fingerprints and compare them with the known fingerprints of individuals. Because latent fingerprint images are typically of poor quality, with complex backgrounds and overlapping patterns, separating the fingerprint region of interest from the background and overlapping patterns is very challenging. Our proposed RBM-based DANN model learns fingerprint image patches in two phases. The first phase (unsupervised pre-training) learns an identity mapping of the input image patches. In the second phase, fine-tuning and gradient updates are performed to minimize the cost function on the training dataset. The resulting trained model classifies image patches into fingerprint and non-fingerprint classes. We use the fingerprint patches to reconstruct the latent fingerprint image and discard the non-fingerprint patches, which contain the structured noise in the original latent fingerprint. The proposed model is evaluated by comparing its results with those of state-of-the-art latent fingerprint segmentation models, and the evaluation shows the superior performance of the proposed method.
Jude Ezeobiejesi, Bir Bhanu
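
The last step of the pipeline in this abstract (classify patches, keep the fingerprint ones, discard the rest) can be sketched as follows. The trained DANN is replaced by a hypothetical variance-threshold classifier purely so the example runs; the patch size, threshold, and toy image are assumptions.

```python
def split_patches(image, size):
    """Yield (row, col, patch) for non-overlapping size x size patches."""
    h, w = len(image), len(image[0])
    for r in range(0, h - size + 1, size):
        for c in range(0, w - size + 1, size):
            patch = [row[c:c + size] for row in image[r:r + size]]
            yield r, c, patch

def variance_classifier(patch, threshold=100.0):
    """Hypothetical stand-in for the trained DANN: call a patch 'fingerprint'
    when its intensity variance is high (ridge/valley-like texture)."""
    values = [v for row in patch for v in row]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return var > threshold

def segment(image, size, is_fingerprint):
    """Rebuild the image from fingerprint patches only; other patches become 0."""
    out = [[0] * len(image[0]) for _ in image]
    for r, c, patch in split_patches(image, size):
        if is_fingerprint(patch):
            for i, row in enumerate(patch):
                out[r + i][c:c + size] = row
    return out

# Toy 4 x 8 image: left half has alternating stripes, right half is flat background
image = [[0, 255, 0, 255, 128, 128, 128, 128] for _ in range(4)]
mask_out = segment(image, 4, variance_classifier)
print(mask_out[0])  # [0, 255, 0, 255, 0, 0, 0, 0]
```

Swapping variance_classifier for a learned model changes only the decision function; the patch bookkeeping around it stays the same.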

Chapter 5. Finger Vein Identification Using Convolutional Neural Network and Supervised Discrete Hashing

Automated personal identification using vascular biometrics, such as finger vein images, is highly desirable because it helps protect personal privacy and anonymity during the identification process. Convolutional neural networks (CNNs) have shown a remarkable capability for learning biometric features that offer robust and accurate matching. We introduce a new approach to finger vein authentication using a CNN and supervised discrete hashing. We also systematically compare the performance of several CNN architectures popular in other domains, i.e., Light CNN, VGG-16, a Siamese network, and a CNN with Bayesian inference-based matching. Experimental results are presented on a publicly available two-session finger vein image database. The most accurate performance is achieved by incorporating supervised discrete hashing with a CNN trained using a triplet-based loss function. The proposed approach not only outperforms the other CNN architectures considered from the literature but also offers a significantly smaller template than other finger vein matching methods available in the literature to date.
Cihui Xie, Ajay Kumar
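
The best results in this abstract come from a CNN trained with a triplet-based loss. A minimal sketch of a standard hinge-style triplet loss on embedding vectors follows; the squared Euclidean distance, the margin of 1.0, and the toy 2-D embeddings are assumptions, and the chapter's exact formulation and the supervised discrete hashing step are not shown.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Zero once the negative is at least `margin` farther (in squared distance)
    from the anchor than the positive is; positive otherwise."""
    return max(0.0, sq_dist(anchor, positive) - sq_dist(anchor, negative) + margin)

anchor = [0.0, 0.0]
same_finger = [0.5, 0.0]   # hypothetical embedding of another image of the same finger
other_finger = [2.0, 0.0]  # hypothetical embedding of a different finger

print(triplet_loss(anchor, same_finger, other_finger))  # 0.0  (0.25 - 4.0 + 1.0 < 0)
print(triplet_loss(anchor, other_finger, same_finger))  # 4.75 (4.0 - 0.25 + 1.0)
```

A correctly ordered triplet contributes nothing, while a swapped one is penalized, which is what pushes same-finger embeddings together and different-finger embeddings apart.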

Chapter 6. Iris Segmentation Using Fully Convolutional Encoder–Decoder Networks

As a considerable breakthrough in artificial intelligence, deep learning has achieved great success in solving key computer vision challenges. Iris recognition is one of the most reliable biometric identification systems, and accurate segmentation of the iris region in the eye image plays a vital role in its performance. In this chapter, as our first contribution, we consider the application of Fully Convolutional Encoder–Decoder Networks (FCEDNs) to iris segmentation. To this end, we use three types of FCEDN architectures to segment the iris in images from five different datasets acquired under different scenarios, and we then analyze, evaluate, and compare the segmentation performance of these three networks. As our second contribution, to provide a fair baseline for evaluating the proposed networks, we apply a selection of conventional (non-CNN) iris segmentation algorithms to the same datasets, evaluate their performance in the same way, and compare the results against those of the FCEDNs. The proposed networks achieve superior performance over all other algorithms on all datasets.
Ehsaneddin Jalilian, Andreas Uhl
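
The abstract does not name the measures used to compare the FCEDNs with the conventional algorithms. Pixel-wise precision, recall, and F-measure between a predicted iris mask and a ground-truth mask are a common choice and are assumed in this sketch; the tiny masks are made up.

```python
def mask_scores(pred, truth):
    """Pixel-wise precision, recall and F1 for two binary masks (lists of 0/1 rows)."""
    tp = fp = fn = 0
    for prow, trow in zip(pred, truth):
        for p, t in zip(prow, trow):
            if p and t:
                tp += 1  # iris pixel correctly labeled iris
            elif p:
                fp += 1  # non-iris pixel labeled iris
            elif t:
                fn += 1  # iris pixel missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

truth = [[0, 1, 1, 0],
         [0, 1, 1, 0]]
pred = [[0, 1, 1, 1],
        [0, 1, 0, 0]]
print(mask_scores(pred, truth))  # (0.75, 0.75, 0.75)
```

Averaging such scores over every image in a dataset gives one number per algorithm and dataset, which makes a network-versus-conventional comparison straightforward.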

Deep Learning for Soft Biometrics


Chapter 7. Two-Stream CNNs for Gesture-Based Verification and Identification: Learning User Style

A gesture is a short body motion that contains both static (nonrenewable) anatomical information and dynamic (renewable) behavioral information. Unlike traditional biometrics such as face, fingerprint, and iris, which cannot be easily changed, gestures can be modified if compromised. We consider two types of gestures: full-body gestures, such as a wave of the arms, and hand gestures, such as a subtle curl of the fingers and palm, as captured by a depth sensor (Kinect v1 and v2 in our case). Most prior work in this area evaluates gestures in the context of a “password,” where each user has a single, chosen gesture motion. Contrary to this, we aim to learn a user’s gesture “style” from a set of training gestures. This allows for user convenience since an exact user motion is not required for user recognition. To achieve the goal of learning gesture style, we use two-stream convolutional neural networks, a deep learning framework that leverages both the spatial (depth) and temporal (optical flow) information of a video sequence. First, we evaluate the generalization performance during testing of our approach against gestures of users that have not been seen during training. Then, we study the importance of dynamics by suppressing the use of dynamic information in training and testing. Finally, we assess the capacity of the aforementioned techniques to learn representations of gestures that are invariant across users (gesture recognition) or to learn representations of users that are invariant across gestures (user style in verification and identification) by visualizing the two-dimensional t-Distributed Stochastic Neighbor Embedding (t-SNE) of neural network features. We find that our approach outperforms state-of-the-art methods in identification and verification on two biometrics-oriented gesture datasets for full-body and in-air hand gestures.
Jonathan Wu, Jiawei Chen, Prakash Ishwar, Janusz Konrad
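
The abstract does not say how the spatial (depth) and temporal (optical-flow) streams are combined at test time. Averaging the softmax scores of the two streams is a common late-fusion choice for two-stream networks and is assumed in this sketch; the logits, the three enrolled users, and the w_spatial weight are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_streams(spatial_logits, temporal_logits, w_spatial=0.5):
    """Late fusion: weighted average of per-user probabilities from the
    spatial (depth) stream and the temporal (optical-flow) stream."""
    ps = softmax(spatial_logits)
    pt = softmax(temporal_logits)
    return [w_spatial * a + (1 - w_spatial) * b for a, b in zip(ps, pt)]

# Hypothetical logits over three enrolled users for one gesture sample
spatial = [2.0, 1.5, 0.1]   # depth stream slightly prefers user 0
temporal = [0.2, 2.5, 0.3]  # flow stream strongly prefers user 1

probs = fuse_streams(spatial, temporal)
best = max(range(len(probs)), key=probs.__getitem__)
print(best)  # 1: the flow stream tips the decision toward user 1

# Weighting the spatial stream alone ignores dynamics; the decision flips to user 0
spatial_only = fuse_streams(spatial, temporal, w_spatial=1.0)
print(max(range(len(spatial_only)), key=spatial_only.__getitem__))  # 0
```

The second call loosely mirrors the chapter's question of how much dynamics matter: dropping the temporal stream can change who is identified.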

Chapter 8. DeepGender2: A Generative Approach Toward Occlusion and Low-Resolution Robust Facial Gender Classification via Progressively Trained Attention Shift Convolutional Neural Networks (PTAS-CNN) and Deep Convolutional Generative Adversarial Networks (DCGAN)

In this work, we address occlusion- and low-resolution-robust facial gender classification. Inspired by trainable attention models in deep architectures, and by the fact that the periocular region has been shown to be the most salient region for gender classification, we design a progressive convolutional neural network training paradigm that enforces an attention shift during learning. The aim is to enable the network to attend to particular high-profile regions (e.g., the periocular region) without changing the network architecture itself. The network benefits from this attention shift and becomes more robust to occlusion and low-resolution degradation. With the progressively trained attention shift convolutional neural network (PTAS-CNN) models, we achieve better gender classification results under occlusion and low-resolution settings on the large-scale PCSO mugshot database of 400K images than a conventionally trained network. In addition, our progressively trained network generalizes well enough to be robust to occlusions of arbitrary type and location, as well as to low resolution. One way to further improve the robustness of the proposed gender classification algorithm is to use a generative approach to recover occluded images, such as deep convolutional generative adversarial networks (DCGAN). The facial occlusion studied in this work is a missing-data problem: the locations of the missing data are known, whether they are randomly scattered or contiguous. We show on the PCSO mugshot database that a deep generative model can recover this degradation very effectively, and that the recovered images yield a significant performance improvement on gender classification tasks.
Felix Juefei-Xu, Eshan Verma, Marios Savvides
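
Because the abstract frames occlusion as missing data at known locations, a trained generator only needs to supply the masked pixels. A common recipe for GAN-based recovery, assumed here and not confirmed as the chapter's exact procedure, scores a candidate generator output by its mismatch on the observed pixels only and then pastes the generated values into the missing positions. The search over the generator's latent code is omitted; generated is a hypothetical stand-in for G(z).

```python
def blend_recovery(observed, generated, mask):
    """Keep observed pixels where mask == 1 and fill the known missing
    locations (mask == 0) with the generator's output."""
    return [[o if m else g for o, g, m in zip(orow, grow, mrow)]
            for orow, grow, mrow in zip(observed, generated, mask)]

def contextual_loss(observed, generated, mask):
    """L1 mismatch measured only on observed pixels; a recovery method would
    search for the latent code whose generated image minimizes this."""
    return sum(abs(o - g) * m
               for orow, grow, mrow in zip(observed, generated, mask)
               for o, g, m in zip(orow, grow, mrow))

observed = [[10, 20, 0], [40, 0, 60]]     # toy image; 0 marks occluded pixels
mask = [[1, 1, 0], [1, 0, 1]]             # 1 = observed, 0 = missing (locations known)
generated = [[12, 19, 33], [41, 52, 58]]  # hypothetical output of a trained generator

print(blend_recovery(observed, generated, mask))   # [[10, 20, 33], [40, 52, 60]]
print(contextual_loss(observed, generated, mask))  # 6
```

Knowing the mask is what makes this work: scattered or contiguous occlusions are handled identically, since only the mask pattern changes.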

Chapter 9. Gender Classification from NIR Iris Images Using Deep Learning

Gender classification from NIR iris images is a new topic with only a few published papers. Previous work on gender-from-iris has sought the best feature extraction techniques to represent iris texture information for gender classification, using normalized, encoded, or periocular images. Applying deep learning to this soft-biometric task, however, is new. In this chapter, we show that learning gender-iris representations with deep neural networks may increase performance on these tasks. To this end, we propose applying deep-learning methods to classify gender from iris images even when the amount of training data is limited, using an unsupervised stage with a Restricted Boltzmann Machine (RBM) and a supervised stage using a Convolutional Neural Network (CNN).
Juan Tapia, Carlos Aravena
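
The unsupervised stage in this abstract uses a Restricted Boltzmann Machine. This sketch shows only the generic technique, a binary RBM trained with one-step contrastive divergence (CD-1), in plain Python; the toy patterns, the two hidden units, the learning rate, and the epoch count are assumptions, and a real system would train on iris image data with an optimized library.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class BernoulliRBM:
    """Tiny binary RBM trained with one-step contrastive divergence (CD-1)."""

    def __init__(self, n_visible, n_hidden, seed=0):
        self.rng = random.Random(seed)
        # Small random weights break the symmetry between hidden units
        self.W = [[self.rng.gauss(0.0, 0.01) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.b = [0.0] * n_visible  # visible biases
        self.c = [0.0] * n_hidden   # hidden biases

    def hidden_probs(self, v):
        return [sigmoid(self.c[j] + sum(v[i] * self.W[i][j] for i in range(len(v))))
                for j in range(len(self.c))]

    def visible_probs(self, h):
        return [sigmoid(self.b[i] + sum(h[j] * self.W[i][j] for j in range(len(h))))
                for i in range(len(self.b))]

    def cd1_step(self, v0, lr=0.1):
        """Positive phase on the data, one Gibbs step, negative phase on the reconstruction."""
        ph0 = self.hidden_probs(v0)
        h0 = [1.0 if self.rng.random() < p else 0.0 for p in ph0]  # sample hidden states
        pv1 = self.visible_probs(h0)  # mean-field reconstruction of the visible units
        ph1 = self.hidden_probs(pv1)
        for i in range(len(v0)):
            self.b[i] += lr * (v0[i] - pv1[i])
            for j in range(len(ph0)):
                self.W[i][j] += lr * (v0[i] * ph0[j] - pv1[i] * ph1[j])
        for j in range(len(ph0)):
            self.c[j] += lr * (ph0[j] - ph1[j])

    def reconstruction_error(self, v):
        """Deterministic squared error of the mean-field reconstruction v -> h -> v."""
        recon = self.visible_probs(self.hidden_probs(v))
        return sum((a - r) ** 2 for a, r in zip(v, recon))

# Two toy binary patterns (hypothetical stand-ins for image patches)
data = [[1, 1, 0, 0], [0, 0, 1, 1]]
rbm = BernoulliRBM(n_visible=4, n_hidden=2)
before = sum(rbm.reconstruction_error(v) for v in data)
for _ in range(1000):
    for v in data:
        rbm.cd1_step(v)
after = sum(rbm.reconstruction_error(v) for v in data)
print(before, after)  # the error should fall as the RBM learns the two patterns
```

In a two-stage design like the one described, the learned hidden activations (or weights) would then feed or initialize the supervised stage.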

Chapter 10. Deep Learning for Tattoo Recognition

Soft biometrics are physiological and behavioral characteristics that provide some identifying information about an individual. Eye color, gender, ethnicity, skin color, height, weight, hair color, scars, birthmarks, and tattoos are examples of soft biometrics. Several techniques have been proposed in the literature to identify or verify an individual based on soft biometrics. In particular, person identification and retrieval systems based on tattoos have gained considerable interest in recent years. To some extent, tattoos indicate a person's beliefs and characteristics. Hence, the analysis of tattoos can lead to a better understanding of a person's background and membership in gangs and hate groups. Tattoos have been used to assist law enforcement in investigations leading to the identification of criminals. In this chapter, we provide an overview of recent advances in deep learning-based tattoo recognition and detection. In particular, we present deep convolutional neural network-based methods for automatic matching of tattoo images based on AlexNet and Siamese networks. Furthermore, we show that using a triplet loss function instead of a simple contrastive loss function can significantly improve the performance of a tattoo matching system. Various experimental results on the recently introduced Tatt-C dataset are presented.
Xing Di, Vishal M. Patel
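
For comparison with the triplet loss favored in this abstract, here is the classic pairwise contrastive loss used with Siamese networks: matching pairs are pulled together, and non-matching pairs are pushed apart until they are at least a margin away. Each pair is judged against an absolute margin in isolation, whereas a triplet loss only constrains the positive distance relative to the negative one. The margin of 1.0 and the 2-D embeddings are assumptions; in the chapter the embeddings would come from the network branches.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def contrastive_loss(a, b, same, margin=1.0):
    """Pairwise contrastive loss: squared distance for matching pairs,
    squared hinge on (margin - distance) for non-matching pairs."""
    d = euclidean(a, b)
    if same:
        return 0.5 * d ** 2
    return 0.5 * max(0.0, margin - d) ** 2

query = [0.0, 0.0]
match = [0.5, 0.0]       # hypothetical embedding of the same tattoo, distance 0.5
non_match = [0.75, 0.0]  # hypothetical embedding of a different tattoo, still inside the margin

print(contrastive_loss(query, match, same=True))       # 0.125
print(contrastive_loss(query, non_match, same=False))  # 0.03125
```

A non-matching pair farther than the margin contributes zero loss, so the contrastive objective stops pushing it, regardless of how close the matching pairs are.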

Deep Learning for Biometrics Security and Protection


Chapter 11. Learning Representations for Cryptographic Hash Based Face Template Protection

In this chapter, we discuss the impact of recent advances in deep learning on biometric template protection. The representation learning ability of neural networks has enabled them to achieve state-of-the-art results in several fields, including face recognition. Biometric authentication using facial images has consequently benefited as well, with deep convolutional neural networks pushing matching performance to all-time highs. This chapter studies the ability of neural networks to learn representations that benefit template security in addition to matching accuracy. Cryptographic hashing is generally considered the most secure form of protection for biometric data, but it comes at the high cost of requiring an exact match between the enrollment and verification templates. This requirement generally leads to a severe loss in the matching performance (FAR and FRR) of the system. We focus on two relatively recent face template protection algorithms that study the suitability of representations learned by neural networks for cryptographic hash-based template protection. Local region hashing tackles hash-based template security by attempting exact matches between features extracted from local regions of the face rather than from the entire face. A comparison of different feature extractors for this task is presented, and a representation learned by an autoencoder is found to be the most promising. Deep secure encoding tackles the problem differently, by learning a robust mapping of face classes to secure codes, which are then hashed and stored as the secure template. This approach overcomes several pitfalls of local region hashing and other face template protection algorithms, and it achieves state-of-the-art matching performance with a high standard of template security.
Rohit Kumar Pandey, Yingbo Zhou, Bhargava Urala Kota, Venu Govindaraju
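
The exact-match requirement described in this abstract is easy to see in code. In this minimal sketch, only a SHA-256 digest of a user's secure code is stored, and verification hashes the code produced for the probe. The per-user random salt is a common hardening step assumed here, not something the abstract specifies, and the 24-bit code is a made-up stand-in for a network-predicted secure code.

```python
import hashlib
import hmac
import os

def protect(code: bytes, salt: bytes) -> bytes:
    """Store only a salted SHA-256 digest of the secure code, never the code itself."""
    return hashlib.sha256(salt + code).digest()

def verify(probe_code: bytes, salt: bytes, stored: bytes) -> bool:
    """Cryptographic hashing demands an exact match: any differing bit fails.
    compare_digest avoids leaking information through comparison timing."""
    return hmac.compare_digest(hashlib.sha256(salt + probe_code).digest(), stored)

salt = os.urandom(16)
enrolled_code = bytes([0b10110010, 0b01101100, 0b11110000])  # hypothetical secure code
template = protect(enrolled_code, salt)

exact_probe = bytes(enrolled_code)
one_bit_off = bytes([enrolled_code[0] ^ 0b00000001]) + enrolled_code[1:]
print(verify(exact_probe, salt, template))  # True
print(verify(one_bit_off, salt, template))  # False
```

A single flipped bit makes verification fail, which is why the network must map every genuine face image of a user to exactly the same code for hash-based protection to keep FRR low.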

Chapter 12. Deep Triplet Embedding Representations for Liveness Detection

Liveness detection is a fundamental element of any biometric system that must be safe against spoofing attacks at the sensor level. In particular, it is relatively easy for an attacker to build a fake replica of a legitimate finger and apply it directly to the sensor, thereby fooling the system by claiming the corresponding identity. To ensure that the declared identity is genuine and corresponds to the individual present at the time of capture, the task is usually formulated as a binary classification problem, in which a classifier is trained to detect whether the fingerprint at hand is real or an artificial replica. In this chapter, instead of the binary classification model, we propose a metric learning approach based on triplet convolutional networks. It generates a representation of fingerprint images in which the distance between feature points reflects how dissimilar real fingerprints are from artificially generated ones. A variant of the triplet objective function is employed that considers patches taken from two real fingerprints and a replica (or two replicas and a real fingerprint) and gives a high penalty if the distance between the matching pair is greater than that of the mismatched pair. Given a test fingerprint image, its liveness is established by matching its patches against a set of reference genuine and fake patches taken from the training set. The use of small networks together with a patch-based representation allows the system to process acquisitions in real time while providing state-of-the-art performance. Experiments are presented on several benchmark liveness detection datasets covering different sensors and fake fingerprint materials. The approach is validated in cross-sensor and cross-material scenarios to assess how well it generalizes to new acquisition devices and how robust it is to new presentation attacks.
Federico Pala, Bir Bhanu
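
The test-time step in this abstract, matching a fingerprint's patches against reference genuine and fake patches, can be sketched as nearest-neighbor labeling in the learned embedding space followed by a majority vote. The 1-nearest-neighbor rule, the vote, the tie-breaking toward "fake", and the 2-D embeddings are all assumptions made for illustration; the chapter's exact decision rule is not given in the abstract.

```python
import math

def nearest_label(patch_emb, references):
    """Label ('live' or 'fake') of the reference embedding closest to the patch."""
    best = min(references, key=lambda ref: math.dist(patch_emb, ref[0]))
    return best[1]

def liveness(test_patches, references):
    """Label every patch by its nearest reference and take a majority vote;
    ties are resolved as 'fake', a conservative choice assumed here."""
    votes = [nearest_label(p, references) for p in test_patches]
    live = votes.count("live")
    return "live" if live > len(votes) - live else "fake"

# Hypothetical reference embeddings from training patches
references = [([0.0, 0.0], "live"), ([0.2, 0.1], "live"),
              ([1.0, 1.0], "fake"), ([0.9, 1.2], "fake")]

# Hypothetical embeddings of three patches from one test fingerprint
test_patches = [[0.1, 0.0], [0.15, 0.2], [0.95, 0.9]]
print(liveness(test_patches, references))  # live (2 of 3 patches vote live)
```

Voting over many small patches lets a few ambiguous regions be outvoted, and small per-patch networks keep the whole decision fast enough for real-time use.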

