2017 | Book

Biometric Recognition

12th Chinese Conference, CCBR 2017, Shenzhen, China, October 28-29, 2017, Proceedings

Edited by: Jie Zhou, Yunhong Wang, Zhenan Sun, Yong Xu, Linlin Shen, Dr. Jianjiang Feng, Shiguang Shan, Yu Qiao, Zhenhua Guo, Shiqi Yu

Publisher: Springer International Publishing

Book Series: Lecture Notes in Computer Science

About this book

This book constitutes the refereed proceedings of the 12th Chinese Conference on Biometric Recognition, CCBR 2017, held in Shenzhen, China, in October 2017.
The 15 full papers and 65 poster papers presented in this book were carefully reviewed and selected from 138 submissions. The papers are organized in topical sections on face; fingerprint, palm-print and vascular biometrics; iris; gesture and gait; emerging biometrics; voice and speech; video surveillance; feature extraction and classification theory; behavioral biometrics.

Table of Contents

Frontmatter

Face

Frontmatter
Detecting Face with Densely Connected Face Proposal Network

Accuracy and efficiency are two conflicting challenges for face detection, since effective models tend to be computationally prohibitive. To address these two conflicting challenges, our core idea is to shrink the input image and focus on detecting small faces. Specifically, we propose a novel face detector, dubbed Densely Connected Face Proposal Network (DCFPN), with high performance as well as real-time speed on CPU devices. On the one hand, we subtly design a lightweight-but-powerful fully convolutional network with consideration of both efficiency and accuracy. On the other hand, we use a dense anchor strategy and propose a fair L1 loss function to handle small faces well. As a consequence, our method can detect faces at 30 FPS on a single 2.60 GHz CPU core and at 250 FPS on a GPU for VGA-resolution images. We achieve state-of-the-art performance on the AFW, PASCAL face and FDDB datasets.

Shifeng Zhang, Xiangyu Zhu, Zhen Lei, Hailin Shi, Xiaobo Wang, Stan Z. Li
Deep Transformation Learning for Depression Diagnosis from Facial Images

As a severe emotional disorder, depression seriously affects people's thoughts, behavior, feelings, sense of well-being and daily life. With the increasing number of depression patients, the disorder has attracted growing attention from researchers in this field. An effective and reliable machine learning based system is expected to facilitate automated depression diagnosis. This paper presents a novel deep transformation learning (DTL) method for visual-based depression recognition. Different from most existing depression recognition methods, our DTL trains a deep neural network that learns a set of hierarchical nonlinear transformations to project original input features into a new feature subspace, so as to capture the non-linear manifold of depression data. Extensive experiments are conducted on the AVEC2014 dataset and the results demonstrate that our method is highly competitive with several state-of-the-art methods for automated prediction of the severity of depression.

Yajun Kang, Xiao Jiang, Ye Yin, Yuanyuan Shang, Xiuzhuang Zhou
Comparison and Fusion of Multiple Types of Features for Image-Based Facial Beauty Prediction

Facial beauty prediction is an emerging research topic that has many potential applications. Existing works adopt features either suggested by putative rules or borrowed from other face analysis tasks, without an optimization procedure. In this paper, we make a comprehensive comparison of different types of features in terms of facial beauty prediction accuracy, including the rule-based features, global features, and local descriptors. Each type of feature is optimized by dimensionality reduction and feature selection. Then, we investigate the optimal fusion strategy of multiple types of features. The results show that the fusion of AAM, LBP, and PCANet features obtains the best performance, which can serve as a competitive baseline for further studies.

Fangmei Chen, David Zhang, Cunrui Wang, Xiaodong Duan
Deep Embedding for Face Recognition in Public Video Surveillance

Face recognition is essential to surveillance-based crime investigation. Recognition accuracy on benchmark datasets has been boosted by deep learning, yet there is still a large gap between academic research and practical application. This work aims to identify a few suspects in a crowd in real time for public video surveillance, which is a large-scale open-set classification task. A task-specific face dataset is built from security surveillance cameras in the Beijing subway. State-of-the-art deep convolutional neural networks are trained end-to-end with a triplet supervisory signal to embed faces into a 128-dimensional feature space. Euclidean distances in the embedding space directly correspond to face similarity, which enables real-time large-scale recognition in embedded systems. Experiments demonstrate a 98.92% ± 0.005 pair-wise verification accuracy, which indicates that the automatically learned features are highly discriminative and generalize well to new identities. This method outperforms other state-of-the-art methods on the suspect identification task, filling the application gap in public video surveillance.

Guan Wang, Yu Sun, Ke Geng, Shengguang Li, Wenjing Chen
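The triplet embedding idea described above can be illustrated with a short sketch. This is a minimal, hypothetical PyTorch version (the backbone below is a toy stand-in, not the authors' network): faces are mapped to L2-normalized 128-dimensional vectors, so Euclidean distance directly reflects similarity, and a margin-based triplet loss separates identities.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FaceEmbedder(nn.Module):
    """Toy stand-in for the deep backbone; outputs unit-norm 128-D embeddings."""
    def __init__(self, dim=128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, dim),
        )

    def forward(self, x):
        # Unit-norm embeddings make Euclidean distance a direct similarity measure.
        return F.normalize(self.backbone(x), p=2, dim=1)

def triplet_loss(anchor, positive, negative, margin=0.2):
    # Pull same-identity pairs together, push different identities apart.
    d_ap = (anchor - positive).pow(2).sum(1)
    d_an = (anchor - negative).pow(2).sum(1)
    return F.relu(d_ap - d_an + margin).mean()
```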
Random Feature Discriminant for Linear Representation Based Robust Face Recognition

The linear representation based classification methods include two independent steps: representation and decision. First, the query image is represented as a linear combination of training samples. Then the classification decision is made by evaluating which class leads to the minimum class-wise representation error. However, these two steps have different goals. The representation step prefers accuracy while the decision step requires discrimination. Thus precisely representing the query image does not always benefit the decision process. In this paper, we propose a novel linear representation based classifier which no longer separates representation from decision. We repeatedly construct linear representation based classification models with randomly selected features. Then the best model is selected by using the representation discriminant criterion (RDC) which evaluates the discrimination of a representation model. We conduct extensive experiments on public benchmark databases to verify the efficacy of the proposed method.

Jian-Xun Mi, Xiangbin Ma, Qiankun Fu, Chaowei Zhao, Can Long
Adv-Kin: An Adversarial Convolutional Network for Kinship Verification

Kinship verification in the wild is an interesting and challenging problem, which aims to determine whether two unconstrained facial images are from the same family. Most previous methods for kinship verification can be divided into shallow methods based on low-level hand-crafted features and deep methods based on generic convolutional neural networks (CNN) trained on kin data. Nevertheless, these general methods cannot fully mine the potential information implied in kin-relation data. Inspired by MMD and GAN, the Adv-Kin method is proposed in this paper. The discrimination of deep features can be improved by introducing an MMD loss (ML) to minimize the distribution difference between the parent domain and the child domain. In addition, we propose an adversarial loss (AL) that can further improve the robustness of the CNN model. Extensive experiments on the benchmark KinFaceW-I, KinFaceW-II, Cornell KinFace and UB KinFace datasets show promising results over many state-of-the-art methods.

Qingyan Duan, Lei Zhang, Wei Jia
Max-Feature-Map Based Light Convolutional Embedding Networks for Face Verification

The powerful image feature extraction ability of convolutional neural networks makes it possible to achieve great success in the field of face recognition. However, such models tend to be deep and highly parallel, which makes them unsuitable for real-time face recognition tasks. To improve their feasibility, we propose a max-feature-map activation based fully convolutional structure to extract face features with higher speed and less computational cost. The learned model has great potential for embedding in hardware devices due to its high recognition performance and small storage footprint. Experimental results demonstrate that the proposed model is 63 times smaller than the well-known VGG model, while achieving 96.80% verification accuracy for a single network on the LFW benchmark.

Zhou Yang, Meng Jian, Bingkun Bao, Lifang Wu
Three Dimensional Face Recognition via Surface Harmonic Mapping and Deep Learning

In this paper, we propose a general 3D face recognition framework combining surface harmonic mapping and deep learning. In particular, given a 3D face scan, we first run the pre-processing pipeline and detect three main facial landmarks (i.e., the nose tip and two inner eye corners). Then, harmonic mapping is employed to map the 3D coordinates and differential geometry quantities (e.g., normal vectors, curvatures) of each 3D face scan to a 2D unit disc domain, generating a group of 2D harmonic shape images (HSI). The 2D rotation of the harmonic shape images is removed by using the three detected landmarks. All these pose-normalized harmonic shape images are fed into a pre-trained deep convolutional neural network (DCNN) to generate their deep representations. Finally, a sparse representation classifier with score-level fusion is used for face similarity measurement and the final decision. The advantage of our method is twofold: (i) it is a general framework and can be easily extended to other surface mapping and deep learning algorithms; (ii) it is registration-free and only needs three landmarks. The effectiveness of the proposed framework was demonstrated on the BU-3DFE database, reporting a rank-one recognition rate of 89.38% on the whole database.

Xiaofan Wei, Huibin Li, Xianfeng David Gu
2D-3D Heterogeneous Face Recognition Based on Deep Canonical Correlation Analysis

As one of the major branches of Face Recognition (FR), 2D-3D Heterogeneous FR (HFR), where face comparison is made across the texture and shape modalities, has become more important due to its scientific challenges and application potential. In this paper, we propose a novel and effective approach, which adapts the Deep Canonical Correlation Analysis (Deep CCA) network to this issue. Two solutions are presented to speed up the training process and improve the recognition accuracy so that Deep CCA better fits the problem of matching different types of face images. Thanks to the deep structure, the proposed approach hierarchically learns the mapping between 2D and 3D face clues and shows distinct superiority over previous hand-crafted feature based techniques. Experiments are carried out on the FRGC v2.0 database, and the results clearly demonstrate its competency.

Shupeng Wang, Di Huang, Yunhong Wang, Yuanyan Tang
Age Estimation by Refining Label Distribution in Deep CNN

This paper proposes an age estimation algorithm that refines the label distribution in a deep learning framework. There are two tasks during the training period of our algorithm. The first finds the optimal parameters of a supervised deep CNN by taking the label distribution of the training sample as the ground truth, while the second estimates the variances of the label distribution to fit the output of the CNN. These two tasks are performed alternately, and both are treated as supervised learning tasks. The AlexNet and ResNet-50 architectures are adopted as the classifiers, and a Gaussian form of the label distribution is assumed. Experiments show that the accuracy of age estimation can be improved by refining the label distribution.

Wanxia Shen, Li Sun, Song Qiu, Qingli Li
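As a rough illustration of the label-distribution idea in the paper above, the sketch below (an assumption-laden simplification, with the age range and sigma chosen arbitrarily) expands a scalar age label into a Gaussian distribution over ages and fits the CNN's softmax output to it with a KL-divergence loss; refining the distribution amounts to re-estimating sigma.

```python
import numpy as np

def gaussian_label_distribution(y, ages=np.arange(0, 101), sigma=2.0):
    # Expand scalar age y into a Gaussian distribution over all candidate ages.
    p = np.exp(-0.5 * ((ages - y) / sigma) ** 2)
    return p / p.sum()

def kl_loss(target, predicted, eps=1e-12):
    # KL(target || predicted); `predicted` is the CNN's softmax output.
    return float(np.sum(target * (np.log(target + eps) - np.log(predicted + eps))))
```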
Face Recognition via Heuristic Deep Active Learning

Recent successes on face recognition tasks require a large number of annotated samples for training models. However, the sample-labeling process is slow and expensive. An effective approach to reduce the annotation effort is active learning (AL). However, traditional AL methods are limited by hand-crafted features and small-scale datasets. In this paper, we propose a novel deep active learning framework combining the optimal feature representation of a deep convolutional neural network (CNN) with the labeling-cost savings of AL, which jointly learns features and a recognition model from unlabeled samples with minimal annotation cost. The model is initialized with a relatively small number of labeled samples, and strengthened gradually by adding many more complementary samples for retraining in a progressive way. Our method takes both high-uncertainty samples and high-confidence samples into consideration for model stability. Specifically, the high-confidence samples are selected in a self-paced learning way and are double-checked against prior knowledge for greater reliability. These high-confidence samples are labeled with the estimated class directly, and our framework jointly learns features and the recognition model by combining AL with a deep CNN, so we name our approach heuristic deep active learning (HDAL). We apply HDAL to the face recognition task; it achieves our goal of "minimizing the annotation cost while avoiding performance degradation", and the experimental results on the Cross-Age Celebrity Dataset (CACD) show that HDAL outperforms other state-of-the-art approaches in both recognition accuracy and annotation cost.

Ya Li, Keze Wang, Lin Nie, Qing Wang
One-Snapshot Face Anti-spoofing Using a Light Field Camera

Face recognition is an increasingly popular technology for user authentication. However, face recognition is susceptible to spoofing attacks. Therefore, a reliable way to detect malicious attacks is crucial to the robustness of a face recognition system. This paper describes a new approach that utilizes a light field camera to defend against spoofing attacks such as (warped) printed 2D facial photos and high-definition tablet images. The light field camera is a sensor that records the directions as well as the colors of incident rays. From only one snapshot, multiple refocused images can be generated. In the proposed method, three kinds of features are extracted from a pair of refocused images to discriminate fake faces from real ones. To verify the performance, we build a light field photograph database and conduct experiments. Experimental results reveal that the employed features achieve remarkable anti-spoofing accuracy under different types of spoofing attacks.

Xiaohua Xie, Yan Gao, Wei-Shi Zheng, Jianhuang Lai, Junyong Zhu
Content-Independent Face Presentation Attack Detection with Directional Local Binary Pattern

Aiming to counter photo attacks and video attacks in face recognition (FR) systems, a content-independent face presentation attack detection scheme based on the directional local binary pattern (DLBP) is proposed. In order to minimize the influence of image content, DLBP is used to investigate the noise characteristics of the facial image. By using directional difference filtering, the discrepancies between a real face and a facial artefact in terms of the consistency of adjacent pixels are effectively exploited. With the DLBP feature, detection is accomplished using a Softmax classifier. Experiments are conducted on four public benchmark databases, and the results indicate its effectiveness in both intra-database and cross-database testing.

Le Qin, Le-Bing Zhang, Fei Peng, Min Long
Matching Depth to RGB for Boosting Face Verification

Low cost RGB-D sensors like Kinect and RealSense enable easy acquisition of both RGB (i.e., texture) and depth images of human faces. Many methods have been proposed to improve the RGB-to-RGB face matcher by fusing it with the Depth-to-Depth face matcher. Yet, few efforts have been devoted to the matching between RGB and Depth face images. In this paper, we propose two deep convolutional neural network (DCNN) based approaches to Depth-to-RGB face recognition, and compare their performance in terms of face verification accuracy. We further combine the Depth-to-RGB matcher with the RGB-to-RGB matcher via score-level fusion. Evaluation experiments on two databases demonstrate that matching depth to RGB does boost face verification accuracy.

Han Liu, Feixiang He, Qijun Zhao, Xiangdong Fei
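Score-level fusion, as used above to combine the Depth-to-RGB and RGB-to-RGB matchers, can be sketched in a few lines. The weighted-sum rule and the weight below are illustrative assumptions, not the authors' exact scheme:

```python
import numpy as np

def minmax(scores):
    # Normalize matcher scores to [0, 1] so they are comparable before fusion.
    s = np.asarray(scores, dtype=float)
    return (s - s.min()) / (s.max() - s.min() + 1e-12)

def fuse(rgb_scores, depth_rgb_scores, w=0.7):
    # Weighted-sum fusion of the two matchers' similarity scores.
    return w * minmax(rgb_scores) + (1.0 - w) * minmax(depth_rgb_scores)
```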
Deeply Learned Pore-Scale Facial Features

Pore-scale facial features consist of pores, fine wrinkles, and hair, which commonly appear over the whole face region. Similar to iris and fingerprint features, pore-scale facial features are biometric features that can distinguish human identities. Most local biometric features depend on hand-crafted design. However, such hand-crafted features rely heavily on human experience and usually involve complicated operations, costing a great deal of time. This paper introduces novel pore-scale facial features: Deeply Learned Pore-scale Facial Features (DLPFF). We use Convolutional Neural Networks (CNNs) to learn discriminative representations of pore-scale facial features. Experiments show that our deep network based method outperforms state-of-the-art methods on the Bosphorus database.

Xiaodong Wang, Yuwei Liang, Xianxian Zeng, Dong Li, Wei Jia
Synthesis and Recognition of Internet Celebrity Face Based on Deep Learning

The similarity among Internet Celebrity Faces poses a big challenge to face recognition and verification. To study this problem, more than 20,000 Internet Celebrity Face pictures were collected from the Internet. We use these faces to train a Variational Auto-Encoder (VAE) to synthesize fake Internet Celebrity Faces and compare them with real samples. Results show that the performance of deep networks on Internet Celebrity Faces decreases greatly. Twenty pairs of same or different Internet Celebrity Faces were selected to test humans' ability to recognize Internet Celebrity Faces via a questionnaire. The comparison with the VGG deep network shows that the deep learning algorithm performs much better than humans in terms of recognition accuracy.

Jiancan Zhou, Guohang Zeng, Jia He, Xi Jia, Linlin Shen
Face Detection with Better Representation Using a Multi-region WR-Inception Network Model

This paper proposes a multi-region WR-Inception network model for face detection based on the Faster RCNN framework. Firstly, we utilize multi-region features to obtain a better face representation and introduce a block loss to make our model robust to occluded faces. Then we adopt the WR-Inception network, with shallower and wider layers, as our base feature extractor. Finally, we apply a new pre-training strategy to learn representations more suitable for face detection, and exploit soft-NMS for post-processing. Notably, experimental results show that our method achieves a recall rate of 85.1% on the FDDB dataset.

Lianping Yang, Yuanyuan Li, Xu Duan, Xiangde Zhang
Coarse and Fine: A New Method for Gender Classification in the Wild

As one of the most important soft biometrics, gender has substantial applications in various areas such as demography and human-computer interaction. Successful gender estimation on face images taken under real-world conditions also contributes to improving face identification results in the wild. However, most existing gender classification methods estimate gender under well-controlled environments, which limits their use in real-world applications. In this paper, we propose a new network architecture that combines coarse appearance features with delicate facial features for the gender estimation task. We call this method "coarse and fine" as a rough description of the gender estimation process. Trained on the large-scale uncontrolled CelebA dataset without any alignment, the proposed network learns to estimate the gender of real-world face images. Cross-database experiments on the LFWA and CASIA-WebFace datasets show the superiority of our proposed method.

Qianbo Jiang, Li Shao, Zhengxi Liu, Qijun Zhao
Joint Collaborative Representation with Deep Feature for Image-Set Face Recognition

With the progress of mobile cameras and video surveillance, it has become more efficient to collect multiple face images for each query. Face recognition based on image sets has attracted more and more attention in the computer vision community and in biometrics applications. In this paper, instead of using handcrafted features, we propose to utilize deep features (e.g., convolutional neural network features) for image-set face recognition. In order to fully explore the discrimination of the original query samples and the query virtual nearest point, we propose a novel joint collaborative representation with a newly designed class-level similarity constraint on the coding coefficients. An alternating optimization algorithm is developed to solve the proposed model. Experiments were conducted on the YouTube Face database and a new image-set database established from Labeled Faces in the Wild (LFW). The experimental results show that our approach outperforms previous image-set face recognition approaches.

Hui Li, Meng Yang
Multi-task Deep Face Recognition

In recent years, deep learning has become one of the most representative and effective techniques in face recognition. Due to the high expense of labelling data, it is costly to collect a large-scale face dataset with accurate label information. For tasks without sufficient data, deep models cannot be well trained. Generally, the parameters of deep models are initialized with a pre-trained model and then fine-tuned on a small task-specific dataset. However, with straightforward fine-tuning, the final model usually does not generalize well. In this paper, we propose a multi-task deep learning (MTDL) method for face recognition. The superiority of the proposed multi-task method is demonstrated by experiments on LFW and CCFD.

Jirui Yuan, Wenya Ma, Pengfei Zhu, Karen Egiazarian
Enhancing 3D Facial Expression Recognition by Exaggerating Geometry Characteristics

This paper studies exaggerated facial shapes in addition to original facial shapes to assist 3D Facial Expression Recognition (FER). We propose a Poisson equation based approach to exaggerate facial shape characteristics, highlighting expression clues that are latent in original facial surfaces but useful for recognizing expressions. To validate this idea, we exploit two off-the-shelf descriptors that reach state-of-the-art performance in 3D FER, namely Geometric Scattering Representation (GSR) and Multi-Scale Local Normal Patterns (MS-LNPs), for expression-related feature extraction, and adopt early fusion to combine the contributions of the original surface and the enhanced one, followed by SVM and Multiple Kernel Learning (MKL) classifiers. The accuracy gains of the two features on BU-3DFE are 0.8% and 1.3%, respectively. These results show that exaggerated faces are complementary to original faces in discriminating different facial expressions in the 3D domain.

Weijian Li, Yunhong Wang, Huibin Li, Di Huang

Fingerprint, Palm-Print and Vascular Biometrics

Frontmatter
Dorsal Hand Vein Recognition Based on Improved Bag of Visual Words Model

At present, the Bag of Visual Words (BoVW) model has been successfully applied to image retrieval and object recognition. However, how to build a visual dictionary with high efficiency and low redundancy is still a key issue. In this paper, we propose an improved BoVW model for the dorsal hand vein recognition problem. Specifically, when constructing the visual dictionary, we first use K-means++ to obtain clustering center points for each image category, each center point representing a visual word. Secondly, we combine the words of all categories into a visual dictionary. Finally, we use the mutual information method to eliminate redundancy between words and optimize the visual dictionary. The proposed method was tested on image databases collected under weak constraints, and the results show that the improved model has good robustness and low computational complexity, and represents each image category more distinctively, yielding better performance.

Yiding Wang, Shan Dong
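A minimal sketch of the dictionary-building steps described above, under stated assumptions (scikit-learn's k-means++ stands in for the paper's clustering; the mutual-information pruning step is omitted for brevity):

```python
import numpy as np
from sklearn.cluster import KMeans

def build_dictionary(descriptors_per_class, words_per_class=50):
    # One k-means++ vocabulary per image category; concatenate into one dictionary.
    words = []
    for descs in descriptors_per_class:          # descs: (n_descriptors, d) array
        km = KMeans(n_clusters=words_per_class, init="k-means++", n_init=10)
        km.fit(descs)
        words.append(km.cluster_centers_)
    return np.vstack(words)

def bovw_histogram(descs, dictionary):
    # Assign each local descriptor to its nearest visual word and histogram them.
    d = ((descs[:, None, :] - dictionary[None, :, :]) ** 2).sum(-1)
    hist = np.bincount(d.argmin(1), minlength=len(dictionary)).astype(float)
    return hist / (hist.sum() + 1e-12)
```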
Local Orientation Binary Pattern with Use for Palmprint Recognition

In this paper, we extensively exploit the discriminative orientation features of palmprints, including the principal orientation and the corresponding orientation confidence, and propose a local orientation binary pattern (LOBP) for palmprint recognition. Different from existing binary representation methods, the LOBP method first captures the principal orientation consistency by comparing the center point with its neighbor set, and then captures the confidence variations by thresholding the center confidence against its neighborhood, obtaining the orientation binary pattern (OBP) and confidence binary pattern (CBP), respectively. Furthermore, the block-wise statistics of OBP and CBP are concatenated to generate a novel palmprint descriptor, namely LOBP. Experimental results on different types of palmprint databases demonstrate the effectiveness of the proposed method.

Lunke Fei, Yong Xu, Shaohua Teng, Wei Zhang, Wenliang Tang, Xiaozhao Fang
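The orientation binary pattern (OBP) step can be pictured with a loose numpy sketch; this assumes a precomputed map of principal-orientation indices and is not the authors' exact formulation. The confidence pattern (CBP) would be built analogously by thresholding confidence values:

```python
import numpy as np

def orientation_binary_pattern(orient):
    """orient: H x W array of principal-orientation indices (e.g. 0..5)."""
    H, W = orient.shape
    obp = np.zeros((H - 2, W - 2), dtype=np.uint8)
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    center = orient[1:-1, 1:-1]
    for bit, (dy, dx) in enumerate(offsets):
        # Set the bit when a neighbour agrees with the center orientation.
        nb = orient[1 + dy:H - 1 + dy, 1 + dx:W - 1 + dx]
        obp |= (nb == center).astype(np.uint8) << bit
    return obp
```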
Multi-scaling Detection of Singular Points Based on Fully Convolutional Networks in Fingerprint Images

Most existing conventional methods for singular point detection in fingerprints depend on the orientation fields, and cannot achieve reliable and accurate detection on poor-quality fingerprints. In this paper, a novel algorithm is proposed for fingerprint singular point detection, which combines multi-scale fully convolutional networks (FCN) with a probabilistic model. Firstly, we divide the fingerprint image into overlapping blocks and cast detection as a classification problem, proposing a convolutional neural network (ConvNet) based approach for estimating whether the center of a block is a singular point. Then, we transform the ConvNet into an FCN and fine-tune it. Finally, we adopt probabilistic methods to determine the actual positions of the singular points. Performance testing was conducted on the NIST DB4 and FVC2002 DB1 databases, showing that the proposed algorithm gives better results than competing approaches.

Jin Qin, Congying Han, Chaochao Bai, Tiande Guo
An Efficient Slap Fingerprint Segmentation Algorithm Based on Convnets and Knuckle Line

We propose a novel and efficient technique to extract individual fingerprints from a slap image and label them with their corresponding indices, i.e., the index, middle, ring or little finger of the left/right hand. We cast the orientation of the hand as a classification problem and present an approach based on Convolutional Neural Networks (ConvNets) to estimate the angle of the hand. Geometrical and spatial properties of the hand are then used to separate individual fingers and detect the knuckle line. The proposed algorithm addresses segmentation challenges such as large rotational angles of the hand and non-elliptical component shapes. Extensive experimental evaluations demonstrate the success of this approach.

Siqi Tang, Jin Qin, Yan Liu, Congying Han, Tiande Guo
An Adaptive Contrast Enhancement Method for Palm Vein Image

Contrast enhancement plays an important role in palm vein image processing applications. However, the over-enhancement of noise that commonly used methods produce in relatively homogeneous regions is still a challenging problem. This paper proposes a low-complexity gray-level transformation method for this contrast enhancement problem. Firstly, we calculate a grid size based on the image's dimensions and extract the high-frequency content of each patch as a weighting matrix. Then we construct a Gaussian model to express the expected contrast-stretching ratio based on an analysis of each patch's high-frequency distribution. Finally, we use the intensity of each pixel as an index to find its mapping at the four closest neighboring grid points and interpolate among these values to obtain the gray-scale transformation. Experimental results on several widely accepted criteria demonstrate its superiority over conventional contrast enhancement techniques in enhancement performance and anti-noise capability.

Xiongwei Sun, Xiubo Ma, Chunyi Wang, Zede Zu, Shouguo Zheng, Xinhua Zeng
A Scheme of Template Selection and Updating for Palmprint Authentication Systems

The representativeness of templates is a fundamental problem in palmprint authentication systems, where the false rejection rate rises if the enrolled templates are less representative of intra-class variations such as posture changes, lighting conditions, and scars. To solve this problem, existing techniques typically store multiple templates to cover temporary variations such as posture changes, followed by template updating to manage periodical and permanent variations. However, techniques in the literature do not combine template selection and updating organically so that the system maintains or even improves the representativeness of its templates. In this paper, we propose a scheme that provides effective solutions to two important issues in palmprint authentication: how to automatically select representative templates from a number of candidate samples, and how to update the templates when they are insufficient or no longer representative of intra-class variations. In the enrollment stage, the proposed scheme performs Chameleon clustering and selects the most representative template from each cluster. During the authentication stage, it updates the templates online based on historical hit-counts and incoming samples. Experimental results on our palmprint database demonstrate the effectiveness of our template selection and updating scheme.

Xi Chen, Ming Yu, Feng Yue, Bin Li
Supervised Hashing with Deep Convolutional Features for Palmprint Recognition

Palmprint representations using multiple filters followed by encoding, e.g., OrdiCode and SMCC, achieve promising recognition performance. With a similar architecture but a distinct idea, we propose a novel learnable palmprint coding representation that integrates two recent advances, CNNs and supervised hashing, called deep convolutional features based supervised hashing (DCFSH). DCFSH uses the CNN-F network to extract palmprint convolutional features, whose layer-13 features, reduced by PCA, are used for coding. To learn the compact binary code, column sampling based discrete supervised hashing, which directly obtains the hashing code from semantic information, is employed. The proposed DCFSH is extensively evaluated with various code lengths and samplings on the PolyU palmprint database, and achieves a verification accuracy of EER = 0.0000% even with a 128-bit code, demonstrating the great potential of CNNs and hashing for palmprint recognition.

Jingdong Cheng, Qiule Sun, Jianxin Zhang, Qiang Zhang
Contrast Research on Full Finger Area Extraction Method of Touchless Fingerprint Images Under Different Illuminants

Touchless fingerprint recognition, with its advantages of high acceptance, high security and hygiene, is currently a hot research field in biometrics. The background areas of touchless fingerprints are more complex and larger than those of contact fingerprints, so general methods designed for contact fingerprint images struggle to extract the full finger area. The purpose of this research is to compare the performance of finger area extraction under different color models and illuminants, laying the foundation for touchless fingerprint identification. Fingerprint images are collected under blue, green and red illuminants, and Otsu thresholding based on the YCbCr, HSV, and YIQ color models is adopted to extract the finger area. Experimental results show that Otsu thresholding based on the Cb component of the YCbCr model and the S component of the HSV model achieves excellent extraction results under the blue illuminant.

Kejun Wang, Yi Cao, Xianglei Xing
Fingerprint Pore Extraction Using U-Net Based Fully Convolutional Network

The public demand for personal safety is increasing rapidly. Fingerprints, as the most commonly used bio-signature, need continuously improved security. Third-level fingerprint features (especially sweat pores) can be added to automatic fingerprint recognition systems to increase the accuracy of fingerprint identification in a variety of environments. Due to perspiration activity, the shape and size of sweat pores vary spatially and temporally, which makes the extraction of fingerprint pores both critical and challenging. In this paper, we adapt U-Net, a fully convolutional neural network, for ridge and sweat-pore extraction. The PolyU High-Resolution-Fingerprint (HRF) database is used to test the proposed method, and the results show its validity. With the majority of pores correctly extracted, the proposed method can serve fingerprint recognition using Level 3 features.

Haixia Wang, Xicheng Yang, Lingtao Ma, Ronghua Liang
Fingerprint Pore Extraction Based on Multi-scale Morphology

This paper proposes a new method to extract pores from high-resolution fingerprints. The basic idea is to binarize the fingerprint images based on a multi-scale morphological transformation, and then extract pores with different strategies: closed pores are extracted by the size of connected regions, and open pores are detected using the skeleton of valleys. Noise and falsely detected points are finally removed using a comprehensive selection rule. Experimental results show that the proposed method improves the accuracy of existing methods.

Yuanrong Xu, Guangming Lu, Feng Liu, Yanxia Li
Finger Vein Presentation Attack Detection Using Convolutional Neural Networks

As an emerging biometric modality, finger vein recognition has received considerable attention. However, recent studies have shown that finger vein biometrics is vulnerable to presentation attacks, i.e., printed versions of authorized individuals' finger veins could be used to gain access to facilities or services. In this paper, we design a specific shallow convolutional neural network (CNN) for finger vein presentation attack detection (PAD), called FPNet for short. The proposed FPNet has been evaluated on a public database and an in-house database. A large number of h × h patches are extracted from vein images with stride s for dataset augmentation and then used to train our networks without any pre-trained model. To further improve the model's generalizability and robustness, the training patches of the two databases are mixed together, and our best model achieves an accuracy of 100% on both test sets, clearly outperforming state-of-the-art methods.

Xinwei Qiu, Senping Tian, Wenxiong Kang, Wei Jia, Qiuxia Wu
Fingerprint Minutiae Detection Based on Multi-scale Convolution Neural Networks

Minutiae points are defined as the minute discontinuities of local ridge flows and are widely used as fine-level features for fingerprint recognition. Accurate minutiae detection is important, and traditional methods are often based on hand-crafted processes such as image enhancement, binarization, thinning and ridge tracing. These methods require strong prior knowledge to define the patterns of minutiae points and are sensitive to noise. In this paper, we propose a machine learning based algorithm to detect minutiae points in gray-scale fingerprint images using Convolutional Neural Networks (CNN). The proposed approach is divided into training and testing stages. In the training stage, a number of local image patches are extracted and labeled, and CNN models are trained to classify the patches. In the testing stage, the test fingerprint is scanned with the CNN model to locate minutiae positions. To improve detection accuracy, two CNN models are trained: one to classify a local patch into minutiae vs. non-minutiae, and one into ridge ending vs. bifurcation. In addition, multi-scale CNNs are constructed with image patches of varying sizes and combined to achieve more accurate detection. Finally, the proposed algorithm is tested on the fingerprints of the FVC2002 DB1 database. Experimental results and comparisons are presented to show the effectiveness of the proposed method.

Huinan Jiang, Manhua Liu
Customized Local Line Binary Pattern Method for Finger Vein Recognition

Finger vein images present plenty of oriented features. The local line binary pattern (LLBP) and its variants are good oriented-feature representation methods, but their discrimination may be limited, since they do not utilize class labels in the process of extracting features. In this paper, a class based orientation-selectable PLLBP method, called the customized local line binary pattern (CLLBP), is proposed for finger vein recognition. We first calculate the average genuine scores using components of PLLBP at different orientations for each class on the training set. Secondly, we sort these average genuine scores across orientations for each class to rank the components by relative importance. Thirdly, we choose the k most important components at the top-k orientations for each class. Lastly, given a testing image and an enrolled image, we only use the components at the top-k orientations of the enrolled class to calculate the matching score. Experimental results on the PolyU database verify the better performance of the proposed method compared with algorithms such as LBP and LLBP.

Haiying Liu, Lingfei Song, Gongping Yang, Lu Yang, Yilong Yin
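The class-customized selection described above reduces to ranking orientations per class and reusing that ranking at match time; a small sketch under stated assumptions (placeholder score arrays, not the PLLBP computation itself):

```python
import numpy as np

def select_orientations(genuine_scores_per_class, k=4):
    # genuine_scores_per_class: dict class_id -> (n_orientations,) array of
    # average genuine scores on the training set.
    return {c: np.argsort(s)[::-1][:k] for c, s in genuine_scores_per_class.items()}

def match_score(orientation_scores, enrolled_class, top_k_map):
    # Use only the enrolled class's top-k orientation components.
    idx = top_k_map[enrolled_class]
    return float(np.mean(np.asarray(orientation_scores)[idx]))
```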
Fingerprint Segmentation via Convolutional Neural Networks

In automatic fingerprint identification systems, it is crucial to segment the fingerprint images. Inspired by the superiority of convolutional neural networks for various classification and regression tasks, we approach fingerprint segmentation as a binary classification problem and propose a convolutional neural network based method for fingerprint segmentation. Given a fingerprint image, we first apply the total variation model to decompose it into cartoon and texture components. Then, the obtained texture component image is divided into overlapping patches, which are classified by the trained convolutional neural network as either foreground or background. Based on the classification results and by applying morphology-based post-processing, we get the final segmentation result for the whole fingerprint image. In the experiments, we investigate the effect of different patch sizes on the segmentation performance, and compare the proposed method with state-of-the-art algorithms on FVC2000, FVC2002 and FVC2004. Experimental results demonstrate that the proposed method outperforms existing algorithms.

Xiaowei Dai, Jie Liang, Qijun Zhao, Feng Liu
Feature Guided Fingerprint Pore Matching

The huge number of sweat pores in fingerprint images results in low efficiency of direct pore (DP) matching methods. To overcome this drawback, this paper proposes a feature guided fingerprint pore matching method. It selects "distinctive" pores around the minutiae and singular points of fingerprint images, which greatly reduces the number of pore features used for matching. The selected "distinctive" pores are then matched using state-of-the-art DP matching methods. We also consider taking the selected "distinctive" pores together with the extracted minutiae and singular points as a whole feature set for matching. The experimental results show that the matching time of the proposed method can be reduced to a quarter of the original time while recognition accuracy is kept at the same level. Both matching time and recognition accuracy improve when multiple features are matched as a whole set.

Feng Liu, Yuanhao Zhao, Linlin Shen
A CNN-Based Fingerprint Image Quality Assessment Method

Fingerprint image quality assessment is a very important task, as the performance of automatic fingerprint identification systems relies heavily on the quality of fingerprint images. Existing methods have made many efforts to find appropriate solutions, but most operate either on the full fingerprint image or on local areas. Unlike previous methods, we divide fingerprint images into blocks and define the quality level of each block according to the minutiae on it and its ridge orientation certainty. With the manually prepared quality-specific fingerprint image blocks, we train a convolutional neural network (CNN) to fulfill end-to-end quality prediction for fingerprint image blocks. The global quality of a fingerprint image is obtained by fusing the quality levels of its blocks. We evaluate the proposed method on FVC2002 DB1A and FVC2002 DB2A. Experimental results show that the proposed method can effectively distinguish good-quality fingerprints from bad ones and ensure high fingerprint recognition accuracy.

Jianqi Yan, Xiaowei Dai, Qijun Zhao, Feng Liu
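A compact sketch of the block-wise scheme described above: split the image into blocks, score each with a trained predictor (`predict_block` is a placeholder for the paper's CNN), and fuse block scores into a global quality. Simple averaging is an assumption; the paper's fusion rule may differ:

```python
import numpy as np

def global_quality(img, predict_block, block=32):
    # Score every non-overlapping block, then fuse into one global quality value.
    H, W = img.shape
    scores = [predict_block(img[y:y + block, x:x + block])
              for y in range(0, H - block + 1, block)
              for x in range(0, W - block + 1, block)]
    return float(np.mean(scores))
```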
2D Fake Fingerprint Detection for Portable Devices Using Improved Light Convolutional Neural Networks

With the increasing use of fingerprint authentication systems on portable devices, fake fingerprint detection has become increasingly important, because fingerprints can be easily spoofed with a variety of available fabrication materials. Recently, many smartphones have been successfully hacked with 2D fake fingerprints, a serious threat to authentication security. To enhance robustness against fake fingerprints, this paper proposes a novel 2D fake fingerprint detection method for portable devices based on improved light Convolutional Neural Networks (CNN). To evaluate the performance of the proposed method, a new 2D fake fingerprint dataset covering three fabrication materials is created with a capacitive fingerprint scanner. In addition, batch normalization and global average pooling are integrated to optimize the network. Experimental results show that the proposed method has high accuracy, strong robustness and good real-time performance, and can meet the requirements of portable devices.

Yongliang Zhang, Bing Zhou, Xiaoguang Qiu, Hongtao Wu, Xiaosi Zhan
Fusing 3D Gabor and Block-Wise Spatial Features for Hyperspectral Palmprint Recognition

Hyperspectral palmprints contain rich information in the joint spatial-spectral domain. One crucial task in hyperspectral palmprint recognition is how to extract spatial-spectral features. Since hyperspectral palmprint data are three-dimensional, most existing 2D based algorithms, such as the collaborative representation (CR) based framework [15], may not fully explore the information in the spectral domain. Although the 3D Gabor filter [18] can be utilized to encode information in the joint spatial-spectral domain, texture direction information such as the surface map may not be explored sufficiently. In this work, a novel response-competition (ResCom) feature is proposed to represent the spectral information of hyperspectral palmprints based on 3D Gabor filters. Incorporating the 2D surface map, the ResCom feature encodes not only the 2D texture but also the 3D response variation, so features of hyperspectral palmprints can be extracted efficiently in the joint spatial-spectral domain. By fusing block-wise and ResCom features, the proposed approach achieves the highest recognition rate reported so far, 99.43%, on the public hyperspectral palmprint database.

Mian Li, Weicheng Xie, Linlin Shen
Weighted Graph Based Description for Finger-Vein Recognition

The randomness of vein networks determines the discrimination of finger-vein patterns in recognition. Effectively describing these random patterns is therefore very important for finger-vein based biometrics. In this paper, a new graph-based method is proposed for finger-vein network feature representation. Graph nodes are first generated block-wise from a finger-vein image. By applying Delaunay triangulation to the obtained nodes, graph edges are then built to capture the spatial relations between image blocks. For a given feature space, each of these edges can locally represent a relationship between two adjacent nodes. Considering local variations in image content, the graph edges are further weighted node-wise using the statistics of image blocks. Thus, a graph can globally represent a finger-vein network, while its weighted edges locally describe the relations of image blocks. Experimental results on two image databases totaling 1,200 image samples show that the proposed method performs well in finger-vein recognition.

Ziyun Ye, Jinfeng Yang, Jose Hernandez Palancar

Iris

Frontmatter
Iris Recognition Based on Adaptive Gabor Filter

Aiming at the problem of multi-category iris recognition, this paper proposes an iris recognition algorithm based on adaptive Gabor filters. DE-PSO, a hybrid of particle swarm optimization and differential evolution, is used to adaptively optimize the Gabor filter parameters. Sixteen groups of 2D Gabor filters with different frequencies and orientations are used to process the iris images. According to the orientation and frequency of the maximum response amplitude, iris features are transformed into a 512-bit binary code. The Hamming distance between feature codes is calculated and compared with a classification threshold to determine the iris class. Experiments on a variety of iris databases show that, compared with other Gabor filter algorithms, this algorithm achieves a higher recognition rate, an ROC curve closer to the coordinate axes, and better robustness.

Shuai Liu, Yuanning Liu, Xiaodong Zhu, Guang Huo, Jingwei Cui, Yihao Chen
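The matching step above (comparing 512-bit codes by Hamming distance against a threshold) is standard and easy to sketch; the threshold below is illustrative, not the paper's value:

```python
import numpy as np

def hamming_distance(code_a, code_b):
    # Fraction of disagreeing bits between two binary iris codes.
    a = np.asarray(code_a, dtype=bool)
    b = np.asarray(code_b, dtype=bool)
    return np.count_nonzero(a ^ b) / a.size

def same_iris(code_a, code_b, threshold=0.32):
    return hamming_distance(code_a, code_b) < threshold
```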
Deep Convolutional Features for Iris Recognition

Traditional iris recognition methods are mostly based on hand-crafted features and have limited success in less constrained scenarios due to non-ideal images caused by less cooperative subjects. Though features learned via deep convolutional neural networks (CNN) have shown remarkable success in the computer vision field, they have rarely been used in the area of iris recognition. To tackle this issue, this paper proposes a novel method for robust iris recognition based on CNN models. As large-scale labeled iris images are not available, we design a lightweight CNN architecture suitable for iris datasets with small-scale labeled images. Different from existing works which use fully-connected features to capture the global texture, we propose to use convolutional features to model the local properties and deformations of iris texture. We also develop a mechanism which effectively incorporates the mask image to exclude corrupted regions in the CNN model. The proposed method achieves much better performance than the compared methods on the challenging ND-IRIS-0405 benchmark.

Xingqiang Tang, Jiangtao Xie, Peihua Li
Mobile Iris Recognition via Fusing Different Kinds of Features

Iris recognition is widely accepted in many kinds of applications. When it comes to mobile iris recognition, the task is quite challenging because of the low quality of iris images. To solve this problem, we propose a mobile iris recognition algorithm based on fused features and Joint Bayesian. Iris feature representations are extracted by 2D Gabor filters and Ordinal Measures; these representations are fused by Joint Bayesian, and the similarity of two iris images is measured by the log-likelihood ratio. Experiments are conducted on the MIR-Train database and a self-established low-quality iris image database (LQIID). The proposed method achieves an EER of 1.2% on the MIR-Train database and 0.8% on the LQIID, supporting its effectiveness.

Qi Wang, Xia Su, Zhenlin Cai, Xiangde Zhang
Coarse-to-Fine Iris Recognition Based on Multi-variant Ordinal Measures Feature Complementarity

Iris recognition inevitably needs to tackle extremely large-scale database matching, which challenges iris recognition in both computational efficiency and accuracy. As a feasible solution, iris image classification has great potential and needs further study. We propose a coarse-to-fine iris recognition strategy based on the complementarity of multi-variant Ordinal Measures (OM) features. Two OM variant features are proposed for iris classification: one is the very large scale OM feature (VLSOM), and the other is the histogram statistics of OM Run-Length Coding (HOMRLC). VLSOM, HOMRLC and OM describe the overall appearance, global statistics and local characteristics of the iris, respectively. Extensive experiments show the advantages of the proposed complementary features.

Hui Zhang, Man Zhang, Zhaofeng He, Hang Zou, Rui Wang
An Eye Localization Method for Iris Recognition

Eye localization plays a fundamental role in iris recognition, since it defines the effective regions used for recognition. This paper presents a new eye localization method based on light spot detection. First, images are preprocessed with a Gaussian smoothing filter and dilation. Then, the FAST feature detection algorithm is used to select candidate regions by detecting light spots. Furthermore, we calculate HOG features of these candidates and use an SVM classifier to obtain the eye regions. If only one eye is localized, a block matching algorithm is applied to fix the missed detection. The localization accuracy on two public dual-eye iris image databases, CASIA-IrisV4-Distance and MIR, remains above 99.75%, and 98% of the samples have a normalized error below 0.1, which demonstrates the success of the proposed method.

Yongliang Zhang, Xiaozhu Chen, Xiao Chen, Dixin Zhou, Erzhe Cao
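The pipeline above maps naturally onto OpenCV and scikit-learn primitives. The sketch below is a rough approximation under stated assumptions (`svm` is assumed trained elsewhere; window and detector parameters are illustrative):

```python
import cv2
import numpy as np

hog = cv2.HOGDescriptor((64, 64), (16, 16), (8, 8), (8, 8), 9)
fast = cv2.FastFeatureDetector_create(threshold=40)

def locate_eyes(gray, svm, win=64):
    # Smooth + dilate, find bright-spot candidates with FAST, then HOG + SVM.
    img = cv2.GaussianBlur(gray, (5, 5), 0)
    img = cv2.dilate(img, np.ones((3, 3), np.uint8))
    eyes = []
    for kp in fast.detect(img, None):
        x, y = int(kp.pt[0]), int(kp.pt[1])
        patch = img[max(0, y - win // 2):y + win // 2,
                    max(0, x - win // 2):x + win // 2]
        if patch.shape != (win, win):
            continue  # skip candidates too close to the border
        feat = hog.compute(patch).reshape(1, -1)
        if svm.predict(feat)[0] == 1:  # 1 = eye region
            eyes.append((x, y))
    return eyes
```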
Visible Spectral Iris Segmentation via Deep Convolutional Network

Iris segmentation is the prerequisite for precise iris recognition. Visible spectral iris images may yield lower segmentation accuracy due to noise interference. We use a deep learning method to segment the iris region in visible spectral iris images. A deep convolutional neural network is designed to extract eye features and segment the iris, pupil, sclera and background. It is an end-to-end model that requires no further processing. We collected eye images and manually masked the different parts of the eye to establish a visible spectral iris dataset for training and testing. The proposed method was trained within the DeepLab framework. Experimental results show that the proposed method is effective for iris segmentation.

Yuqing He, Saijie Wang, Kuo Pei, Mingqi Liu, Jiawei Lai

Gesture and Gait

Frontmatter
Gesture Recognition Based on Deep Belief Networks

Analyzing data acquired from the inertial sensors in mobile phones has proved to be an effective way to perform gesture recognition. This research introduces deep belief networks (DBN) to solve the inertial sensor-based gesture recognition problem and obtains a satisfactory result on the BUAA Mobile Gesture Database. The optimal architecture and hyperparameters of the DBN were tuned according to experimental performance in order to reach high recognition accuracy within a short time. In addition, three state-of-the-art methods were tested on the same database, and the comparison indicates that the proposed method achieves much better recognition accuracy.

Yunqi Miao, Linna Wang, Chunyu Xie, Baochang Zhang
DeepGait: A Learning Deep Convolutional Representation for Gait Recognition

Human gait, as a soft biometric, helps to recognize people by the way they walk. To further improve recognition performance, we propose a novel video sensor-based gait representation, DeepGait, using deep convolutional features. DeepGait is generated by a pre-trained VGG-D net without any fine-tuning. Compared experimentally with traditional hand-crafted gait representations (e.g., GEI, FDF and GFI) on the OU-ISIR Large Population (OULP) dataset and the CASIA-B dataset, DeepGait performs outstandingly under different walking variations (view, clothing, carrying bags). The OULP dataset, which includes 4007 subjects, makes our result statistically reliable. Even at a very low dimension, our proposed gait representation still outperforms the commonly used 11264-dimensional GEI. For further comparison, all gait representation vectors are made available.

Xianfu Zhang, Shouqian Sun, Chao Li, Xiangyu Zhao, Yuping Hu
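Using a fixed pre-trained network as a gait feature extractor, as DeepGait does, can be sketched with torchvision's VGG-16 standing in for the paper's VGG-D net (an assumption; the weights enum, input size and flattening step are illustrative and depend on the torchvision version):

```python
import torch
import torchvision.models as models

vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).eval()

@torch.no_grad()
def deep_gait_feature(img):
    # img: (1, 3, 224, 224) tensor holding a gait image; no fine-tuning is done.
    f = vgg.features(img)          # convolutional feature maps
    return torch.flatten(f, 1)     # one deep descriptor per input
```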
Gait Identification by Joint Spatial-Temporal Feature

In order to extract the joint spatial-temporal gait feature, we propose a novel Long Short-Term Memory (LSTM) network for gait recognition in this paper. Given a gait sequence, a CNN unit with three convolutional layers is used to extract the spatial feature. The spatial feature vector is then sent to the LSTM unit, which extracts the temporal feature. Based on the spatial-temporal feature vector, the triplet loss function is adopted to optimize the network parameters. The CNN and LSTM units are jointly trained to act as a spatial-temporal gait feature extractor for the gait recognition system. Finally, extensive evaluations are carried out on the CASIA-B dataset. The results show that our network performs better than previous state-of-the-art methods and has great potential for practical application.

Suibing Tong, Yuzhuo Fu, Heifei Ling, Enbang Zhang
A Convolutional Neural Network for Gait Recognition Based on Plantar Pressure Images

This paper proposes a novel gait recognition method based on plantar pressure images. Different from many conventional methods, where hand-crafted features are extracted explicitly, we utilize a Convolutional Neural Network (CNN) for automatic feature extraction as well as classification. The peak pressure image (PPI), generated from the time series of plantar pressure images, is used as the characteristic image for gait recognition in this study. Our gait samples were collected from 109 subjects at three walking speeds, with a total of 18 samples gathered per subject. Experimental results demonstrate that the designed CNN model obtains very high classification accuracy compared with many traditional methods.

Yanlin Li, Dexiang Zhang, Jun Zhang, Lina Xun, Qing Yan, Jingjing Zhang, Qingwei Gao, Yi Xia
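The peak pressure image itself is a one-liner: for a gait sample recorded as a time series of pressure frames, keep the per-pixel maximum over time. A minimal sketch (array layout assumed):

```python
import numpy as np

def peak_pressure_image(frames):
    """frames: (T, H, W) array with one plantar-pressure map per time step."""
    return np.max(np.asarray(frames), axis=0)
```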
Pose-Based Temporal-Spatial Network (PTSN) for Gait Recognition with Carrying and Clothing Variations

Gait recognition is one of the most attractive biometric techniques owing to its potential for human identification at a distance. However, gait recognition is still challenging in real applications due to the effect of many variations on appearance and shape. Usually, appearance-based methods need to compute the gait energy image (GEI), which is extracted from the human silhouettes. The GEI is obtained by averaging the silhouettes, and as a result the temporal information is removed. The body joints, on the other hand, are invariant to changing clothing and carrying conditions. We propose a novel pose-based gait recognition approach that is more robust to clothing and carrying variations. A pose-based temporal-spatial network (PTSN) is proposed to extract the temporal-spatial features, which effectively improves the performance of gait recognition. Experiments on the challenging CASIA-B dataset show that our method achieves state-of-the-art performance under both carrying and clothing conditions.

Rijun Liao, Chunshui Cao, Edel B. Garcia, Shiqi Yu, Yongzhen Huang
Windowed DMD for Gait Recognition Under Clothing and Carrying Condition Variations

In this paper, we introduce a method based on Windowed Dynamic Mode Decomposition to enhance the texture of body parts in the Gait Energy Image that are not affected by clothing and carrying condition variations, in order to improve gait recognition accuracy under these kinds of variations. We obtain the best accuracy (71.37%) for large carrying condition variations reported in the literature for the CASIA-B dataset. Unlike deep learning based approaches, the proposed method is simple and does not need training.

Jiawei Wang, Edel B. Garcia, Shiqi Yu, Dexin Zhang
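For readers unfamiliar with DMD, a compact numpy sketch of exact DMD applied to one window of vectorized frames is given below; the windowing and the mode selection used for texture enhancement in the paper are simplified away, so this is only the core decomposition:

```python
import numpy as np

def dmd_modes(window, r=10):
    """window: (pixels, frames) matrix of vectorized silhouette frames."""
    X, Y = window[:, :-1], window[:, 1:]            # snapshot pairs x_k -> x_{k+1}
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    U, s, V = U[:, :r], s[:r], Vh[:r].conj().T      # rank-r truncation
    A_tilde = U.conj().T @ Y @ V @ np.diag(1.0 / s) # low-rank linear operator
    eigvals, W = np.linalg.eig(A_tilde)
    modes = Y @ V @ np.diag(1.0 / s) @ W            # exact DMD modes
    return eigvals, modes
```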

Emerging Biometrics

Frontmatter
Research on Dig-Imprint Detection of Three-Dimensional Footprints

In Chinese forensic science, a three-dimensional footprint can provide a wealth of information, such as sex, age and gait. The dig-imprint is one of the impressions in three-dimensional footprints that can reveal such biometric traits. However, three-dimensional footprints are still analyzed manually by forensic investigators, which is inefficient and subjective. In this research, an algorithm for the automatic detection of dig-imprints in three-dimensional footprints is developed. Haar-like and LBP features are extracted from the dataset, and two classifiers are constructed with the AdaBoost algorithm using these two features. A dig-imprint database is constructed to evaluate the performance of the proposed method: pictures of three-dimensional footprints were taken following crime-scene photography practice, the dig-imprints were cut out as positive samples, and negative samples were also cut out from the three-dimensional footprints. Experimental results show that the proposed method achieves good detection accuracy.

Han Sun, Yunqi Tang, Wei Guo
ECG Based Identification by Deep Learning

Strategies are proposed for Electrocardiogram (ECG) based identification. First, a selection mechanism based on information entropy is used to obtain whole heartbeat signals. Second, a Deep Neural Network (DNN) based on a Denoising AutoEncoder (DAE) is adopted for unsupervised feature selection, which improves the robustness of the recognition system. Finally, recognition rates of 98.10% and 95.67% are obtained on self-collected calm and high-pressure data sets respectively, and an average rate of 94.39% on the combination of the MIT arrhythmia database (mitdb) and the self-collected data.
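
A minimal sketch of a denoising autoencoder over fixed-length heartbeat segments, in the spirit of the DAE described above (segment length, layer sizes, and noise level are assumptions, not the paper's settings):

```python
# Denoising autoencoder: corrupt the input beat, reconstruct the clean one,
# and keep the bottleneck code as the learned ECG feature.
import torch
import torch.nn as nn

class HeartbeatDAE(nn.Module):
    def __init__(self, beat_len=256, code_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(beat_len, 128), nn.ReLU(),
                                     nn.Linear(128, code_dim))
        self.decoder = nn.Sequential(nn.Linear(code_dim, 128), nn.ReLU(),
                                     nn.Linear(128, beat_len))

    def forward(self, x):
        noisy = x + 0.1 * torch.randn_like(x)    # corrupt input
        code = self.encoder(noisy)
        return self.decoder(code), code          # `code` is the learned feature

recon, feat = HeartbeatDAE()(torch.randn(8, 256))
```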

Gang Zheng, Shengzhen Ji, Min Dai, Ying Sun
Realtime Human-UAV Interaction Using Deep Learning

In this paper, we propose realtime human gesture identification for controlling a micro UAV in a GPS-denied environment. Exploiting the breakthrough of deep convolutional networks in computer vision, we develop a robust Human-UAV Interaction (HUI) system that can detect and identify a person's gestures to control a micro UAV in real time. We also build a new dataset with 23 participants to train and fine-tune the deep neural networks for human gesture detection. Based on the collected dataset, the state-of-the-art YOLOv2 detection network is tailored to detect the locations of a person's face and two hands. Then, an interpreter approach is proposed to infer the gesture from the detection results, in which each interpreted gesture corresponds to a UAV flight command. Real flight experiments performed by non-expert users with the Bebop 2 micro UAV have validated our proposal for HUI. The gesture detection deep model with a demo will be publicly available to aid research work.
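
The paper's actual gesture-to-command rules are not given here, so the following interpreter is purely hypothetical: it maps the relative positions of the detected face and hand box centers to a flight command.

```python
# Hypothetical interpreter: compare detected hand centers to the face center.
def interpret_gesture(face, left_hand, right_hand):
    """Each argument is an (x, y) box center in image coordinates (y grows down)."""
    fy = face[1]
    lh_up, rh_up = left_hand[1] < fy, right_hand[1] < fy   # hand raised above face?
    if lh_up and rh_up:
        return "TAKE_OFF"
    if lh_up:
        return "TURN_LEFT"
    if rh_up:
        return "TURN_RIGHT"
    return "HOVER"

print(interpret_gesture((320, 200), (200, 120), (440, 260)))  # -> TURN_LEFT
```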

Ali Maher, Ce Li, Hanwen Hu, Baochang Zhang
Adapting Convolutional Neural Networks on the Shoeprint Retrieval for Forensic Use

Shoeprints are important evidence for crime investigation, and many automatic shoeprint retrieval methods have been proposed to efficiently provide useful information for identifying criminals. Meanwhile, convolutional neural networks have shown great capacity on image classification problems, but their application to shoeprint retrieval had not yet been investigated. This paper presents an application of the VGG16 network as a feature extractor for shoeprint retrieval, together with a data augmentation method to fine-tune the neural network with a very small database. Our method shows much better performance than state-of-the-art methods on the same database of crime-scene-like shoeprints.
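
A sketch of the retrieval side under stated assumptions (ImageNet weights stand in for the paper's fine-tuned model, which would be loaded in practice): embed each print with VGG16 minus its last layer and rank the gallery by cosine similarity.

```python
# VGG16 as a 4096-d feature extractor plus cosine-similarity ranking.
import torch
import torch.nn.functional as F
from torchvision.models import vgg16

net = vgg16(weights="IMAGENET1K_V1")
net.classifier = net.classifier[:-1]    # drop final layer -> 4096-d descriptor
net.eval()

@torch.no_grad()
def embed(batch):                       # batch: (N, 3, 224, 224), normalized
    return F.normalize(net(batch), dim=1)

query, gallery = torch.randn(1, 3, 224, 224), torch.randn(10, 3, 224, 224)
ranks = (embed(query) @ embed(gallery).T).argsort(descending=True)
```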

Yang Zhang, Huanzhang Fu, Emmanuel Dellandréa, Liming Chen
Motion Analysis Based Cross-Database Voting for Face Spoofing Detection

With the rapid development of face recognition systems in various practical applications, numerous face spoofing attacks under different environments and devices have emerged, and countermeasures against face spoofing attacks in cross-database settings have attracted increasing attention. This paper proposes a face spoofing detection method with motion analysis based cross-database voting. We employ motion cues that are consistent across databases, such as eye blinks, mouth movements, and facial expressions. The motion information maps of a video are then classified as real or fake by a CNN model. Furthermore, a cross-database voting strategy is constructed to transfer motion characteristics from one database to another for face spoofing inference. Experimental results demonstrate that the proposed method outperforms the compared approaches by taking advantage of motion analysis based CNN classification and cross-database voting.

Lifang Wu, Yaowen Xu, Meng Jian, Wei Cai, Chuncan Yan, Yukun Ma
Personal Identification Based on Content-Independent EEG Signal Analysis

Interest in the use of biological signals has grown rapidly in the past decades, and biometric recognition based on the electroencephalogram (EEG) has become a hotspot. In this paper, we propose a novel EEG biometrics system that combines automatic channel selection, wavelet feature extraction and a Deep Neural Network (DNN) classifier. The channel selection not only reduces computational redundancy, but also improves accuracy. A strategy of fusing EEG and physiological signals is adopted in the system. As a useful supplement to previous work, we specifically endeavor to handle content-independent EEG biometrics. The proposed system is validated for identity authentication on the multimodal DEAP dataset [1]. We perform data augmentation by splitting the EEG signal through down-sampling with different shifts. An accuracy of 94% ± 3% is obtained in 10-fold cross-validation. The results demonstrate the feasibility of EEG biometrics under the content-independent scenario.
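
As a hedged sketch of the wavelet step for a single channel (the wavelet family, decomposition depth, and energy statistic are assumptions; the paper's exact settings are not given here):

```python
# Per-band wavelet energies as a compact per-channel EEG descriptor.
import numpy as np
import pywt

def wavelet_features(signal, wavelet="db4", level=4):
    coeffs = pywt.wavedec(signal, wavelet, level=level)   # [cA4, cD4, ..., cD1]
    return np.array([np.sum(c ** 2) for c in coeffs])     # sub-band energies

print(wavelet_features(np.random.randn(512)))             # 5 energy features
```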

Yifan Li, Yinghui Zhao, Taifeng Tan, Ningjie Liu, Yuchun Fang
Handwaving Authentication: Unlocking Your Smartwatch Through Handwaving Biometrics

The increasing use of smartwatches to access sensitive and personal data in health monitoring and quick payment has given rise to the need for convenient and secure authentication techniques. However, traditional memory-based authentication methods like PINs have proved to be easily cracked or user-unfriendly. This paper presents a novel approach to unlocking smartwatches, or authenticating users' identities on them, by analyzing a user's handwaving patterns. A field study was conducted to design typical smartwatch unlocking scenarios and gather users' handwaving data. Behavioral features were extracted to accurately characterize users' handwaving patterns, and a one-class classification algorithm based on the scaled Manhattan distance was developed to perform user authentication. Extensive experiments on a newly established 150-person-time handwaving dataset collected with a smartwatch demonstrate the effectiveness of the proposed approach, which achieves an equal-error rate of 4.27% in the free-shaking scenario and 14.46% in the imitation-attack scenario. This level of accuracy shows that there is indeed identity information in handwaving behavior that can be used as a wearable authentication mechanism.
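
The scaled Manhattan distance verifier named above has a standard one-class form: distance to the enrolled template, scaled per feature by the mean absolute deviation. A minimal sketch (the feature dimension and acceptance threshold are assumptions):

```python
# One-class verifier: small scaled Manhattan distance to the enrolled
# template means "genuine".
import numpy as np

class ScaledManhattanVerifier:
    def fit(self, X):                        # X: (n_enroll, n_features)
        self.mu = X.mean(axis=0)
        self.mad = np.abs(X - self.mu).mean(axis=0) + 1e-8
        return self

    def score(self, x):                      # lower = more likely genuine
        return np.sum(np.abs(x - self.mu) / self.mad)

enroll = np.random.randn(20, 12)             # 20 enrolled handwaving samples
v = ScaledManhattanVerifier().fit(enroll)
accepted = v.score(np.random.randn(12)) < 15.0   # assumed threshold
```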

Zhao Wang, Chao Shen, Yufei Chen
Identification of the Normal and Abnormal Heart Sounds Based on Energy Features and Neural Network

A method for identifying normal and abnormal heart sounds is put forward in this paper. Wavelet packet energy features of the heart sounds are extracted, and an LM-BP neural network is used as the classifier. Experimental results show that the proposed algorithm converges much faster than the traditional BP neural network, and achieves better results than two traditional heart sound processing methods based on STFT and spectrogram analysis.
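
A sketch of wavelet packet energy features for one heart sound segment (the wavelet choice and decomposition depth are assumptions, not the paper's settings):

```python
# Normalized leaf-node energies of a wavelet packet decomposition.
import numpy as np
import pywt

def packet_energies(signal, wavelet="db6", level=3):
    wp = pywt.WaveletPacket(signal, wavelet, maxlevel=level)
    nodes = wp.get_level(level, order="freq")        # 2**level sub-bands
    e = np.array([np.sum(n.data ** 2) for n in nodes])
    return e / e.sum()                               # normalized band energies

print(packet_energies(np.random.randn(2048)))        # 8 energy features
```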

Ting Li, Hong Tang, Xiao-ke Xu
The Android-Based Acquisition and CNN-Based Analysis for Gaze Estimation in Eye Tracking

Over the past several years, the demand for eye tracking has been increasing across the fields of computer vision and pattern recognition, especially in commercial applications. However, low prediction accuracy and the restriction of datasets and methods to special eye tracking equipment have been obstacles to the wide application of gaze estimation. In this paper, we develop an Android-based acquisition application named EyeTracker to collect the first Chinese gaze dataset, and we then propose a convolutional neural network framework for gaze estimation from a single image. We evaluate the proposed model on our dataset, EyeTrackD (tablet), and on GazeCapture (part of the phone data). Our model achieves prediction errors of 4.33 cm and 2.25 cm on these two datasets respectively, better than the previous method using the same data. Extensive experiments under different network settings show the effectiveness of our convolutional neural network framework.

Wei Wen, Tong Chen, Meng Yang

Voice and Speech

Frontmatter
Efficient Audio-Visual Speaker Recognition via Deep Heterogeneous Feature Fusion

Audio-visual speaker recognition (AVSR) has long been an active research area, primarily because its complementary information supports reliable access control in biometric systems; it is a challenging problem mainly attributable to its multimodal nature. In this paper, we present an efficient audio-visual speaker recognition approach via deep heterogeneous feature fusion. First, we exploit a dual-branch deep convolutional neural network (CNN) learning framework to extract and fuse the high-level semantic features of face and audio data. Further, considering the temporal dependency of audio-visual data, we feed the fused features into a bidirectional Long Short-Term Memory (LSTM) network to produce the recognition result, through which speakers recorded under different challenging conditions can be well identified. The experimental results demonstrate the effectiveness of our proposed approach in both audio-visual feature fusion and speaker recognition.
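
An illustrative sketch of the fuse-then-BiLSTM idea (the embedding dimensions, hidden size, and speaker count are assumptions; this is not the authors' architecture):

```python
# Concatenate per-frame face and audio embeddings, classify with a BiLSTM.
import torch
import torch.nn as nn

class AVFusionNet(nn.Module):
    def __init__(self, face_dim=256, audio_dim=128, n_speakers=100):
        super().__init__()
        self.lstm = nn.LSTM(face_dim + audio_dim, 128,
                            batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * 128, n_speakers)

    def forward(self, face_seq, audio_seq):   # (B, T, face_dim), (B, T, audio_dim)
        fused = torch.cat([face_seq, audio_seq], dim=-1)
        out, _ = self.lstm(fused)
        return self.fc(out[:, -1])            # speaker scores from last step

logits = AVFusionNet()(torch.randn(2, 20, 256), torch.randn(2, 20, 128))
```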

Yu-Hang Liu, Xin Liu, Wentao Fan, Bineng Zhong, Ji-Xiang Du
Prioritized Grid Highway Long Short-Term Memory-Based Universal Background Model for Speaker Verification

The prioritized grid long short-term memory (pGLSTM) has been shown to improve automatic speech recognition (ASR) efficiently. In this paper, we apply this state-of-the-art ASR model to text-independent Chinese speaker verification, adopting a DNN/i-vector (DNN-based i-vector) framework along with a PLDA backend. To fully explore the performance, we compare the presented pGLSTM-based UBM with the GMM-UBM and HLSTM-UBM. Owing to the limited amount of transcribed Chinese corpus for ASR training, we also explore an adaptation method that first trains the pGLSTM-UBM on a large English corpus and then uses a PLDA adaptation backend to fit the Chinese language before the final speaker verification scoring. Experiments show that both the pGLSTM-UBM with its corresponding PLDA backend and the pGLSTM-UBM with the adapted PLDA backend outperform the traditional GMM-UBM model. Additionally, the pGLSTM-UBM with the PLDA backend achieves 4.94% EER on 5 s short utterances and 1.97% EER on 10 s short utterances, a 47% and 51% relative reduction compared to the GMM baseline. The results imply that DNNs from ASR tasks can extend the advantage of the UBM model, especially on short utterances, and that better DNN models for ASR could bring extra gains in speaker verification.

Jianzong Wang, Hui Guo, Jing Xiao
Assistance of Speech Recognition in Noisy Environment with Sentence Level Lip-Reading

Acoustic speech recognition, a technique for decoding text from speech, has achieved great success in recent years. The trained model of Ping An Technology (ShenZhen) Co., Ltd achieves a word error rate (WER) of 8.4%, which is competitive among popular commercial products. However, this achievement assumes a quiet recording environment; in a noisy environment, the accuracy decreases by 10%–20%. To improve performance in such environments, a multi-modal biometric system integrating acoustic speech recognition with sentence-level lip-reading is designed. In several noisy situations, the integrated system's average word error rate of 5.7% indicates a significant improvement over the pure acoustic speech recognition system.

Jianzong Wang, Yiwen Wang, Aozhi Liu, Jing Xiao

Video Surveillance

Frontmatter
Rich Features and Precise Localization with Region Proposal Network for Object Detection

Deep networks have greatly accelerated the development of object detection, and recent advances are mainly attributed to the combination of deep networks and region proposal methods [1–3]. However, the accuracy of object detection on complicated datasets is still unsatisfactory, especially for small objects, mainly because of the coarseness of the convolutional feature maps. In this paper, we design a new strategy for generating region proposals and propose a new localization method for object detection. Compared with previous baseline detectors such as Fast R-CNN [4] and Faster R-CNN [5], our method makes use of adjacent-level feature maps at all scales to generate region proposals and also adopts a cascaded region proposal network (RPN) to fine-tune the location of the bounding box. Compared with other state-of-the-art methods, our method achieves the best recall and object detection accuracy.

Mengdie Chu, Shuai Wu, Yifan Gu, Yong Xu
Sparse Similarity Learning with a Local Appearance Model for Person Tracking

In this paper, we propose a novel approach to robust person tracking that combines online bilinear similarity metric learning with a local appearance model in a particle filter framework. Because of the various appearance and motion changes of the target person in challenging scenarios, conventional pre-defined similarity metrics are prone to drifting on challenging sequences. To this end, we propose to learn a discriminative metric that distinguishes the target object from the background, using a sparse online bilinear similarity function parameterized by a diagonal matrix. In addition, most metric learning based appearance models only consider the holistic representation and hence are sensitive to partial occlusion and cluttered backgrounds. To address this issue, we employ a local appearance model and a simple template update strategy to build a robust person tracker. Experimental results on several challenging person videos show that our tracker achieves superior performance to several state-of-the-art trackers.
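
With a diagonal parameterization, the bilinear similarity reduces to a per-feature weighting, s(x, y) = xᵀ diag(w) y. The sketch below shows that function together with a plain hinge-style online update and soft-thresholding for sparsity; the authors' exact solver is not reproduced.

```python
# Diagonal bilinear similarity with a simple sparse online update.
import numpy as np

def bilinear_similarity(x, y, w):
    return np.dot(x * w, y)                       # x^T diag(w) y

def online_update(w, target, x_pos, x_neg, lr=0.01, l1=1e-3):
    # Push similarity(target, positive patch) above similarity(target, negative).
    margin = (bilinear_similarity(target, x_pos, w)
              - bilinear_similarity(target, x_neg, w))
    if margin < 1.0:                              # hinge violated -> gradient step
        w = w + lr * target * (x_pos - x_neg)
    return np.sign(w) * np.maximum(np.abs(w) - l1, 0.0)  # soft-threshold

w = np.ones(64)
t, p, n = np.random.rand(64), np.random.rand(64), np.random.rand(64)
w = online_update(w, t, p, n)
```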

Xiang Li, Min Xu, Zeqiang Wei, Liming Shao
Research on Temporal Structure for Action Recognition

Cameras may be employed to facilitate data collection, to serve as a data source for controlling actuators, or to monitor the status of a process, including tracking. We propose an algorithm to explore the temporal relations between trajectory groups in videos, and apply it to action recognition and intelligent human-machine interaction systems. The trajectory components are application-independent features and function well as mid-level descriptors of actions in videos. The experiments demonstrate performance improvements compared with a pure bag-of-features method. The success of this semantics-free recognition method provides the potential to define high-level actions using low-level components and the temporal relationships between them, which is similar to the way humans perceive and recognize actions.

Wengang Feng, Huawei Tian, Yanhui Xiao
Real-Time Object Detection Using Efficient Convolutional Networks

While recent object detection approaches have greatly improved accuracy and robustness, detection speed remains a challenge for the community. In this paper, we propose an efficient fully convolutional network (EFCN) for real-time object detection. EFCN employs the lightweight MobileNet [1] as the base network to significantly reduce the computation cost. Meanwhile, it detects objects in feature maps at multiple scales, and deploys a refining module on top of each of these feature maps to alleviate the accuracy loss brought by the simple base network. We evaluate EFCN on the challenging KITTI [2] dataset and compare it with state-of-the-art methods. The results show that EFCN keeps a good balance between speed and accuracy: it has 25× fewer parameters and is up to 31× faster than Faster-RCNN [3] while maintaining similar or better accuracy.

Xian Zhou, You-Ji Feng, Xi Zhou
Holistic Crowd Interaction Modelling for Anomaly Detection

Dense crowd motion analysis in surveillance scenarios is a daunting task: when occlusion and low resolution occur, it is difficult to make effective use of pedestrian detection and tracking algorithms. In this study, we introduce a crowd interaction modelling framework inspired by physical and social science studies. Instead of taking the individual pedestrian as the unit of analysis, the interaction among individuals is modeled through the social force model (SFM), and a modified SFM is proposed for more robust representation. Visualization and anomaly detection experiments on the UMN and Web datasets indicate that SFM-based interaction modelling outperforms optical flow and particle advection.
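
The classic social force model that the paper builds on combines relaxation toward each pedestrian's desired velocity with pairwise repulsion. A hedged sketch (the constants and exponential repulsion follow the textbook Helbing formulation, not the paper's modified SFM):

```python
# Classic social force: relaxation term plus pairwise exponential repulsion.
import numpy as np

def social_force(pos, vel, desired_vel, tau=0.5, A=2.0, B=0.3):
    """pos, vel, desired_vel: (n, 2) arrays; returns per-pedestrian force."""
    force = (desired_vel - vel) / tau                     # relaxation term
    for i in range(len(pos)):
        for j in range(len(pos)):
            if i == j:
                continue
            d = pos[i] - pos[j]
            dist = np.linalg.norm(d) + 1e-8
            force[i] += A * np.exp(-dist / B) * d / dist  # repulsion from j
    return force

F = social_force(np.random.rand(5, 2) * 10, np.random.randn(5, 2), np.zeros((5, 2)))
```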

Jiaxing Pan, Dong Liang
Robust Partial Person Re-identification Based on Similarity-Guided Sparse Representation

In this paper, we study the problem of partial person re-identification (re-id). This problem is more difficult than general person re-identification because the probe image contains only part of the body. We propose a novel method, similarity-guided sparse representation (SG-SR), as a robust solution for improving the discrimination of the sparse coding. There are three main components in our method. First, to include multi-scale information, a dictionary consisting of features extracted from multi-scale patches is established. A low-rank constraint is then enforced on the dictionary, based on the observation that the subspace of each class should have low dimension. After that, a classification model is built on a novel similarity-guided sparse representation that favors dictionary vectors more similar to the probe feature vector. The results show that our method outperforms existing partial person re-identification methods significantly and achieves state-of-the-art accuracy.

Min Ren, Lingxiao He, Haiqing Li, Yunfan Liu, Zhenan Sun, Tieniu Tan
Person Re-identification by Integrating Static Texture and Shape Cues

Person Re-Identification (Re-ID) is a challenging task with wide-ranging applications in various fields. This paper presents a novel hand-crafted method that enhances the state of the art in two aspects. In contrast to most current studies, which analyze texture features, we propose a discriminative and compact shape feature obtained by applying Procrustes shape analysis. It not only retains the shape distinctiveness of an individual sample, but also alleviates cross-view effects. Furthermore, we combine the shape feature with popular texture features, namely LOMO and mid-level filters, so that the advantages of multiple cues can be used jointly, and a score-level fusion strategy is adopted to optimally integrate their contributions. Evaluated on two public benchmarks, i.e. VIPeR and CUHK03, the proposed method achieves very competitive results, indicating its effectiveness for person Re-ID.
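
Procrustes analysis aligns two landmark sets up to translation, scale, and rotation, so the residual disparity can serve as a cross-view-robust shape distance. A minimal sketch (the contour landmarks are stand-ins; the paper's exact shape representation is not reproduced):

```python
# Align two contour point sets and read off the residual shape disparity.
import numpy as np
from scipy.spatial import procrustes

shape_a = np.random.rand(50, 2)                   # 50 body-contour landmarks
rot = np.array([[0.0, -1.0], [1.0, 0.0]])         # 90-degree rotation
shape_b = shape_a @ rot * 2.0 + 5.0               # rotated, scaled, shifted copy
_, _, disparity = procrustes(shape_a, shape_b)
print(f"shape disparity: {disparity:.6f}")        # ~0: same underlying shape
```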

Canaan Tinotenda Madongo, Di Huang, Jiaxin Chen

Feature Extraction and Classification Theory

Frontmatter
Decision-Level Fusion Method Based on Deep Learning

We present a highly accurate and efficient approach to video-based personality trait prediction. Unlike traditional methods, we propose a decision-level information fusion method based on deep learning. We separate the video into two modalities, visual and audio, process them with an improved VGG-16 and an LSTM network, respectively, and combine them with an Extreme Learning Machine (ELM) to achieve decision-level information fusion. Experiments on the challenging Youtube-8M dataset show that our proposed approach significantly outperforms the traditional decision-level fusion method in terms of both efficiency and accuracy.

Kejun Wang, Meichen Liu, XueSen Hao, Xianglei Xing
Laplacian Regularized Non-negative Sparse Low-Rank Representation Classification

Recently, low-rank representation (LRR) has become a popular tool for face representation and classification. However, existing low-rank based classification methods do not take the non-linear geometric structure of the data into account, so locality and similarity information may be lost during learning. To improve LRR in this respect, Lin et al. proposed a Non-negative Sparse Hyper-Laplacian regularized LRR model (NSHLRR) and applied it to image clustering. In this paper, we propose a novel classification method, NSHLRR-based Classification (NSHLRRC), for face recognition. Experimental results on public face databases show that our method achieves very competitive classification results and outperforms other state-of-the-art methods.

Jingshan Li, Caikou Chen, Xielian Hou, Rong Wang
The Variants of Weber Local Descriptor and Their Applications for Biometrics

In computer vision and pattern recognition, handcrafted local features play an important role in many tasks, and many effective handcrafted local features have been proposed. Among them, the Weber Local Descriptor (WLD) is a successful one: it is a simple but powerful descriptor, and many variants of WLD have been proposed in recent years, which have been broadly used for texture classification as well as biometrics. In this paper, we review WLD and its variants. Generally, the algorithms of WLD and its variants can be divided into categories such as differential excitation-based, orientation-based, and multiple-feature-based. We also summarize their applications to biometrics.
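
For reference, the two components of the original WLD (differential excitation and gradient orientation over a 3×3 neighborhood) can be sketched as below; the precise neighbor indexing of the orientation term varies across variants, so treat this as an approximation.

```python
# Approximate WLD components for one 3x3 neighborhood.
import numpy as np

def wld_components(patch):
    """patch: 3x3 grayscale neighborhood; returns (excitation, orientation)."""
    c = patch[1, 1] + 1e-8
    neighbors = np.delete(patch.flatten(), 4)             # 8 surrounding pixels
    excitation = np.arctan(np.sum((neighbors - c) / c))   # Weber-ratio response
    orientation = np.arctan2(patch[0, 1] - patch[2, 1],   # vertical difference
                             patch[1, 2] - patch[1, 0])   # horizontal difference
    return excitation, orientation

print(wld_components(np.random.rand(3, 3) * 255))
```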

Jingting Lu, Hui Ye, Wei Jia, Yang Zhao, Hai Min, Wenxiong Kang, Lunke Fei
Generative Multi-region Segmentation by Utilizing Saliency Information

Traditionally, multi-region segmentation is treated as a pre-processing step for semantic methods. Motivated by the observation that saliency information exhibits plentiful meaningful cues for segmentation, we propose a semi-supervised method for automatically partitioning an image into multiple regions. Saliency features, together with color information, are applied to locate seeds; the multi-region segmentation problem is then solved in a generative semi-supervised framework in which the selected seeds serve as initializations. The segmentation results are further refined using a segmentation composition strategy. We demonstrate the effectiveness of our algorithm against state-of-the-art methods on the benchmark Berkeley segmentation dataset.

Lei Zhou, Yubin Xia
Truncated Nuclear Norm Based Low Rank Embedding

Dimensionality reduction, also called feature extraction, is an important issue in pattern recognition. However, many existing dimensionality reduction methods, such as principal component analysis, fail when the data contain noise, especially noise caused by outliers or corruption. Recently, a robust method named low-rank embedding (LRE) was proposed, which uses the nuclear norm to characterize the low-rank structure hidden in the data. However, one major limitation of the nuclear norm is that each singular value is treated equally, since the nuclear norm is defined as the sum of all singular values; thus the rank function may not be well approximated in practice. To overcome this drawback, this paper presents a truncated nuclear norm based low-rank embedding (Truncated-LRE). The truncated nuclear norm approximates the rank function more accurately than the nuclear norm. Experimental results show encouraging performance of the proposed method in comparison with state-of-the-art matrix dimensionality reduction methods.
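
The truncated nuclear norm is the sum of all singular values except the r largest, so the dominant structure carries no penalty, in contrast to the plain nuclear norm. A minimal sketch:

```python
# Truncated nuclear norm: skip the r largest singular values.
import numpy as np

def truncated_nuclear_norm(X, r):
    s = np.linalg.svd(X, compute_uv=False)    # singular values, descending
    return s[r:].sum()

X = np.random.rand(20, 15)
print(truncated_nuclear_norm(X, r=0),         # r = 0: plain nuclear norm
      truncated_nuclear_norm(X, r=3))         # top-3 singular values untouched
```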

Fanlong Zhang, Heyou Chang, Guowei Yang, Zhangjing Yang, Minghua Wan

Behavioral Biometrics

Frontmatter
Chinese Sign Language Recognition Based on SHS Descriptor and Encoder-Decoder LSTM Model

This paper presents a novel approach to recognizing isolated Chinese sign language. In order to better distinguish different hand shapes, a new Specific Hand Shape (SHS) descriptor is proposed, and based on it, an encoder-decoder LSTM model is applied to achieve better sign recognition results. A specific hand shape database and an 80-word isolated Chinese sign language database are constructed using Kinect 2.0 to evaluate the proposed methods. Experimental results show that the proposed SHS descriptor is more discriminative than the traditional HOG descriptor and that the recognition model is more efficient than the HMM-based approach.

Xiaoxu Li, Chensi Mao, Shiliang Huang, Zhongfu Ye
SCUT-MMSIG: A Multimodal Online Signature Database

In this paper, we present a multimodal online signature database (SCUT-MMSIG). The database was collected on three different devices: mobile phone, pen tablet and monocular camera, yielding three subcorpora, namely the mobile, tablet and in-air signature subcorpora. In total, 50 subjects are included in each subcorpus, with 20 genuine samples and 20 skilled forgeries per subject. In addition, we describe the signature acquisition process and several verification protocols for further study. Experimental results of a simple DTW-based verification method are also reported under the proposed verification protocols. The database and evaluation code will be publicly available online.
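
The DTW baseline named above compares a probe trajectory against enrolled genuine trajectories by dynamic time warping. A minimal, self-contained sketch (the feature channels and acceptance rule are assumptions):

```python
# Dynamic time warping distance between two signature trajectories.
import numpy as np

def dtw_distance(a, b):
    """a: (n, d), b: (m, d) sequences; returns the DTW alignment cost."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

genuine, probe = np.random.rand(80, 3), np.random.rand(95, 3)  # x, y, pressure
print(dtw_distance(genuine, probe))   # accept if below a per-user threshold
```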

Xinyi Lu, Yuxun Fang, Wenxiong Kang, Zhiyong Wang, David Dagan Feng
Orientation-Aware Text Proposals Network for Scene Text Detection

In this paper, we present a novel Orientation-Aware Text Proposals Network (OA-TPN) for detecting text in the wild. The OA-TPN can accurately localize arbitrarily-oriented text lines in a natural image. Instead of detecting the whole text line at one time, the OA-TPN detects sequences of small-scale orientation-aware text proposals. To handle text lines with different orientations, we utilize deep networks to jointly estimate text proposals with associated directions on the convolutional maps. Final text bounding boxes are generated from the predicted text proposals by a proposed text-line construction approach. The proposed text detector works reliably on multi-scale and multi-orientation text with single-scale images. Experimental results on the MSRA-TD500 and SWT datasets demonstrate the effectiveness of our method.

Huijuan Huang, Zhi Tian, Tong He, Weilin Huang, Yu Qiao
Uyghur Off-Line Signature Recognition Based on Local Central Line Features

In this paper, an off-line signature recognition method based on local central line features is proposed for Uyghur handwritten signatures. The signature images are first pre-processed according to the nature of Uyghur signatures. Then global central line features (GCLF-16, GCLF-24, and GCLF-32), local central line features from two horizontal centers (2LCLF-16H, 2LCLF-24H, and 2LCLF-32H), and local central line features from two vertical centers (2LCLF-16V, 2LCLF-24V, and 2LCLF-32V) are extracted. Experiments were performed with a Euclidean distance based similarity measure and a non-linear SVM classifier on Uyghur signature samples from 75 different people with 1500 signatures in total; two kinds of experiments with varying numbers of training and testing samples were carried out, and a high recognition rate of 96.8% was achieved with 2LCLF-32H. The experimental results indicate that the features proposed in this paper can efficiently capture the writing style of Uyghur signatures.

Kurban Ubul, Ya-li Zhu, Mutallip Mamut, Nurbiya Yadikar, Tuergen Yibulayin
Backmatter
Metadata
Title
Biometric Recognition
Edited by
Jie Zhou
Yunhong Wang
Zhenan Sun
Yong Xu
Linlin Shen
Dr. Jianjiang Feng
Shiguang Shan
Yu Qiao
Zhenhua Guo
Shiqi Yu
Copyright year
2017
Electronic ISBN
978-3-319-69923-3
Print ISBN
978-3-319-69922-6
DOI
https://doi.org/10.1007/978-3-319-69923-3