About this book

This three-volume set LNCS 10361, LNCS 10362, and LNAI 10363 constitutes the refereed proceedings of the 13th International Conference on Intelligent Computing, ICIC 2017, held in Liverpool, UK, in August 2017.

The 221 full papers and 15 short papers of the three proceedings volumes were carefully reviewed and selected from 639 submissions. This second volume of the set comprises 74 papers. The papers are organized in topical sections such as Pattern Recognition; Image Processing; Virtual Reality and Human-Computer Interaction; Healthcare Informatics Theory and Methods; Genetic Algorithms; Blind Source Separation; Intelligent Fault Diagnosis; Machine Learning; Knowledge Discovery and Data Mining; Gene Expression Array Analysis; Systems Biology; Modeling, Simulation, and Optimization of Biological Systems; Intelligent Computing in Computational Biology; Computational Genomics; Computational Proteomics; Gene Regulation Modeling and Analysis; SNPs and Haplotype Analysis; Protein-Protein Interaction Prediction; Protein Structure and Function Prediction; Next-Gen Sequencing and Metagenomics; Structure Prediction and Folding; Biomarker Discovery; Applications of Machine Learning Techniques to Computational Proteomics, Genomics, and Biological Sequence Analysis; Biomedical Image Analysis; Human-Machine Interaction: Shaping Tools Which Will Shape Us; Protein and Gene Bioinformatics: Analysis, Algorithms and Applications; Special Session on Computer Vision based Navigation; Neural Networks: Theory and Application.

Table of contents

Frontmatter

Pattern Recognition

Face Recognition via Domain Adaptation and Manifold Distance Metric Learning

This paper presents a novel approach to face recognition via domain adaptation and manifold distance metric learning. Unconstrained face recognition has recently become a research hotspot in computer vision. Because the training and test data are not independent and identically distributed, the maximum mean discrepancy (MMD) from domain adaptation learning is used to represent the difference between the training set and the test set. At the same time, assuming that face images of the same class lie on the same manifold and images of different classes lie on different manifolds, the face image sets are used to model multiple manifolds, and the distance between their affine hulls represents the distance between manifolds. Finally, a projection matrix is sought that maximizes the distance between manifolds while minimizing the difference between the training and test sets. Extensive experiments on several face datasets show the effectiveness of the proposed method.

Bo Li, Ping-Ping Zheng, Jin Liu, Xiao-Long Zhang
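The maximum mean discrepancy (MMD) used in this abstract measures the gap between the training and test distributions as a difference of kernel means. A minimal sketch of the standard biased estimate of squared MMD follows; the Gaussian kernel, the bandwidth `sigma`, and the function names are assumptions for illustration, and the sketch does not include the paper's manifold distance or projection learning.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]

def gaussian_kernel(x: Vector, y: Vector, sigma: float = 1.0) -> float:
    """Gaussian (RBF) kernel exp(-||x - y||^2 / (2 sigma^2)); kernel choice is an assumption."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def mmd_squared(xs: List[Vector], ys: List[Vector], sigma: float = 1.0) -> float:
    """Biased estimate of squared MMD between samples xs (e.g. train) and ys (e.g. test)."""
    def mean_kernel(a: List[Vector], b: List[Vector]) -> float:
        total = sum(gaussian_kernel(u, v, sigma) for u in a for v in b)
        return total / (len(a) * len(b))
    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)

if __name__ == "__main__":
    train = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1]]
    shifted_test = [[2.0, 2.0], [2.1, 1.9], [1.9, 2.2]]
    print(mmd_squared(train, train))         # identical samples give 0.0
    print(mmd_squared(train, shifted_test))  # a shifted domain gives a positive value
```

Identical samples give zero and the value grows as the two sets drift apart, which is why minimizing it under a learned projection pulls the training and test distributions together.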

Image Processing

Generalized Cubic Hermite Interpolation Based on Perturbed Padé Approximation

In this paper, generalized cubic Hermite interpolation is constructed using perturbed Padé approximation. We generalize the method to Hermite interpolation of degree $2n+1$ at $n+1$ points and study its barycentric form. A numerical example is given to show the effectiveness of the method. Finally, we further generalize the approach to cubic Hermite interpolation based on perturbed Chebyshev-Padé approximation.

Le Zou, Liang-Tu Song, Xiao-Feng Wang, Yan-Ping Chen
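For reference, the classical cubic Hermite interpolant that this paper generalizes matches function values and first derivatives at two nodes. The sketch below evaluates it with the standard Hermite basis polynomials; it does not implement the perturbed Padé construction, and the function name is hypothetical.

```python
def cubic_hermite(x0: float, x1: float, f0: float, f1: float,
                  d0: float, d1: float, x: float) -> float:
    """Classical cubic Hermite interpolant on [x0, x1].

    f0, f1 are the function values and d0, d1 the first derivatives at x0, x1.
    """
    h = x1 - x0
    t = (x - x0) / h  # local coordinate, 0 at x0 and 1 at x1
    # Standard cubic Hermite basis polynomials on [0, 1]
    h00 = 2 * t**3 - 3 * t**2 + 1
    h10 = t**3 - 2 * t**2 + t
    h01 = -2 * t**3 + 3 * t**2
    h11 = t**3 - t**2
    # Derivative terms are scaled by h because the basis lives on [0, 1]
    return h00 * f0 + h10 * h * d0 + h01 * f1 + h11 * h * d1
```

Because the interpolant is exact for cubics, interpolating f(x) = x^3 on [0, 2] (values 0 and 8, derivatives 0 and 12) returns 1.0 at x = 1.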

Automatic License Plate Recognition Using Local Binary Pattern and Histogram Matching

This paper proposes a new real-time license plate recognition (LPR) system capable of motion tracking and license plate recognition. The best frame from the video, in which the vehicle is about 4 m from the camera, is selected. For further processing, the lower half of the vehicle image is cropped to a region of size 450 × 140 during tracking. Local Binary Pattern (LBP) features and histogram matching are used to detect the license plate. Owing to the robustness of LBP features, the method can adapt to variations in rotation, scaling, and illumination of the license plate. The plate region is segmented into disjoint characters using a modified bounding box technique. Characters are recognized from histogram features, and a minimum distance classifier is used for feature matching. Tested on more than 300 images, the system achieves 96.14% detection accuracy and 89.35% recognition accuracy. The system is designed to recognize the license plates of small, medium, and large vehicles, and it can detect both single-line and two-line license plate formats.

Ashutosh Kumar Bachchan, Apurba Gorai, Phalguni Gupta
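Basic 3×3 local binary pattern (LBP) codes and a minimum-distance match on their histograms, as described in this abstract, can be sketched as follows. This is the textbook 8-neighbour LBP with an L1 histogram distance, both chosen here as assumptions; the paper's exact LBP variant, distance measure, and thresholds are not given in the abstract, and all names are hypothetical.

```python
from typing import Dict, List, Sequence

Image = List[List[int]]

# Clockwise neighbour offsets, starting at the top-left pixel.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def lbp_code(img: Image, r: int, c: int) -> int:
    """8-bit LBP code of pixel (r, c): bit i is set if neighbour i >= centre."""
    centre = img[r][c]
    code = 0
    for bit, (dr, dc) in enumerate(OFFSETS):
        if img[r + dr][c + dc] >= centre:
            code |= 1 << bit
    return code

def lbp_histogram(img: Image) -> List[float]:
    """Normalised 256-bin histogram of LBP codes over the interior pixels."""
    hist = [0.0] * 256
    rows, cols = len(img), len(img[0])
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            hist[lbp_code(img, r, c)] += 1
    total = sum(hist) or 1.0
    return [h / total for h in hist]

def l1_distance(h1: Sequence[float], h2: Sequence[float]) -> float:
    """L1 distance between two histograms (assumed matching measure)."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

def nearest_template(query: Image, templates: Dict[str, Image]) -> str:
    """Minimum-distance classifier: label of the template with the closest histogram."""
    q = lbp_histogram(query)
    return min(templates, key=lambda label: l1_distance(q, lbp_histogram(templates[label])))
```

Since each code only compares neighbours with the centre pixel, adding a constant brightness offset to a region leaves its LBP codes unchanged, which is one reason LBP histograms tolerate illumination changes.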

Leg Ulcer Long Term Analysis

Ulcers on the legs and feet usually require long-term clinical treatment and follow-up. To facilitate monitoring, we propose a fully automatic, low-cost method for ulcer detection and analysis. Ulcer segmentation is performed automatically by classifying each pixel as background or non-background. The features used for classification are the values of the three channels that define each pixel in the RGB color space and in the HSV color space. We tested the algorithm on a dataset of 92 images acquired from 14 different patients. Segmentation performance was evaluated in terms of overlap, recall, and precision by comparing the automatic segmentation with the manual one, and the results show good average values of all three measures. A Self-Organizing Map (SOM) was then used for tissue classification; the SOM was trained to identify six colorimetric classes associated with different tissue types.

Eros Pasero, Cristina Castagneri
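The evaluation step in this abstract compares the automatic mask with the manual one. A sketch of the three reported measures on binary masks follows; the abstract does not define "overlap", so the Jaccard index (intersection over union) is assumed here. The helper that builds the six RGB and HSV features per pixel uses Python's standard `colorsys` module and is likewise illustrative, not the authors' code.

```python
import colorsys
from typing import List, Tuple

Mask = List[List[int]]  # 1 = ulcer (non-background), 0 = background

def pixel_features(r: int, g: int, b: int) -> Tuple[float, ...]:
    """Six features per pixel: RGB and HSV channels, all scaled to [0, 1]."""
    rf, gf, bf = r / 255.0, g / 255.0, b / 255.0
    h, s, v = colorsys.rgb_to_hsv(rf, gf, bf)
    return (rf, gf, bf, h, s, v)

def segmentation_scores(auto: Mask, manual: Mask) -> Tuple[float, float, float]:
    """Return (overlap, recall, precision); overlap is assumed to be the Jaccard index."""
    tp = fp = fn = 0
    for row_a, row_m in zip(auto, manual):
        for a, m in zip(row_a, row_m):
            if a and m:
                tp += 1          # ulcer pixel found by both
            elif a and not m:
                fp += 1          # automatic mask only
            elif m and not a:
                fn += 1          # manual mask only
    union = tp + fp + fn
    overlap = tp / union if union else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    precision = tp / (tp + fp) if tp + fp else 1.0
    return overlap, recall, precision
```

For example, an automatic mask `[[1, 1, 0], [0, 1, 0]]` against a manual mask `[[1, 0, 0], [0, 1, 1]]` gives overlap 0.5, recall 2/3, and precision 2/3.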

Virtual Reality and Human-Computer Interaction

Assessing Learners’ Reasoning Using Eye Tracking and a Sequence Alignment Method

In this paper, we aim to assess students’ reasoning in a clinical problem-solving task. We propose using students’ eye movements to measure the scan path followed while resolving medical cases, and a sequence alignment method, namely the pattern searching algorithm, to evaluate their analytical reasoning. Experimental data were gathered from 15 participants using an eye tracker. Using the gaze data, we show that the proposed approach can be reliably applied to the comparison of eye movement sequences. Our results have implications for improving novice clinicians’ reasoning abilities in particular and enhancing students’ learning outcomes in general.

Asma Ben Khedher, Imène Jraidi, Claude Frasson

Healthcare Informatics Theory and Methods

An Intelligent Systems Approach to Primary Headache Diagnosis

In this study, the problem of primary headache diagnosis is considered from multiple frames of reference, including the complexity characteristics of living systems, the limitations of human information processing, the enduring nature of headache throughout history, and the potential of intelligent systems paradigms to both broaden and deepen the scope of such diagnostic solutions. In particular, machine learning is employed, and a dataset of 836 primary headache cases from two medical centres in Turkey is evaluated. Five primary headache classes were derived from the data, namely Tension Type Headache (TTH), Chronic Tension Type Headache (CTTH), Migraine with Aura (MwA), Migraine without Aura (MwoA), and Trigeminal Autonomic Cephalalgia (TAC). A total of nine machine learning classifiers, ranging from linear models to non-linear ensembles, together with one random baseline procedure, were evaluated in a supervised learning setting; the best results were an AUC of 0.985, a sensitivity of 1, and a specificity of 0.966. The study concludes that modern computing platforms offer a promising setting for realising intelligent solutions, enabling the analytical operations needed to advance diagnostic capability in the primary headache domain and beyond.

Robert Keight, Ahmed J. Aljaaf, Dhiya Al-Jumeily, Abir Jaafar Hussain, Aynur Özge, Conor Mallucci
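The AUC, sensitivity, and specificity reported in this abstract are binary metrics; for a five-class problem they are typically computed per class (one-vs-rest), although the abstract does not say how they were aggregated. A minimal sketch computing them from labels and scores follows, with AUC in its pairwise (Mann-Whitney) form; the decision `threshold` and the function names are assumptions.

```python
from typing import Sequence, Tuple

def sens_spec(labels: Sequence[int], scores: Sequence[float],
              threshold: float) -> Tuple[float, float]:
    """Sensitivity and specificity when predicting positive for score >= threshold."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)

def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For a one-vs-rest evaluation of, say, the MwA class, `labels` would be 1 for MwA cases and 0 for all other classes, and `scores` the classifier's MwA probabilities.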

Genetic Algorithms

Benchmarking and Evaluating MATLAB Derivative-Free Optimisers for Single-Objective Applications

MATLAB® includes a number of derivative-free optimisers (DFOs), conveniently providing tools beyond conventional optimisation methods. However, as the number of available DFOs grows, and because DFOs are often problem-dependent and parameter-sensitive, it has become challenging to determine which one is best suited to the application at hand, yet no comparison of MATLAB DFOs has been reported so far. To help engineers use MATLAB for their applications without needing to learn DFOs in detail, this paper evaluates the performance of all seven DFOs in MATLAB and sets out an amalgamated benchmark of multiple benchmarks. The DFOs include four heuristic algorithms, namely simulated annealing, particle swarm optimization (PSO), the genetic algorithm (GA), and the genetic algorithm with elitism (GAe), and three direct-search algorithms, namely Nelder-Mead’s simplex search, pattern search (PS), and Powell’s conjugate search. The five benchmarks presented in this paper exceed those reported in the literature. Four benchmark problems widely adopted in assessing evolutionary algorithms are employed. Under MATLAB’s default settings, the numerical optimiser Powell performs best in aggregate on the unimodal Quadratic Problem, PSO on the lower-dimensional Schaffer Problem, and PS on the lower-dimensional Composition Problem, while GAe, with its extra-numerical genotype, performs best on the Varying Landscape Problem and on the two higher-dimensional problems. Overall, GAe offers the highest performance, followed by PSO and Powell. The amalgamated benchmark quantifies the advantage and robustness of heuristic, population-based optimisers (GAe and PSO), especially on multimodal problems.

Lin Li, Yi Chen, Qunfeng Liu, Jasmina Lazic, Wuqiao Luo, Yun Li
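Of the direct-search methods compared in this paper, pattern search is the simplest to illustrate. The sketch below is a basic compass (coordinate) pattern search on a unimodal quadratic; it is a generic textbook variant written for illustration, not MATLAB's `patternsearch`, and the polling order, step-halving rule, and tolerances are assumptions.

```python
from typing import Callable, List, Tuple

def compass_search(f: Callable[[List[float]], float], x0: List[float],
                   step: float = 1.0, tol: float = 1e-8,
                   max_iter: int = 10_000) -> Tuple[List[float], float]:
    """Minimise f by polling +/- step along each coordinate; halve the step on failure."""
    x = list(x0)
    fx = f(x)
    for _ in range(max_iter):
        if step < tol:
            break
        improved = False
        for i in range(len(x)):
            for sign in (1.0, -1.0):
                trial = list(x)
                trial[i] += sign * step
                ft = f(trial)
                if ft < fx:  # accept the first improving poll point
                    x, fx = trial, ft
                    improved = True
                    break
            if improved:
                break
        if not improved:
            step *= 0.5  # contract the pattern when no poll point improves
    return x, fx

def quadratic(x: List[float]) -> float:
    """Unimodal quadratic test function with minimum 0 at (1, 1, ..., 1)."""
    return sum((xi - 1.0) ** 2 for xi in x)
```

Usage: `compass_search(quadratic, [0.0, 0.0, 0.0])` reaches `[1.0, 1.0, 1.0]` with objective value 0.0. Such a local method does well on unimodal problems but can stall in a local minimum on multimodal ones, which is consistent with the abstract's finding that population-based optimisers are more robust there.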

Blind Source Separation

Dependent Source Separation with Nonparametric Non-Gaussianity Measure

Separating statistically dependent source signals from their linear mixtures is a challenging problem in the signal processing community. First, we show that maximizing the non-Gaussianity (NG) measure of the separated signals can achieve separation of dependent sources. Then, the NG measure is defined through statistical distances between distributions, based on the cumulative distribution function (CDF) rather than the traditional probability density function (PDF). Next, the CDF-based objective function is estimated using nonparametric order statistics (OS). Finally, by applying the stochastic gradient rule to the constrained optimization problem, an efficient nonparametric dependent source separation algorithm, termed nonpNG, is derived. Simulation results demonstrate the validity of the proposed algorithm for separating statistically dependent sources.

Fasong Wang, Li Jiang, Rui Li
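To make the idea of a CDF-based, order-statistics estimate concrete: after standardizing a separated signal and sorting it (its order statistics), its empirical CDF can be compared with the standard normal CDF. The sketch below uses a Kolmogorov-Smirnov-type sup distance as the statistical distance; that choice is an assumption for illustration, and this is neither the nonpNG objective nor its gradient update.

```python
import statistics
from statistics import NormalDist
from typing import Sequence

def ng_measure(signal: Sequence[float]) -> float:
    """Sup distance between the empirical CDF of the standardized signal and N(0, 1).

    Larger values indicate a less Gaussian signal (KS-type distance, assumed here).
    """
    mu = statistics.fmean(signal)
    sd = statistics.pstdev(signal)
    z = sorted((s - mu) / sd for s in signal)  # order statistics of the standardized signal
    n = len(z)
    phi = NormalDist().cdf
    dist = 0.0
    for i, zi in enumerate(z):
        f = phi(zi)
        # The empirical CDF jumps from i/n to (i+1)/n at the i-th order statistic.
        dist = max(dist, abs((i + 1) / n - f), abs(i / n - f))
    return dist
```

A Gaussian-shaped sample scores close to zero, while a uniform one scores about 0.057, so a separation algorithm that maximizes such a measure pushes its outputs away from Gaussianity.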

Intelligent Fault Diagnosis

Convolutional Neural Network Based Bearing Fault Diagnosis

In this paper, we propose a new bearing fault diagnosis method based on a Convolutional Neural Network (CNN) that requires no feature extraction. The 1-D vibration signal is converted into 2-D data called a vibration image, and the vibration images are then fed into the CNN for bearing fault classification. Experiments are carried out with bearing data from the Case Western Reserve University Bearing Fault Database, and the results are compared with those of other methods to show the effectiveness of the proposed algorithm.

Duy-Tang Hoang, Hee-Jun Kang
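One common way to turn a 1-D vibration signal into a 2-D "vibration image", as described in this abstract, is to take a segment of n×n samples, min-max normalize it, and fill an n×n grid row by row. The abstract does not specify the conversion, so this row-wise reshaping, the 0-255 grey-level scaling, and the parameter names are assumptions.

```python
from typing import List, Sequence

def vibration_image(signal: Sequence[float], n: int, start: int = 0) -> List[List[float]]:
    """Reshape signal[start:start + n*n] into an n x n image with values in [0, 255].

    Row-wise filling and min-max scaling are assumptions, not the paper's stated method.
    """
    segment = list(signal[start:start + n * n])
    if len(segment) < n * n:
        raise ValueError("signal too short for an n x n image")
    lo, hi = min(segment), max(segment)
    span = (hi - lo) or 1.0  # avoid division by zero for a constant segment
    scaled = [255.0 * (v - lo) / span for v in segment]
    return [scaled[row * n:(row + 1) * n] for row in range(n)]
```

Sliding `start` along a long recording yields many images per fault condition, which is how such a pipeline can produce enough training samples for a CNN.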

Machine Learning

A Performance Evaluation of Systematic Analysis for Combining Multi-class Models for Sickle Cell Disorder Data Sets

Machine learning is a field of science that aims specifically to extract knowledge from data sets. The main aim of this study is to provide a sophisticated model for different applications of machine learning to medical problems. We attempt to classify the amount of medication for each patient with sickle cell disorder. We present a new technique that combines two classifiers, one based on the Levenberg-Marquardt training algorithm and one on the k-nearest neighbours algorithm. In this paper, we formulate a multi-class classification problem in order to obtain training and testing procedures for each model, along with other performance evaluations. In machine learning, models use a training set to build a classifier that provides reliable classification. This research discusses different aspects of machine learning approaches for the classification of biomedical data. We focus mainly on the multi-class classification problem, where the data sets contain many classes. The results indicate that, among the machine learning models tested, the combined classifiers yield considerably better results across the range of performance measures selected for this research.

Mohammed Khalaf, Abir Jaafar Hussain, Dhiya Al-Jumeily, Robert Keight, Russell Keenan, Ala S. Al Kafri, Carl Chalmers, Paul Fergus, Ibrahim Olatunji Idowu

Knowledge Discovery and Data Mining

Feature Selection Based on Density Peak Clustering Using Information Distance Measure

Feature selection is one of the most important data preprocessing techniques in data mining and machine learning. A new feature selection method based on density peak clustering is proposed. The method uses an information distance between features as the clustering distance metric and applies density peak clustering to group the features. The representative feature of each cluster is then selected to form the final result. The method avoids selecting an irrelevant representative feature from a cluster in which most features are irrelevant to the class label. Comparison experiments on ten datasets show that the features selected by the proposed method improve classification accuracy for different classifiers.

Jie Cai, Shilong Chao, Sheng Yang, Shulin Wang, Jiawei Luo
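The two quantities that density peak clustering relies on, the local density rho and the distance delta to the nearest denser point, can be sketched over a feature-by-feature distance matrix. As the information distance between discrete features, this sketch assumes the variation of information VI(X, Y) = 2H(X, Y) - H(X) - H(Y); the paper's exact distance, the cut-off `d_c`, the rho * delta centre rule, and all names are assumptions. Ties in density are not broken specially here.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

def entropy(values: Sequence) -> float:
    """Shannon entropy (bits) of a discrete sequence."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def variation_of_information(x: Sequence, y: Sequence) -> float:
    """VI(X, Y) = 2 H(X, Y) - H(X) - H(Y); zero when x and y determine each other."""
    joint = entropy(list(zip(x, y)))
    return 2.0 * joint - entropy(x) - entropy(y)

def density_peaks(dist: List[List[float]], d_c: float) -> Tuple[List[int], List[float]]:
    """Local density rho (cut-off kernel) and distance delta to the nearest denser point."""
    n = len(dist)
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c) for i in range(n)]
    delta = []
    for i in range(n):
        denser = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        # Points with no denser neighbour get their largest distance, by convention.
        delta.append(min(denser) if denser else max(dist[i]))
    return rho, delta

def cluster_centres(rho: List[int], delta: List[float], k: int) -> List[int]:
    """Pick the k points with the largest rho * delta as cluster centres."""
    gamma = [r * d for r, d in zip(rho, delta)]
    return sorted(range(len(gamma)), key=lambda i: gamma[i], reverse=True)[:k]
```

In a feature-selection setting, `dist[i][j]` would be `variation_of_information(feature_i, feature_j)` computed on discretized feature columns, and each remaining feature would join the cluster of its nearest denser neighbour.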

Gene Expression Array Analysis

Joint Sample Expansion and 1D Convolutional Neural Networks for Tumor Classification

Since tumors seriously endanger human health, early tumor detection is especially critical for the treatment of patients, and effectively distinguishing tumor samples from normal samples has become an important topic. In this paper, a joint Sample Expansion and 1D Convolutional Neural Network method (SE1DCNN) is proposed for tumor classification. Inspired by the idea of denoising, we propose a Sample Expansion (SE) method. In addition to retaining the merits of corrupted data, the expanded samples can, to a certain extent, alleviate the shortage of training samples in gene expression data when deep learning models are used. Since CNN models perform excellently in classification tasks, we analyze the applicability of 1D CNNs to gene expression data. Finally, we design a 7-layer 1D CNN model to classify tumor gene expression data using both the expanded samples and the raw samples. Experimental studies indicate that SE1DCNN is effective for tumor classification.

Jian Liu, Yuhu Cheng, Xuesong Wang, Yi Kong

Systems Biology

2DIs: A SBML Compliant Web Platform for the Design and Modeling of Immune System Interactions

We present 2DIs, a web platform that allows the easy design of extracellular models of immune system function, including the ability to describe the most important immune system entities and interactions and to produce a validated SBML file of the model. 2DIs lets immunologists directly describe and share their knowledge with colleagues and, more importantly, with modelers, who can thereby obtain an SBML-compliant modelling template of the immunological process to use in developing the computational model. This could introduce a novel way of communication between immunologists and modelers, reducing the risk of errors and misinterpretations.

Marzio Pennisi, Giulia Russo, Giuseppe Sgroi, Giuseppe Parasiliti, Francesco Pappalardo

Modeling, Simulation, and Optimization of Biological Systems

Modeling Neuron-Astrocyte Interactions: Towards Understanding Synaptic Plasticity and Learning in the Brain

Spiking neural networks represent a third generation of artificial neural networks and are inspired by the computational principles of neurons and synapses in the brain. In addition to neuronal mechanisms, astrocytic signaling can influence information transmission, plasticity, and learning in the brain. In this study, we developed a new computational model to better understand the dynamics of the mechanisms that lead to changes in information processing between a postsynaptic neuron and an astrocyte. We used a classical long-term plasticity stimulation protocol to test the model’s functionality. The long-term goal of our work is to develop extended synapse models that include neuron-astrocyte interactions to address plasticity and learning in cortical synapses. Our modeling studies will advance the development of novel learning algorithms for use in the extended synapse models and spiking neural networks. The novel algorithms can provide a basis for artificial intelligence systems that emulate the functionality of the mammalian brain.

Riikka Havela, Tiina Manninen, Ausra Saudargiene, Marja-Leena Linne

Modeling PI3K/PDK1/Akt and MAPK Signaling Pathways Using Continuous Petri Nets

Malignant melanoma is an invasive skin cancer that is commonly resistant to conventional therapeutic approaches. Genetic and molecular alterations such as mutations of the BRAF gene, which can constitutively activate the MAPK and PI3K/PDK1/Akt signaling pathways, appear to be responsible for malignant melanocytic transformation and lead to aberrant cellular physiological processes. Specific regulators and modulators of both signaling pathways may be promising therapeutic targets for investigating the drug resistance typical of BRAF inhibitors such as Dabrafenib. We developed a continuous Petri net model that simulates both the MAPK and PI3K/PDK1/Akt pathways and their interactions, in order to analyze the complex kinase cascades in melanoma and to predict new crucial nodes involved in drug resistance, such as those in the Ras arm.

Giulia Russo, Marzio Pennisi, Roberta Boscarino, Francesco Pappalardo

Keratoconus Diagnosis by Patient-Specific 3D Modelling and Geometric Parameters Analysis

The aim of this study is to describe a new technique for diagnosing keratoconus based on patient-specific 3D modelling. The procedure can detect small variations in corneal morphology caused by keratoconus. The posterior corneal surface was analysed using an optimised computational geometric procedure and raw data provided by a corneal tomographer. A retrospective observational case series study was carried out. A total of 86 eyes from 86 patients were included and divided into two groups: 43 healthy eyes and 43 eyes diagnosed with keratoconus. The predictive value of each morphogeometric variable was established through receiver operating characteristic (ROC) analysis. The posterior apex deviation variable showed the best diagnostic capability for keratoconus (area under the curve: 0.9165, p < 0.001, std. error: 0.035, 95% CI: 0.846-0.986), with a cut-off value of 0.097 mm and an associated sensitivity and specificity of 89% and 88%, respectively. Patient-specific geometric models of the cornea can provide accurate quantitative information about the morphogeometric properties of the cornea at several singular points of the posterior surface and describe changes in corneal anatomy due to keratoconus. This accurate characterisation of the cornea enables new evaluation criteria for diagnosing this type of ectasia and demonstrates that a device-independent approach to keratoconus diagnosis is feasible.

Laurent Bataille, Francisco Cavas-Martínez, Daniel G. Fernández-Pacheco, Francisco J. F. Cañavate, Jorge L. Alio
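A cut-off such as the 0.097 mm reported in this abstract is commonly chosen from the ROC analysis by maximizing Youden's index J = sensitivity + specificity - 1. The abstract does not state the selection rule, so Youden's criterion, the direction (larger values indicate keratoconus), and the function name are assumptions in this sketch.

```python
from typing import Sequence, Tuple

def youden_cutoff(values: Sequence[float], is_case: Sequence[bool]) -> Tuple[float, float, float]:
    """Return (cut-off, sensitivity, specificity) maximizing Youden's J.

    A measurement >= cut-off is classified as a case (assumed direction).
    """
    cases = [v for v, c in zip(values, is_case) if c]
    controls = [v for v, c in zip(values, is_case) if not c]
    best = (float("nan"), 0.0, 0.0)
    best_j = -1.0
    for cut in sorted(set(values)):  # every observed value is a candidate cut-off
        sens = sum(v >= cut for v in cases) / len(cases)
        spec = sum(v < cut for v in controls) / len(controls)
        j = sens + spec - 1.0
        if j > best_j:
            best_j, best = j, (cut, sens, spec)
    return best
```

With well-separated groups, for example control deviations of 0.02, 0.05, 0.08, and 0.09 mm against case deviations of 0.12 and 0.15 mm (made-up values), the chosen cut-off is 0.12 with sensitivity and specificity both 1.0.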

On Checking Linear Dependence of Parametric Vectors

Checking the linear dependence of a finite number of vectors is a basic problem in linear algebra. We aim to extend the theory of linear dependence to parametric vectors whose entries are polynomials. For such vectors, linear dependence depends on the values assigned to the parameters, that is, to the variables in the polynomials. We propose a new method to check whether parametric vectors are linearly dependent. Furthermore, the method also gives a maximal linearly independent subset and expresses the remaining vectors as linear combinations of it. The new method is based on computing a comprehensive Gröbner system for a finite set of parametric polynomials.

Xiaodong Ma, Yao Sun, Dingkang Wang, Yushan Xue
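A small example shows why the dependence of parametric vectors splits into cases over the parameter values. Take the hypothetical vectors $v_1 = (1, a)$ and $v_2 = (a, 1)$ with a single parameter $a$; two vectors in a 2-dimensional space are linearly dependent exactly when the determinant of the matrix they form vanishes:

```latex
\det\begin{pmatrix} 1 & a \\ a & 1 \end{pmatrix} = 1 - a^2 = 0
\quad\Longleftrightarrow\quad a = 1 \ \text{or}\ a = -1 .
```

For $a = 1$ a maximal independent subset is $\{v_1\}$ and $v_2 = v_1$; for $a = -1$ it is again $\{v_1\}$ with $v_2 = -v_1$; for every other value of $a$ both vectors are independent. A comprehensive Gröbner system organizes this kind of case split over the parameter space when the entries are general polynomials.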

Intelligent Computing in Computational Biology

Effective Identification of Hot Spots in PPIs Based on Ensemble Learning

Alanine scanning experiments have shown that most of the binding energy in protein-protein interactions is contributed by a few significant residues at protein-protein interfaces; these important residues are called hot spot residues. At protein-protein interfaces, hot spot residues tend to cluster together to form modules, which are defined as hot regions. Hot spot residues therefore play an important role in revealing the life activities of organisms, and predicting hot spot and non-hot spot residues effectively and accurately is a vital research direction. A new method is proposed that combines amino acid physicochemical features with structural features to predict hot spot residues based on ensemble learning. The experimental results demonstrate that the method predicts hot spot residues effectively.

Xiaoli Lin, QianQian Huang, Fengli Zhou

Fast Significant Matches of Position Weight Matrices Based on Diamond Sampling

Position weight matrices are an important method for modeling signals or motifs in biological sequences, in both DNA and protein contexts. In this paper, we present techniques for increasing the speed of sequence analysis using position weight matrices. Our techniques also allow the user to specify a p-value threshold indicating the desired trade-off between sensitivity and speed for a particular sequence analysis. The resulting speed-up should allow our algorithm to be used more widely in large-scale sequence search and annotation projects.

Liang-xin Gao, Hong-bo Zhang, Lin Zhu
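Scanning a sequence with a position weight matrix, as discussed in this abstract, amounts to sliding the matrix along the sequence, summing per-position log-odds scores, and reporting windows above a threshold. A naive sketch follows; it uses a raw score threshold rather than the p-value threshold described in the abstract, does not implement the diamond-sampling speed-up, assumes a uniform background, and the toy matrix is hypothetical.

```python
import math
from typing import Dict, List, Tuple

PWM = List[Dict[str, float]]  # one dict of base probabilities per motif position

def log_odds(pwm: PWM, background: float = 0.25) -> List[Dict[str, float]]:
    """Convert base probabilities to log2-odds against a uniform background (assumed)."""
    return [{b: math.log2(p / background) for b, p in col.items()} for col in pwm]

def scan(seq: str, pwm: PWM, threshold: float) -> List[Tuple[int, float]]:
    """Naive O(len(seq) * width) scan: (start, score) of every window scoring >= threshold."""
    lo = log_odds(pwm)
    w = len(lo)
    hits = []
    for start in range(len(seq) - w + 1):
        score = sum(lo[k][seq[start + k]] for k in range(w))
        if score >= threshold:
            hits.append((start, score))
    return hits

# Hypothetical 3-position motif favouring "ACG".
TOY_PWM: PWM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
]
```

Usage: `scan("TTACGTACG", TOY_PWM, 3.0)` reports the two `ACG` windows starting at positions 2 and 6. The cost of this exhaustive scan over every window is what speed-up techniques like those in the paper aim to reduce.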

Combination of EEG Data Time and Frequency Representations in Deep Networks for Sleep Stage Classification

Almost all studies on sleep stage classification are based on traditional statistical learning techniques applied to a set of extracted features, which require considerable time and effort. Deep learning offers approaches that automatically extract patterns and abstractions from different types of data (images, sound, biomedical signals, etc.) to perform classification. However, these techniques are still less widespread in the field of automatic sleep stage scoring. This paper proposes a new approach based on a multi-state deep neural network architecture, which we name the Asymmetrical Multi-State Neural Network. This network merges two neural networks with different architectures that receive different input data: the raw single-channel EEG signal in the time domain and its spectrum. On a complete, well-known sleep database, the proposed Asymmetrical Multi-State Neural Network is shown to improve on the performance of the separate networks.

Martí Manzano, Alberto Guillén, Ignacio Rojas, Luis Javier Herrera

An Effective Strategy for Trait Combinations in Multiple-Trait Genomic Selection

Multiple-trait genomic selection (MTGS) is a recently developed genomic selection method that meets the requirements of practical breeding, which usually aims to improve multiple traits simultaneously. Although many efforts have been made to develop MTGS prediction models, how to choose the trait combination that gives MTGS prediction models their best performance is still under exploration. In this study, we first classified the traits into two groups according to their single-trait genomic selection predictions: traits with relatively high and relatively low prediction performance. Then, we constructed three trait combinations (High & High, Low & Low, and High & Low) and evaluated their effects on the performance of a state-of-the-art MTGS prediction model using phenotypic and genotypic data from a maize diversity panel. Cross-validation results indicate that single-trait predictions could be used as a reference for choosing trait combinations in multiple-trait genomic selection.

Zhixu Qiu, Yunjia Tang, Chuang Ma

A Multiway Semi-supervised Online Sequential Extreme Learning Machine for Facial Expression Recognition with Kinect RGB-D Images

This paper aims to develop a facial expression recognition algorithm for a personal digital assistant application. Based on Kinect RGB-D images, we propose a multiway extreme learning machine (MW-ELM) for facial expression recognition, which significantly reduces computational complexity by processing the RGB and depth channels separately at the input layer. Building on our earlier work on the semi-supervised online sequential extreme learning machine (SOS-ELM), which enables fast, incremental learning from a few labeled samples together with some unlabeled samples of the specific user, we train the parameters of the higher hidden layer with semi-supervised and online sequential methods. Experiments applying the proposed multiway semi-supervised online sequential extreme learning machine (MW-SOS-ELM) to facial expression recognition show that it achieves almost the same recognition accuracy as SOS-ELM while significantly reducing recognition time under the same configuration of hidden nodes. Additionally, the experiments show that our semi-supervised learning scheme sharply reduces the amount of labeled data required.

Xibin Jia, Xinyuan Chen, Jun Miao

Protein Hot Regions Feature Research Based on Evolutionary Conservation

Hot regions of protein interactions are the regions in which hot spots are buried and tightly packed with other residues. Discovering and understanding hot regions is an important way to uncover protein functional activities such as cell metabolism, signaling pathways, immune recognition, DNA replication, and protein synthesis. In this study, machine learning is used to discover three kinds of hot region features, namely sequence conservation, structure conservation, and energy conservation, for which conservation scoring algorithms are created through multiple sequence alignment, a module substitution matrix, structural similarity, and molecular dynamics simulation. This study has important theoretical and practical significance for promoting hot region research, and it also provides a useful way to investigate the functional activities of proteins in depth.

Jing Hu, Xiaoli Lin, Xiaolong Zhang

CMFHMDA: Collaborative Matrix Factorization for Human Microbe-Disease Association Prediction

Research on microorganisms indicates that microbes are abundant in the human body and are closely connected with various noninfectious human diseases. In-depth research on microbe-disease associations is not only helpful for the timely diagnosis and treatment of human diseases but also facilitates the development of new drugs. However, current knowledge in this domain is still limited and far from complete. Here, we propose a computational model, Collaborative Matrix Factorization for Human Microbe-Disease Association prediction (CMFHMDA), which integrates known microbe-disease associations with Gaussian interaction profile kernel similarity for microbes and diseases. A special matrix factorization algorithm is introduced to update the microbe-disease correlation matrix in order to infer the most probable disease-related microbes. Leave-one-out cross validation (LOOCV) and k-fold cross validation were implemented to evaluate the prediction performance of the model. CMFHMDA obtained AUCs of 0.8858 and 0.8529 based on 5-fold cross validation and global LOOCV, respectively. These results indicate that CMFHMDA can be used to identify more potential microbes associated with important noninfectious human diseases.

Zhen Shen, Zhichao Jiang, Wenzheng Bao
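The Gaussian interaction profile (GIP) kernel similarity used in this abstract is usually computed from the rows of the binary association matrix: K(i, j) = exp(-gamma * ||IP(i) - IP(j)||^2), with gamma = gamma' / (mean of ||IP(k)||^2 over all k). The sketch below follows that common formulation with gamma' = 1, which is an assumption; the abstract does not give the exact parameters, and the function name is hypothetical.

```python
import math
from typing import List

def gip_kernel(assoc: List[List[int]], gamma_prime: float = 1.0) -> List[List[float]]:
    """GIP kernel similarity between the rows (interaction profiles) of assoc.

    assoc[i][j] = 1 if entity i (e.g. a microbe) has a known association with j.
    """
    n = len(assoc)
    sq_norms = [sum(v * v for v in row) for row in assoc]
    mean_sq = sum(sq_norms) / n
    # Bandwidth normalised by the average profile size (common convention, assumed here).
    gamma = gamma_prime / mean_sq if mean_sq > 0 else 0.0
    return [[math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(assoc[i], assoc[j])))
             for j in range(n)] for i in range(n)]
```

Disease-disease similarity follows the same way from the columns, for example by passing the transposed matrix `[list(col) for col in zip(*assoc)]`.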

Computational Genomics

Accurately Estimating Tumor Purity of Samples with High Degree of Heterogeneity from Cancer Sequencing Data

Tumor purity is the proportion of tumor cells in the sampled admixture. Estimating tumor purity is a key step both for understanding the tumor micro-environment and for reducing false positives and false negatives in genomic analysis. However, existing approaches often lose accuracy when analyzing samples with a high degree of heterogeneity, because the patterns of clonal architecture in the sequencing data interfere with the signals that purity estimation algorithms expect. In this article, we propose a computational method, EMPurity, which accurately infers the tumor purity of samples with a high degree of heterogeneity. EMPurity captures the patterns of both tumor purity and clonal structure with a probabilistic model. The model parameters are calculated directly from aligned reads, which prevents errors in variant calling results from propagating into the estimate. We test EMPurity on a series of datasets against three popular approaches, and EMPurity outperforms them under different simulation configurations.

Yu Geng, Zhongmeng Zhao, Ruoyu Liu, Tian Zheng, Jing Xu, Yi Huang, Xuanping Zhang, Xiao Xiao, Jiayin Wang

Identifying Heterogeneity Patterns of Allelic Imbalance on Germline Variants to Infer Clonal Architecture

It has been suggested that the evolution of somatic mutations may be significantly impacted by inherited polymorphisms, while clonal somatic copy-number mutations may contribute to the potential selective advantages of heterozygous germline variants. A fine-resolution view of the clonal architecture of such cooperative germline-somatic dynamics provides insight into tumour heterogeneity and has clinical implications. Although germline allelic imbalance patterns are reported to play important roles, existing approaches for clonal analysis mainly focus on single nucleotide sites. To address this need, we propose a computational method, GLClone, which identifies and estimates the clonal patterns of copy-number alterations on germline variants. The core of GLClone is a hierarchical probabilistic model: the variant allele frequencies of germline variants are modeled as observed variables, while the cellular prevalence is designed as hidden states and estimated by Bayesian posteriors. A variational approximation algorithm is proposed to train the model and estimate the unknown variables and model parameters. We examine GLClone on several groups of simulated datasets generated under different configurations and compare it with three popular state-of-the-art approaches; GLClone achieves higher accuracy, especially when a complex clonal structure exists.

Yu Geng, Zhongmeng Zhao, Jing Xu, Ruoyu Liu, Yi Huang, Xuanping Zhang, Xiao Xiao, Maomao, Jiayin Wang

Computational Proteomics

Predicting Essential Proteins Using a New Method

Essential proteins are indispensable for the survival of organisms. Computational methods that predict essential proteins from global protein-protein interaction (PPI) networks are severely restricted by the insufficiency of PPI data, but subcellular localization information can help make up for this deficiency. In this study, a new method named CNC is developed to detect essential proteins. First, subcellular localization information is incorporated into the PPI networks so that each interaction is weighted. Meanwhile, the edge clustering coefficient of each pair of interacting proteins is calculated, giving a second weight for each interaction. The two kinds of weights are integrated to build a new weighted PPI network. The proteins in the new weighted network are scored by weighted degree centrality (WDC) and sorted in descending order of score. Six methods, i.e., CNC, CIC, DC, NC, PeC, and WDC, are used to prioritize the proteins in the yeast PPI network. The results demonstrate that CNC outperforms the other state-of-the-art methods, and the analysis also shows that CNC is an effective technique for identifying essential proteins by integrating different biological data.

Xi-wei Tang
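
The CNC abstract weights each interaction twice, once from subcellular localization and once from the edge clustering coefficient (ECC), and then ranks proteins by weighted degree centrality (WDC). The Python sketch below illustrates that pipeline under stated assumptions: the ECC uses the common form z_uv / min(d_u - 1, d_v - 1), where z_uv counts triangles on the edge; the localization weight is a hypothetical Jaccard overlap of compartment sets; and the two weights are combined by a simple average. The abstract does not say how CNC defines or integrates these weights, so `cnc_like_ranking` is an illustrative name, not the authors' method.

```python
from collections import defaultdict


def edge_clustering_coefficient(adj, u, v):
    """ECC(u, v) = z_uv / min(d_u - 1, d_v - 1); z_uv = triangles containing edge (u, v)."""
    z = len(adj[u] & adj[v])
    denom = min(len(adj[u]) - 1, len(adj[v]) - 1)
    return z / denom if denom > 0 else 0.0


def localization_weight(loc, u, v):
    """Hypothetical weight: Jaccard overlap of the two proteins' compartment sets."""
    a, b = loc.get(u, set()), loc.get(v, set())
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cnc_like_ranking(edges, loc):
    """Rank proteins by WDC on the combined-weight network (edges assumed unique)."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    wdc = defaultdict(float)
    for u, v in edges:
        # Assumption: average the two edge weights; the paper's integration rule may differ.
        w = 0.5 * (localization_weight(loc, u, v) + edge_clustering_coefficient(adj, u, v))
        wdc[u] += w
        wdc[v] += w
    # Descending order of score, as described in the abstract.
    return sorted(wdc.items(), key=lambda kv: kv[1], reverse=True)


edges = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "D")]
loc = {"A": {"nucleus"}, "B": {"nucleus"}, "C": {"nucleus"}, "D": {"cytoplasm"}}
ranking = cnc_like_ranking(edges, loc)
print(ranking)  # → [('A', 2.0), ('B', 2.0), ('C', 2.0), ('D', 0.0)]
```

In this toy network, D shares no triangle and no compartment with its only partner, so both weights vanish and it ranks last.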

Gene Regulation Modeling and Analysis

Combining Gene Expression and Interactions Data with miRNA Family Information for Identifying miRNA-mRNA Regulatory Modules

It is well known that microRNAs (miRNAs) play pivotal roles in gene expression, transcriptional regulation and other important biological processes. An impressive body of literature indicates that miRNAs and mRNAs work cooperatively to form an important part of gene regulatory modules, which are extensively involved in cancer. However, as available data accumulate, identifying cancer-related miRNA regulatory modules and uncovering their precise regulatory mechanisms remains a great challenge. This paper proposes GIFMRM, a novel computational framework that combines gene expression and interaction data with miRNA family information to identify miRNA-mRNA regulatory modules, and evaluates it on three heterogeneous datasets. A literature survey, biological significance and functional enrichment analysis were used to validate the results. The analysis shows that the identified modules are highly correlated with the biological conditions of their respective datasets and are enriched in GO biological processes and KEGG pathways.

Dan Luo, Shu-Lin Wang, Jianwen Fang

SNPs and Haplotype Analysis

Association Mapping Approach into Type 2 Diabetes Using Biomarkers and Clinical Data

The global growth in the incidence of Type 2 Diabetes (T2D) has become a major international health concern, so understanding its etiology is vital. This paper investigates statistical methodologies at various levels of complexity to analyze genotype data and identify biomarkers that show evidence of increased susceptibility to T2D and related traits. A critical overview of selected statistical methods for population-based association mapping, particularly case-control genetic association analysis, is presented. The dataset analysed in this paper includes 3435 female case and control subjects with genotype information across 879071 single nucleotide polymorphisms (SNPs). Quality control steps are performed during pre-processing to remove samples and markers that fail the quality control tests, and we discuss which association method is appropriate for the dataset. Our genetic association analysis produced promising results: the allelic association test showed one SNP, rs10519107, passing the genome-wide significance threshold of $5 \times 10^{-8}$ (odds ratio (OR) = 0.7409, $P = 1.813 \times 10^{-9}$). Several further SNPs passed the suggestive association threshold of $5 \times 10^{-6}$ and should be considered for further investigation. Logistic regression adjusted for multiple confounding factors indicated that none of the genotyped SNPs passed the genome-wide significance threshold of $5 \times 10^{-8}$; nevertheless, four SNPs (rs10519107, rs4368343, rs6848779, rs11729955) passed the suggestive association threshold.

Basma Abdulaimma, Abir Hussain, Paul Fergus, Dhiya Al-Jumeily, Casimiro Aday Curbelo Montañez, Jade Hind
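
The allelic association test mentioned in the abstract compares allele counts between cases and controls in a 2×2 table. A minimal Python sketch, assuming a Pearson chi-square test with one degree of freedom and no continuity correction (the paper's exact software and settings are not stated). The counts in the usage line are invented for illustration and are not taken from the study.

```python
import math

GENOME_WIDE = 5e-8  # genome-wide significance threshold quoted in the abstract
SUGGESTIVE = 5e-6   # suggestive association threshold quoted in the abstract


def allelic_test(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square (1 df) on a 2x2 table of allele counts; all cells assumed nonzero."""
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    n = sum(map(sum, table))
    row_totals = [sum(row) for row in table]
    col_totals = [case_alt + ctrl_alt, case_ref + ctrl_ref]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Survival function of chi-square with 1 df: P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    return odds_ratio, chi2, p_value


def significance_label(p_value):
    if p_value < GENOME_WIDE:
        return "genome-wide"
    if p_value < SUGGESTIVE:
        return "suggestive"
    return "not significant"


odds_ratio, chi2, p_value = allelic_test(case_alt=30, case_ref=70, ctrl_alt=50, ctrl_ref=50)
print(round(odds_ratio, 4), round(chi2, 4), significance_label(p_value))
# → 0.4286 8.3333 not significant
```

An odds ratio below 1, as reported for rs10519107, means the counted allele is less frequent in cases than in controls.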

An Ant-Colony Based Approach for Identifying a Minimal Set of Rare Variants Underlying Complex Traits

Identifying associations between genetic variants and observed traits is one of the basic problems in genomics. Existing association approaches mainly adopt a collapsing strategy for rare variants. However, these approaches rely heavily on the quality of variant selection and lose statistical power if neutral variants are collapsed together. To overcome these weaknesses, we propose a novel association approach that aims to obtain a minimal set of candidate variants. The approach incorporates ant-colony optimization into a collapsing model. Several classes of ants are designed, and each class is assigned to one particular interval of the solution space. Each ant prefers to build optimal solutions within its assigned region, while communicating with other ants and voting for a small number of locally optimal solutions. This framework improves performance in searching for globally optimal solutions. We conduct multiple groups of experiments on semi-simulated datasets with different configurations. The proposed approach outperforms three popular approaches, both increasing statistical power and decreasing type I and type II errors.

Xuanping Zhang, Zhongmeng Zhao, Yan Chang, Aiyuan Yang, Yixuan Wang, Ruoyu Liu, Maomao, Xiao Xiao, Jiayin Wang
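
The ant-colony approach searches for a minimal variant subset under a collapsing model. The Python sketch below shows only the fitness side, under a simple assumption: carriers of any variant in the candidate subset are collapsed into one indicator (a CAST-style collapse), and the subset is scored with a 2×2 chi-square statistic. How the paper's ants score and vote on subsets is not described in the abstract. The toy genotypes illustrate the weakness the abstract names: adding a neutral variant to the collapsed set lowers the statistic.

```python
def collapse(genotypes, subset):
    """1 if an individual carries a minor allele at any variant in `subset`, else 0."""
    return [1 if any(g[j] > 0 for j in subset) else 0 for g in genotypes]


def collapsed_chi2(genotypes, is_case, subset):
    """2x2 chi-square statistic of collapsed carrier status versus case/control status."""
    carrier = collapse(genotypes, subset)
    pairs = list(zip(carrier, is_case))
    a = sum(1 for cr, s in pairs if cr and s)          # carrier cases
    b = sum(1 for cr, s in pairs if cr and not s)      # carrier controls
    c = sum(1 for cr, s in pairs if not cr and s)      # non-carrier cases
    d = sum(1 for cr, s in pairs if not cr and not s)  # non-carrier controls
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


# Toy data: variant 0 is carried only by cases, variant 1 only by controls (neutral here).
genotypes = [[1, 0], [1, 0], [1, 0], [0, 0], [0, 1], [0, 1], [0, 0], [0, 0]]
is_case = [True] * 4 + [False] * 4
print(collapsed_chi2(genotypes, is_case, [0]))     # → 4.8
print(collapsed_chi2(genotypes, is_case, [0, 1]))  # ≈ 0.533, power lost by collapsing variant 1
```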

Evaluation of Phenotype Classification Methods for Obesity Using Direct to Consumer Genetic Data

Direct-to-consumer genetic testing services are becoming increasingly common. Consumers of such services are sharing their genetic and clinical information with the research community to facilitate the extraction of knowledge about different conditions. In this paper, we build on these services to analyse the genetic data of people with different BMI levels in order to determine the immediate and long-term risk factors associated with obesity. Using web scraping techniques, we created a dataset containing publicly available information about 230 participants from the Personal Genome Project. The dataset is then analysed with the standard quality control and association analysis protocols of genome-wide association studies (GWAS) to identify genetic variants associated with high BMI. We applied a combination of a Random Forest-based feature selection algorithm and a Support Vector Machine (SVM) with a radial basis function (RBF) kernel to the filtered dataset. Using a robust data science methodology, our approach identified obesity-related genetic variants to be used as features when predicting individual obesity susceptibility. The results reveal that the subset of features obtained through the Random Forest-based algorithm improves classifier performance compared with the top statistically significant genetic variants identified by logistic regression. The SVM showed the best results, with sensitivity = 81%, specificity = 83% and area under the curve = 92%, when trained with the top fifteen features selected by Boruta.

Casimiro Aday Curbelo Montañez, Paul Fergus, Abir Hussain, Dhiya Al-Jumeily, Mehmet Tevfik Dorak, Rosni Abdullah
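
The reported sensitivity, specificity and area under the curve can be computed from a classifier's predicted labels and scores. This Python sketch uses the rank-based (Mann-Whitney) definition of AUC, counting ties as one half; the labels and scores are made up and are not the study's data.

```python
def sensitivity_specificity(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_true = [1, 1, 1, 0, 0, 0]  # 1 = high BMI (illustrative labels)
scores = [0.9, 0.8, 0.3, 0.4, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
sens, spec = sensitivity_specificity(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(round(sens, 3), spec, round(auc, 3))  # → 0.667 1.0 0.889
```

Sensitivity and specificity depend on the decision threshold (0.5 here), whereas AUC summarizes ranking quality across all thresholds.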

Protein-Protein Interaction Prediction

Classification of Hub Protein and Analysis of Hot Regions in Protein-Protein Interactions

Proteins are fundamental to most biological processes and accomplish a vast range of functions by interacting with other proteins. Research on protein-protein interactions (PPIs) and their networks has become an important part of bioinformatics. In PPI networks, most proteins interact with only a few partners, while a small number of proteins interact with many partners; these are called hub proteins. Hub proteins can be divided into party hubs and date hubs. In this paper, we use connectivity and betweenness to classify hub proteins in the PPI network. We also study hub proteins from another perspective, interface conformation, which reflects the organization of hot spot residues in hub protein interfaces.

Xiaoli Lin, Xiaolong Zhang, Jing Hu
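
Connectivity (degree) and betweenness are the two measures the abstract uses to classify hub proteins. In the hedged Python sketch below, betweenness is computed with Brandes' algorithm for an unweighted, undirected network. The rule that labels high-betweenness hubs "date-like" and the rest "party-like" is an assumption for illustration (date hubs are often described as bridging modules), not the paper's classifier; `min_degree` and `bc_threshold` are hypothetical parameters.

```python
from collections import deque


def betweenness(adj):
    """Brandes' algorithm for an unweighted, undirected graph {node: set(neighbours)}."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack, pred = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist = dict.fromkeys(adj, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:  # BFS that counts shortest paths from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:  # accumulate pair dependencies in reverse BFS order
            w = stack.pop()
            for v in pred[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted once from each endpoint.
    return {v: b / 2.0 for v, b in bc.items()}


def classify_hubs(adj, min_degree, bc_threshold):
    """Hypothetical rule: hubs with high betweenness are 'date-like', others 'party-like'."""
    bc = betweenness(adj)
    return {v: ("date-like" if bc[v] >= bc_threshold else "party-like")
            for v in adj if len(adj[v]) >= min_degree}


star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
print(betweenness(star)[0])                                   # → 6.0
print(classify_hubs(star, min_degree=3, bc_threshold=3.0))    # → {0: 'date-like'}
```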

Genome-Wide Identification of Essential Proteins by Integrating RNA-seq, Subcellular Location and Complexes Information

Essential proteins are important for understanding cellular survival and for practical purposes such as disease diagnosis and drug design. Besides biological experiments, computational methods have been proposed to predict essential proteins based on the topological properties of the protein-protein interaction (PPI) network. However, these methods ignore the temporal and spatial features of PPI networks. Moreover, research shows that a protein's essentiality is closely tied to the protein complexes to which it belongs. Improving the prediction of essential proteins therefore remains a challenging task. In this study, we propose a method called IUS, which integrates RNA-seq data, subcellular location compartments and protein complexes into the PPI network. IUS is applied to the PPI network of Saccharomyces cerevisiae, and results under multiple evaluation methods show that IUS outperforms eight existing methods: DC, BC, EC, IC, SoECC, LAC, PeC and WDC.

Chunyan Fan, Xiujuan Lei

Protein Structure and Function Prediction

Similarity Comparison of 3D Protein Structure Based on Riemannian Manifold

As a representative technology for exploring protein spatial structure, NMR provides an unprecedented opportunity for modern life science research, but analysing the resulting large volumes of data has become a major problem. Using known three-dimensional protein structures to predict unknown spatial structures is an important means of studying the relationship between protein structure and function. This paper proposes a method for comparing the similarity of 3D protein structures based on Riemannian manifold theory. By constructing Cα frames and extracting geometric features of proteins, the 3D coordinates are converted into one-dimensional sequences that are invariant to rotation and translation. The Riemannian distance is used as the index of 3D structural similarity. The method requires no spatial transformation of the protein structures, which avoids the errors introduced when traditional methods register two proteins by least-squares fitting. It is also completely independent of sequence information, which makes it practically useful for proteins without sequence similarity. Three experiments were designed on three datasets: proteins of differing similarity, ten pairs of protein structures proposed by Fischer that are difficult to identify, and 700 proteins from the HOMSTRAD database. Compared with the traditional method, the results show that this method greatly improves matching accuracy.

Zhou Fengli, Lin Xiaoli
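
The abstract converts 3D coordinates into one-dimensional sequences that are invariant to rotation and translation. A common way to obtain such invariants from a Cα trace is the virtual torsion angle of each four consecutive Cα atoms; the Python sketch below computes it and checks the invariance. This is an assumed stand-in for the paper's Cα-frame features, the Riemannian distance itself is not reproduced, and `mean_angular_difference` is a hypothetical simple comparison.

```python
import math


def _sub(a, b):
    return [x - y for x, y in zip(a, b)]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def virtual_torsions(ca):
    """Signed dihedral (degrees) of every four consecutive C-alpha positions."""
    angles = []
    for p0, p1, p2, p3 in zip(ca, ca[1:], ca[2:], ca[3:]):
        b0, b1, b2 = _sub(p0, p1), _sub(p2, p1), _sub(p3, p2)
        length = math.sqrt(_dot(b1, b1))
        b1 = [x / length for x in b1]
        # Project b0 and b2 onto the plane perpendicular to the central bond b1.
        v = _sub(b0, [_dot(b0, b1) * x for x in b1])
        w = _sub(b2, [_dot(b2, b1) * x for x in b1])
        cos_term = _dot(v, w)
        sin_term = _dot(_cross(b1, v), w)
        angles.append(math.degrees(math.atan2(sin_term, cos_term)))
    return angles


def mean_angular_difference(t1, t2):
    """Hypothetical comparison: mean wrapped absolute difference of two torsion sequences."""
    diffs = [abs((a - b + 180.0) % 360.0 - 180.0) for a, b in zip(t1, t2)]
    return sum(diffs) / len(diffs)


ca = [[0, 1, 0], [0, 0, 0], [1, 0, 0], [1, 1, 0.5], [2, 1, 1]]
theta = math.radians(30)
moved = [[math.cos(theta) * x - math.sin(theta) * y + 5,
          math.sin(theta) * x + math.cos(theta) * y - 2, z + 1] for x, y, z in ca]
print(mean_angular_difference(virtual_torsions(ca), virtual_torsions(moved)))  # ~0 up to rounding
```

Because the angles are computed from relative bond vectors only, rotating and translating the chain leaves the sequence unchanged, so no superposition step is needed before comparison.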

Protein-Protein Binding Affinity Prediction Based on Wavelet Package Transform and Two-Layer Support Vector Machines

Precisely inferring protein-protein binding affinities is essential for evaluating protein-protein docking methods and their outputs, and it also opens the door to inferring the real status of cellular protein-protein complexes. The accumulation of measured affinities for high-resolution protein complex structures facilitates the realization of this ambitious goal. Previous scoring functions based on physical models failed to predict the affinities of diverse protein complexes, so accurate binding affinity prediction remains extremely challenging. Machine learning methods are promising for this problem; however, current machine learning methods are not well suited to the task, which obstructs their effective application. We propose a model, TLSVR-WPT, that combines Wavelet Package Transform (WPT) with two-layer support vector regression (TLSVR) to implicitly capture binding contributions that are hard to model explicitly. WPT greatly reduces the dimension of the input features to the machine learning model, and TLSVR circumvents both the descriptor compatibility problem and the need for problematic modeling assumptions. The inputs to the first TLSVR layer are eight features obtained by applying WPT to the scores of 2209 interacting atom pairs within each distance bin, and the second layer combines the outputs of the first layer to infer the final affinities. A satisfactory result of R = 0.81 and SD = 1.40 was achieved when the 2209 features were reduced to eight by a 3-level WPT. These results demonstrate that WPT greatly reduces the dimension of the input features to SVR without reducing accuracy in predicting protein binding affinity.

Min Zhu, Xiaolai Li, Bingyu Sun, Jinfu Nie, Shujie Wang, Xueling Li
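
The abstract reduces 2209 atom-pair scores to eight inputs with a 3-level wavelet package transform; a 3-level full packet tree has 2³ = 8 terminal nodes, which matches that count. The Python sketch below uses the Haar wavelet and summarizes each terminal node by its energy. Both choices, and the padding of odd-length nodes, are assumptions: the abstract names neither the mother wavelet nor how each node becomes a single feature, and the two SVR layers are not shown.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_split(x):
    """One orthonormal Haar step -> (approximation, detail).
    Odd lengths are padded by repeating the last value (an assumption)."""
    if len(x) % 2:
        x = x + [x[-1]]
    approx = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return approx, detail


def wavelet_packet_nodes(x, level):
    """Full packet tree: split every node at each level, not only the approximation branch."""
    nodes = [list(x)]
    for _ in range(level):
        nodes = [part for node in nodes for part in haar_split(node)]
    return nodes


def wpt_energy_features(x, level=3):
    # Hypothetical reduction: one energy value per terminal node -> 2**level features.
    return [sum(v * v for v in node) for node in wavelet_packet_nodes(x, level)]


scores = [math.sin(i / 50.0) for i in range(2209)]  # placeholder for 2209 atom-pair scores
features = wpt_energy_features(scores, level=3)
print(len(features))  # → 8
```

Splitting the detail branches as well as the approximation is what distinguishes a wavelet packet from an ordinary wavelet decomposition; for power-of-two lengths the orthonormal Haar steps preserve total energy across the eight features.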

Prediction of Lysine Pupylation Sites with Machine Learning Methods

Pupylation is a crucial type of protein post-translational modification, which is involved in many important cellular processes and serious diseases. In practice, identifying pupylation sites through traditional experimental methods is time-consuming and laborious, and such methods cannot identify a large number of sites quickly. Machine learning methods are therefore valuable for accelerating the discovery of lysine pupylation sites. Post-translational modification of proteins is one of the most important biological processes studied in proteomics and bioinformatics. In this work, a random forest algorithm is employed as the classification model, and pseudo amino acid composition (PseAAC) is used for the classification features. Because different types of PseAAC features play different roles in the classification model, a random forest voting method is proposed within this framework. The results demonstrate that the method works well for this classification task.

Wenzheng Bao, Zhichao Jiang
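
PseAAC augments the 20 amino acid composition values with sequence-order correlation factors. A minimal Python sketch of the type-1 (Chou) form, simplified to a single physicochemical property, the Kyte-Doolittle hydrophobicity scale, standardized over the 20 residues. The original formulation averages several properties, and the `lam` and `w` defaults here are illustrative assumptions, not the settings used in the paper.

```python
import math

AA = "ACDEFGHIKLMNPQRSTVWY"
# Kyte-Doolittle hydrophobicity scale.
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
      "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
      "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}
_MEAN = sum(KD.values()) / 20
_SD = math.sqrt(sum((v - _MEAN) ** 2 for v in KD.values()) / 20)
H = {a: (v - _MEAN) / _SD for a, v in KD.items()}  # standardized property values


def pseaac(seq, lam=3, w=0.05):
    """Type-1 PseAAC: 20 composition terms followed by `lam` sequence-order terms."""
    residues = [a for a in seq.upper() if a in H]  # drop non-standard residues
    n = len(residues)
    if n <= lam:
        raise ValueError("sequence must be longer than lam")
    freq = [residues.count(a) / n for a in AA]
    # theta_j: mean squared property difference between residues j positions apart.
    theta = [sum((H[residues[i + j]] - H[residues[i]]) ** 2 for i in range(n - j)) / (n - j)
             for j in range(1, lam + 1)]
    denom = sum(freq) + w * sum(theta)
    return [f / denom for f in freq] + [w * t / denom for t in theta]


vec = pseaac("MKTAYIAKQRQISFVKSHFSRQ", lam=3)
print(len(vec))  # → 23
```

For site prediction, such a vector would typically be computed on a window of residues around each candidate lysine; the window length is not given in the abstract.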

Next-Gen Sequencing and Metagenomics

An Integrative Approach for the Functional Analysis of Metagenomic Studies

Metagenomics is one of the most prolific “omic” sciences in biological research on environmental microbial communities. Metagenomic studies generate high-dimensional, sparse, complex and biologically rich datasets. In this research, we propose a framework that integrates omics knowledge to identify a suitably reduced set of microbiome features for the functional classification of metagenomic sequences. The proposed approach has been applied to two use cases: (1) cattle rumen microbiota samples, distinguishing feed treated with nitrate from feed treated with vegetable oil in order to improve cattle performance; and (2) human gut microbiota samples, classified into the functionally annotated categories lean, obese or overweight. Using Logistic Regression (LR) both as the classification model and as the feature selector in a wrapper-based strategy, we achieved an accuracy of 97.5% and an Area Under the Curve (AUC) of 0.972 when classifying Bos taurus (cattle) rumen microbiota in Use Case 1, and an accuracy of 94.4% with an AUC of 1.000 on human gut microbiota in Use Case 2. Overall, the LR classifier with an LR wrapper as feature selector proved the most robust in our analysis.

Jyotsna Talreja Wassan, Haiying Wang, Fiona Browne, Paul Wash, Brain Kelly, Cintia Palu, Nina Konstantinidou, Rainer Roehe, Richard Dewhurst, Huiru Zheng

LSLS: A Novel Scaffolding Method Based on Path Extension

Scaffolding, which determines the orientations and order of fragmented contigs, is an essential step of assembly pipelines and makes assembly results more complete. Most existing scaffolding tools adopt the scaffold graph approach; however, constructing an accurate scaffold graph remains a challenging task. Removing potential false relationships is key to better scaffolding performance, yet most scaffolding approaches neglect the impact of uneven sequencing depth, which may cause more sequencing errors and ultimately many false relationships. In this paper, we present LSLS (Loose-Strict-Loose Scaffolding), a new scaffolding method based on path extension. LSLS uses different strategies to extend paths, which makes it more adaptive to different sequencing depths. To handle multiple candidate paths, we designed a score function, based on the distribution of read pairs, that evaluates the reliability of path candidates; paths are extended with the highest-scoring candidate. In addition, LSLS contains a new gap estimation method that estimates gap sizes more precisely. Experimental results on two standard datasets show that LSLS achieves better performance.

Min Li, Li Tang, Zhongxiang Liao, Junwei Luo, Fangxiang Wu, Yi Pan, Jianxin Wang
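
For context on the gap estimation step, the classic read-pair estimator treats each linking pair's insert as spanning the end of contig A, the gap, and the start of contig B, and averages the implied gap sizes. This Python sketch is that baseline, assuming a single library with a known mean insert size; it is not LSLS's new estimator, which the abstract says is more precise but does not specify.

```python
from statistics import mean


def estimate_gap(pair_offsets, mean_insert):
    """Estimate the gap between contig A (left) and contig B (right).

    pair_offsets: list of (dist_a, dist_b), where dist_a is the distance from the
    outer end of read 1 to the right end of contig A, and dist_b is the distance
    from the left end of contig B to the outer end of read 2.
    """
    # Per pair: insert = dist_a + gap + dist_b, so gap = insert - dist_a - dist_b.
    estimates = [mean_insert - (a + b) for a, b in pair_offsets]
    return mean(estimates)  # a negative result suggests the contigs overlap


print(estimate_gap([(200, 200), (250, 150), (180, 220)], mean_insert=500))  # → 100
```

Because real insert sizes vary around the library mean, each per-pair estimate is noisy; averaging over many linking pairs reduces that noise, and uneven coverage changes how many pairs are available.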

Structure Prediction and Folding

The Hasp Motif: A New Type of RNA Tertiary Interactions

RNA structural motifs are recurrent structural elements occurring in RNA molecules. They play essential roles in consolidating RNA tertiary structures and in binding proteins. Recently, we discovered a new type of RNA structural motif, the hasp motif, in 27 RNA molecules. The hasp motif comprises three nucleotides that form a structure similar to a hasp: two consecutive nucleotides come from a double helix, and the third comes from a remote strand. The hasp motif brings two helices close to each other, which leads to RNA structure folding. All identified hasp motifs share a consensus structural pattern, although their sequences are not conserved. Hasp motifs reside both inside and on the surface of RNA molecules. Those inside help consolidate RNA tertiary structures, while those on the surface show evidence of interacting with proteins. The widespread occurrence of hasp motifs indicates that they are essential both for maintaining RNA structural stability and for helping RNAs perform their functions in biological processes.

Ying Shen, Lin Zhang

Biomarker Discovery

SPYSMDA: SPY Strategy-Based MiRNA-Disease Association Prediction

Developing computational models to identify potential miRNA-disease associations at large scale has attracted increasing attention, because such models can provide a better understanding of disease pathology and further boost disease diagnosis and prognosis. Considering the various disadvantages of previous computational models, we propose SPY Strategy-based MiRNA-Disease Association (SPYSMDA) prediction, which infers potential miRNA-disease associations by integrating known miRNA-disease associations, a disease semantic similarity network and a miRNA functional similarity network. Because many unlabeled miRNA-disease pairs are in fact ‘missing’ associations, simply regarding unlabeled instances as negative training samples would lead to a high false negative rate among the predicted associations. In this paper, we introduce the concept of ‘spy instances’ to identify reliable negatives and thereby improve model performance. As a result, SPYSMDA achieved AUCs of 0.8827, 0.8416 and 0.8802 in global leave-one-out cross validation, local leave-one-out cross validation and 5-fold cross validation, respectively. Furthermore, in a case study of esophageal neoplasms, 47 of the top 50 predicted miRNAs were confirmed by recent experimental literature.

Zhi-Chao Jiang, Zhen Shen, Wenzheng Bao
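
The spy strategy comes from positive-unlabeled learning: some known positives are planted as "spies" in the unlabeled set, a classifier is trained on the remaining positives versus unlabeled-plus-spies, and unlabeled items scoring below the spies are kept as reliable negatives. The Python sketch assumes feature vectors given as tuples, uses a hypothetical nearest-centroid scorer in place of whatever classifier SPYSMDA uses, and takes the minimum spy score as the threshold. The paper's miRNA and disease similarity features are not modelled.

```python
import math
import random


def centroid(points):
    return tuple(sum(coord) / len(points) for coord in zip(*points))


def centroid_scorer(pos, neg):
    """Toy stand-in classifier: higher score means closer to the positive centroid."""
    cp, cn = centroid(pos), centroid(neg)
    return lambda x: math.dist(x, cn) - math.dist(x, cp)


def spy_reliable_negatives(positives, unlabeled, train_and_score, spy_frac=0.15, seed=0):
    rng = random.Random(seed)
    shuffled = positives[:]
    rng.shuffle(shuffled)
    n_spies = max(1, round(spy_frac * len(positives)))
    spies, kept = shuffled[:n_spies], shuffled[n_spies:]
    # Spies are hidden among the unlabeled data, which is treated as negative.
    score = train_and_score(kept, unlabeled + spies)
    # Unlabeled items scoring below every spy are unlikely to be hidden positives.
    threshold = min(score(s) for s in spies)
    return [u for u in unlabeled if score(u) < threshold]


positives = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
             (0.05, 0.05), (0.0, 0.05), (0.05, 0.0)]
unlabeled = [(0.02, 0.02), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
reliable = spy_reliable_negatives(positives, unlabeled, centroid_scorer, spy_frac=0.3)
print(reliable)
```

Only the reliable negatives, rather than every unlabeled pair, would then be used as negative training samples, which is how the strategy limits false negatives.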

Applications of Machine Learning Techniques to Computational Proteomics, Genomics, and Biological Sequence Analysis

SOFM-Top: Protein Remote Homology Detection and Fold Recognition Based on Sequence-Order Frequency Matrix

Protein remote homology detection and fold recognition are critical for studying protein structure and function. Profile-based methods currently show state-of-the-art performance in this field; they rely on widely used sequence profiles such as the Position-Specific Frequency Matrix (PSFM) and the Position-Specific Scoring Matrix (PSSM). However, these approaches ignore sequence-order effects along the protein sequence. In this study, we propose a novel profile, the Sequence-Order Frequency Matrix (SOFM), which incorporates sequence-order information and extracts evolutionary information from Multiple Sequence Alignments (MSAs). Statistical tests and experimental results demonstrated its effectiveness. Combined with a previously proposed approach, Top-n-grams, SOFM was then applied to remote homology detection and fold recognition, yielding a computational predictor called SOFM-Top. Evaluated on four benchmark datasets, SOFM-Top outperformed other state-of-the-art methods in this field, indicating that SOFM-Top is a useful tool and that SOFM is a richer representation than PSFM and PSSM. SOFM has many potential applications, since profiles are widely used to construct computational predictors in studies of protein structure and function.

Junjie Chen, Mingyue Guo, Xiaolong Wang, Bin Liu
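
PSFM, the baseline profile that SOFM extends, records the frequency of each amino acid in each alignment column, and Top-n-grams keep the n most frequent residues per position. A minimal Python sketch, assuming an MSA of equal-length strings with '-' as the gap symbol and alphabetical tie-breaking. How SOFM adds sequence-order information is not given in the abstract and is not reproduced.

```python
from collections import Counter

AA = "ACDEFGHIKLMNPQRSTVWY"


def psfm(msa):
    """Column-wise amino acid frequencies of an MSA given as equal-length strings."""
    profile = []
    for column in zip(*msa):
        counts = Counter(res for res in column if res in AA)  # gaps are ignored
        total = sum(counts.values())
        profile.append({a: (counts[a] / total if total else 0.0) for a in AA})
    return profile


def top_n_grams(profile, n=1):
    """At each position, concatenate the n most frequent residues (ties broken alphabetically)."""
    return ["".join(sorted(AA, key=lambda a: (-row[a], a))[:n]) for row in profile]


msa = ["ACD", "ACE", "GCD"]
profile = psfm(msa)
print(top_n_grams(profile, n=1))  # → ['A', 'C', 'D']
print(top_n_grams(profile, n=2))  # → ['AG', 'CA', 'DE']
```

Each column of a PSFM is treated independently, which is the limitation the abstract points to: nothing in this representation links a position to its neighbours.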

Biomedical Image Analysis

A Supervised Breast Lesion Images Classification from Tomosynthesis Technique

In this paper, we propose a deep learning approach for breast lesion classification that processes breast images acquired with tomosynthesis, an innovative acquisition technique that produces high-resolution images at a lower radiographic dose than conventional Computed Tomography (CT). The acquired images were processed to obtain Regions Of Interest (ROIs) containing lesions of different categories. Several pre-trained Convolutional Neural Network (CNN) models were then evaluated as feature extractors and coupled with non-neural classifiers to discriminate among the different lesion categories. Results showed that using CNNs as feature extractors, followed by classification with a non-neural classifier, achieves high accuracy, sensitivity and specificity.

Vitoantonio Bevilacqua, Daniele Altini, Martino Bruni, Marco Riezzo, Antonio Brunetti, Claudio Loconsole, Andrea Guerriero, Gianpaolo Francesco Trotta, Rocco Fasano, Marica Di Pirchio, Cristina Tartaglia, Elena Ventrella, Michele Telegrafo, Marco Moschetta

Human-Machine Interaction: Shaping Tools Which Will Shape Us

Computer Vision and EMG-Based Handwriting Analysis for Classification in Parkinson’s Disease

Handwriting analysis is an important research area in several fields. From forensic science to graphology, automatic dynamic and static analyses of handwriting tasks allow researchers to attribute a signature to a specific person or to infer patients' medical and psychological conditions. An emerging field that exploits handwriting analysis is the study of Neurodegenerative Diseases (NDs). Patients suffering from an ND show abnormal handwriting because they have difficulties in motor coordination and a decline in cognition. In this paper, we propose an approach for differentiating Parkinson’s disease patients from healthy subjects using a handwriting analysis tool based on a limited number of features, extracted with both computer vision and ElectroMyoGraphy (EMG) signal-processing techniques and processed by an Artificial Neural Network-based classifier. Finally, we report and discuss the results of an experimental test conducted on both healthy subjects and Parkinson’s disease patients using the proposed approach.

Claudio Loconsole, Gianpaolo Francesco Trotta, Antonio Brunetti, Joseph Trotta, Angelo Schiavone, Sabina Ilaria Tatò, Giacomo Losavio, Vitoantonio Bevilacqua

A Novel Approach in Combination of 3D Gait Analysis Data for Aiding Clinical Decision-Making in Patients with Parkinson’s Disease

The most common methods neurologists use to evaluate Parkinson’s Disease (PD) patients are rating scales, which are affected by subjective and non-repeatable observations. Several research studies have revealed that walking is a sensitive indicator of PD progression. In this paper, we propose an innovative set of features derived from three-dimensional Gait Analysis in order to classify signs of motor impairment in PD and to differentiate PD patients from healthy subjects and from patients suffering from other neurological diseases. For all enrolled patients, we consider kinematic data from Gait Analysis, namely the Gait Variable Score (GVS) and the Gait Profile Score (GPS), together with spatio-temporal data. We then evaluate the extracted features in experiments with an Artificial Neural Network (ANN) classifier. The results are promising, with the best classifier reaching an accuracy of 95.05%.

Ilaria Bortone, Gianpaolo Francesco Trotta, Antonio Brunetti, Giacomo Donato Cascarano, Claudio Loconsole, Nadia Agnello, Alberto Argentiero, Giuseppe Nicolardi, Antonio Frisoli, Vitoantonio Bevilacqua
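
The Gait Profile Score summarizes several Gait Variable Scores. In the usual definition, each GVS is the root-mean-square difference between a patient's kinematic curve and a normative mean curve over the gait cycle, and the GPS is the root mean square of the GVS values. A Python sketch, assuming curves already time-normalized to the same number of samples; the example curves are invented.

```python
import math


def gait_variable_score(patient_curve, reference_curve):
    """GVS: RMS difference between a patient's curve and the normative mean curve."""
    if len(patient_curve) != len(reference_curve):
        raise ValueError("curves must be time-normalized to the same length")
    sq = [(p - r) ** 2 for p, r in zip(patient_curve, reference_curve)]
    return math.sqrt(sum(sq) / len(sq))


def gait_profile_score(patient_curves, reference_curves):
    """GPS: RMS of the GVS values over all kinematic variables; returns (GPS, GVS list)."""
    gvs = [gait_variable_score(p, r) for p, r in zip(patient_curves, reference_curves)]
    return math.sqrt(sum(g * g for g in gvs) / len(gvs)), gvs


gps, gvs = gait_profile_score([[3, 3], [4, -4]], [[0, 0], [0, 0]])
print(gvs, round(gps, 4))  # → [3.0, 4.0] 3.5355
```

Both scores are in the units of the kinematic variables (typically degrees), and larger values indicate a greater deviation from the normative pattern.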

Protein and Gene Bioinformatics: Analysis, Algorithms and Applications

Identification of Candidate Drugs for Heart Failure Using Tensor Decomposition-Based Unsupervised Feature Extraction Applied to Integrated Analysis of Gene Expression Between Heart Failure and DrugMatrix Datasets

Identifying drug target genes in gene expression profiles is not straightforward. Because drugs target proteins rather than mRNAs, the mRNA expression of drug target genes is not always altered. In addition, the interaction between a drug and a protein can be context dependent, so simple drug incubation experiments on cell lines do not always reflect the real situation during active disease. In this paper, I apply tensor decomposition-based unsupervised feature extraction to the integrated analysis of gene expression in heart failure and in the DrugMatrix dataset, which reports comprehensive gene expression data from rats treated with various drugs. I found that this strategy, in a fully unsupervised manner, enables us to identify a combined set of genes and compounds for which various associations with heart failure have been reported.

Y-h. Taguchi

Calculating Kolmogorov Complexity from the Transcriptome Data

Information entropy is used to summarize transcriptome data, but it ignores the zero-count data they contain. Ignoring zero-count data causes a loss of information and sometimes makes it difficult to distinguish between multiple transcriptomes. Here, we estimate the Kolmogorov complexity of transcriptomes while taking zero-count data into account, and use it to distinguish similar transcriptome data.

Panpaki Seekaki, Norichika Ogata
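
The entropy problem the abstract describes is easy to reproduce: Shannon entropy over expression proportions drops zero counts, so two transcriptomes whose nonzero counts match but whose zero-count genes differ receive identical entropy. Kolmogorov complexity is uncomputable, and a common practical approach bounds it from above by a compressed length. The Python sketch assumes zlib applied to a comma-separated count vector, which is an illustrative choice rather than the authors' estimator.

```python
import math
import zlib


def shannon_entropy(counts):
    """Entropy (bits) of expression proportions; zero counts contribute nothing."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


def compressed_size(counts):
    """Upper-bound proxy for Kolmogorov complexity: zlib-compressed length in bytes."""
    data = ",".join(str(c) for c in counts).encode("ascii")
    return len(zlib.compress(data, 9))


sample_a = [10, 0, 10, 0, 5]  # toy read counts per gene
sample_b = [10, 10, 5, 0, 0]
print(shannon_entropy(sample_a) == shannon_entropy(sample_b))  # → True
print(compressed_size(sample_a), compressed_size(sample_b))
```

The compressed representation encodes every gene, including those with zero counts, so it retains exactly the positional information that the entropy summary discards.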

Influence of Amino Acid Properties for Characterizing Amyloid Peptides in Human Proteome

Amyloidosis denotes medical disorders associated with the deposition of insoluble fibrillar protein aggregates, and it is linked to various human diseases. The presence of aggregation-prone regions plays an important role in determining the aggregation propensity of a protein, so understanding the characteristics of these regions is of keen interest in academia and industry. In this work, we identified 465 aggregation-prone regions with 353 unique peptides in the human proteome. Evaluating available methods on these 353 peptides showed sensitivities ranging from 15% to 90%. We further identified enthalpy, entropy, free energy and hydrophobicity as amino acid properties important for promoting aggregation. Using these properties, we developed a model for distinguishing amyloid-forming from non-amyloid peptides, which achieved an accuracy of 71% with a balance between sensitivity and specificity. We suggest that these results could be used effectively to improve the prediction performance of existing methods.

R. Prabakaran, Rahul Nikam, Sandeep Kumar, M. Michael Gromiha

Link Mining for Kernel-Based Compound-Protein Interaction Predictions Using a Chemogenomics Approach

Virtual screening (VS) is widely used in computational drug discovery to reduce costs. Chemogenomics-based virtual screening (CGBVS) can predict new compound-protein interactions (CPIs) from known CPI network data using several methods, including machine learning and data mining. Although CGBVS enables highly efficient and accurate CPI prediction, it performs poorly when predicting interactions for new compounds whose CPIs are unknown. The pairwise kernel method (PKM) is a state-of-the-art CGBVS method that shows high accuracy in predicting new compounds. In this study, on the basis of link mining, we improved the PKM by combining a link indicator kernel (LIK) with chemical similarity, and we evaluated the accuracy of these methods. The proposed method obtained an average area under the precision-recall curve (AUPR) of 0.562, higher than that of the conventional Gaussian interaction profile (GIP) method (0.425), while increasing calculation time by only a few percent.

Masahito Ohue, Takuro Yamazaki, Tomohiro Ban, Yutaka Akiyama
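
The pairwise kernel method scores one compound-protein pair against another by multiplying a compound kernel and a protein kernel. The Python sketch shows that product together with the Gaussian interaction profile (GIP) kernel, using the usual bandwidth normalization by the mean squared profile norm, and a Tanimoto coefficient on fingerprint bit sets as one example of chemical similarity. It also makes the weakness mentioned in the abstract visible: a new compound has an all-zero profile, so its GIP similarity to any other compound depends only on how many interactions that compound has. The link indicator kernel and how the paper combines it with chemical similarity are not specified in the abstract and are not shown.

```python
import math


def gip_kernel(profiles, gamma_prime=1.0):
    """Gaussian interaction profile kernel between binary interaction profiles."""
    n = len(profiles)
    mean_sq_norm = sum(sum(v * v for v in p) for p in profiles) / n
    if mean_sq_norm == 0:
        return [[1.0] * n for _ in range(n)]  # no known interactions at all
    gamma = gamma_prime / mean_sq_norm
    return [[math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(p, q))) for q in profiles]
            for p in profiles]


def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    union = len(bits_a | bits_b)
    return len(bits_a & bits_b) / union if union else 0.0


def pairwise_kernel(k_compound, k_protein, pair1, pair2):
    """K((c, p), (c', p')) = Kc(c, c') * Kp(p, p')."""
    (c1, p1), (c2, p2) = pair1, pair2
    return k_compound[c1][c2] * k_protein[p1][p2]


# Toy interaction matrix: rows are compounds, columns are proteins; compound 3 is new.
Y = [[1, 0, 1], [1, 0, 1], [0, 1, 0], [0, 0, 0]]
kc = gip_kernel(Y)
kp = gip_kernel([list(col) for col in zip(*Y)])
print(pairwise_kernel(kc, kp, (0, 0), (1, 2)))  # → 1.0
```

Replacing or mixing the compound kernel with a chemical similarity such as Tanimoto gives new compounds a similarity signal that does not depend on known interactions.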

Investigating Alzheimer’s Disease Candidate Genes Based on Combined Network Using Subnetwork Extraction Algorithms

There is an increasing need for accurate prediction of Alzheimer’s disease (AD)-related genes to inform study design, but available gene estimates are limited. In this study, subnetwork extraction algorithms were applied to extract subnetworks and mine candidate genes from a combined network, constructed by integrating protein-protein interaction information with a gene-gene co-expression network. We obtained seven candidate genes with a high likelihood of involvement in AD progression. Applying subnetwork extraction algorithms to the combined network could provide new insight into predicting AD-related genes.

Xiaojuan Wang, Hua Yan, Di Zhang, Le Zhao, Yannan Bin, Junfeng Xia

Special Session on Computer Vision based Navigation

A Comparative Analysis Among Dual Tree Complex Wavelet and Other Wavelet Transforms Based on Image Compression

Recently, the demand for efficient image compression algorithms has risen sharply, driven by the need to store images and to transmit them over long-distance communication links. Image applications are now highly prominent in multimedia production, medical imaging, law enforcement forensics and the defense industry, and effective image compression allows these applications to record, store, transmit and analyze images efficiently. This paper offers a comparative analysis between the Dual Tree Complex Wavelet Transform (DTCWT) and other wavelet-based methods, namely Embedded Zerotree Wavelet (EZW), Spatial-orientation Tree Wavelet (STW) and Lifting Wavelet Transform (LWT), for compressing grayscale images. The performance of these methods is compared using objective measures: peak signal-to-noise ratio (PSNR), mean squared error (MSE), compression ratio (CR), bits per pixel (BPP) and computational time (CT). The experimental results show that DTCWT performs better in terms of PSNR and MSE and reconstructs images better than the other methods.

Inas Jawad Kadhim, Prashan Premaratne, Peter James Vial, Brendan Halloran
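
The objective measures listed in the abstract have standard definitions for 8-bit grayscale images. A Python sketch, assuming a peak value of 255 and bit counts measured on the original and compressed streams; computational time (CT) is simply wall-clock time and is omitted. The tiny images are invented.

```python
import math


def mse(original, reconstructed):
    """Mean squared error over all pixels of two equal-size images (lists of rows)."""
    flat_o = [p for row in original for p in row]
    flat_r = [p for row in reconstructed for p in row]
    return sum((a - b) ** 2 for a, b in zip(flat_o, flat_r)) / len(flat_o)


def psnr(original, reconstructed, peak=255):
    """Peak signal-to-noise ratio in dB; infinite for a lossless reconstruction."""
    err = mse(original, reconstructed)
    return math.inf if err == 0 else 10 * math.log10(peak ** 2 / err)


def compression_ratio(original_bits, compressed_bits):
    return original_bits / compressed_bits


def bits_per_pixel(compressed_bits, n_pixels):
    return compressed_bits / n_pixels


img = [[0, 0], [0, 0]]
rec = [[1, 1], [1, 1]]
print(mse(img, rec), round(psnr(img, rec), 2))               # → 1.0 48.13
print(compression_ratio(32, 8), bits_per_pixel(8, 4))        # → 4.0 2.0
```

Higher PSNR corresponds to lower MSE, which is why the abstract reports DTCWT as better on both measures together; CR and BPP describe the size of the compressed stream instead.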

Distributed One Dimensional Calibration and Localisation of a Camera Sensor Network

Metric calibration and localisation are crucial requirements for many higher-level robotic vision tasks, such as visual navigation and tracking. Furthermore, distributed algorithms are increasingly used to create scalable camera sensor networks (CSNs) that are resistant to node failure. We present a distributed algorithm for the calibration and localisation of a CSN. Our method first performs a robust local calibration at each node using a 1D calibration object consisting of collinear points moving about a single fixed point. Next, each node builds a vision graph and performs cluster-based bundle adjustment, utilising the structure of the calibration object to produce pose estimates for its cluster. Finally, these estimates are brought to global consensus through Gaussian belief propagation. Experimental results validate our algorithm, showing that it performs comparably to centralised algorithms despite being distributed.

Brendan Halloran, Prashan Premaratne, Peter Vial, Inas Kadhim
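
The consensus step can be illustrated on a toy scalar problem. The Python sketch below uses Gaussian belief propagation to solve A x = b, assuming A is symmetric and diagonally dominant. Messages are passed in information form (precision P and potential h) along the non-zero off-diagonal entries of A, so each node only needs its neighbours' messages.

The chain-shaped A and the vector b are hypothetical stand-ins. The abstract does not describe the paper's pose parameterisation or its Gaussian model. On a tree-shaped graph such as this chain, the resulting means are exact.

```python
import numpy as np

def gabp_solve(A: np.ndarray, b: np.ndarray, iters: int = 50) -> np.ndarray:
    """Gaussian belief propagation for A x = b (synchronous message updates)."""
    n = len(b)
    # P[i, j], h[i, j]: precision and potential of the message from node i to node j.
    P = np.zeros((n, n))
    h = np.zeros((n, n))
    neighbours = [[j for j in range(n) if j != i and A[i, j] != 0] for i in range(n)]
    for _ in range(iters):
        P_new = np.zeros_like(P)
        h_new = np.zeros_like(h)
        for i in range(n):
            for j in neighbours[i]:
                # Local prior plus every incoming message except the one from j.
                p_excl = A[i, i] + sum(P[k, i] for k in neighbours[i] if k != j)
                h_excl = b[i] + sum(h[k, i] for k in neighbours[i] if k != j)
                # Integrating x_i out of the pairwise factor gives a Gaussian message on x_j.
                P_new[i, j] = -A[i, j] ** 2 / p_excl
                h_new[i, j] = -A[i, j] * h_excl / p_excl
        P, h = P_new, h_new
    # Beliefs combine the local prior with all incoming messages; the mean is h / P.
    P_belief = np.array([A[i, i] + sum(P[k, i] for k in neighbours[i]) for i in range(n)])
    h_belief = np.array([b[i] + sum(h[k, i] for k in neighbours[i]) for i in range(n)])
    return h_belief / P_belief

if __name__ == "__main__":
    A = np.array([[3.0, -1.0, 0.0, 0.0],
                  [-1.0, 3.0, -1.0, 0.0],
                  [0.0, -1.0, 3.0, -1.0],
                  [0.0, 0.0, -1.0, 3.0]])
    b = np.array([1.0, 2.0, 3.0, 4.0])
    print(gabp_solve(A, b), np.linalg.solve(A, b))
```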

Neural Networks: Theory and Application

Frontmatter

CAPTCHA Recognition Based on Faster R-CNN

In this paper, Faster R-CNN is employed to recognize CAPTCHAs (Completely Automated Public Turing tests to tell Computers and Humans Apart). Unlike traditional methods, the proposed method is based on a deep learning object detection framework. After the dataset is fed into the network and Faster R-CNN is trained, feature maps are obtained from the convolutional layers. The proposed method recognizes each character and its location. Experiments show that Faster R-CNN can be used for CAPTCHA recognition with promising speed and accuracy. The experimental results also show that the mAP (mean average precision) improves as the depth of the network increases.

Feng-Lin Du, Jia-Xing Li, Zhi Yang, Peng Chen, Bing Wang, Jun Zhang
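
The mAP figure reported above averages a per-class average precision (AP). The Python sketch below computes AP with PASCAL VOC-style all-point interpolation. It assumes detections have already been matched to ground-truth boxes (for example, by an IoU threshold) and flagged as true or false positives, and that every class has at least one ground-truth object. The abstract does not state the paper's exact evaluation protocol.

```python
import numpy as np

def average_precision(scores, is_true_positive, num_ground_truth):
    """All-point interpolated AP for one class (num_ground_truth must be > 0)."""
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    tp = np.asarray(is_true_positive, dtype=float)[order]
    tp_cum = np.cumsum(tp)
    fp_cum = np.cumsum(1.0 - tp)
    recall = tp_cum / num_ground_truth
    precision = tp_cum / (tp_cum + fp_cum)
    # Add sentinels, then make precision monotonically non-increasing from the right.
    mrec = np.concatenate(([0.0], recall, [1.0]))
    mpre = np.concatenate(([0.0], precision, [0.0]))
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    # Sum rectangle areas at the points where recall changes.
    idx = np.where(mrec[1:] != mrec[:-1])[0]
    return float(np.sum((mrec[idx + 1] - mrec[idx]) * mpre[idx + 1]))

def mean_average_precision(per_class):
    """per_class: list of (scores, is_true_positive, num_ground_truth) tuples."""
    return float(np.mean([average_precision(*args) for args in per_class]))

if __name__ == "__main__":
    print(average_precision([0.9, 0.8, 0.7], [1, 0, 1], 2))  # 5/6
```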

Prediction of Subcellular Localization of Multi-site Virus Proteins Based on Convolutional Neural Networks

Prediction of subcellular localization is critical for analyzing the mechanisms and functions of proteins and for biological research. A series of efficient methods have been proposed to identify subcellular localization, but challenges remain. In this paper, a novel feature extraction method, denoted F-Dipe, is proposed to identify subcellular localization. F-Dipe, which is based on the dipeptide pseudo amino acid composition method, improves multi-site prediction performance by increasing the focus information of proteins. In addition, a convolutional neural network (CNN) is used to predict the subcellular localization of multi-site virus proteins. The multi-label k-nearest neighbor algorithm (MLKNN) serves as a base classifier to verify the performance of F-Dipe and the CNN. With the MLKNN predictor, the best overall accuracy of F-Dipe on dataset S is 59.92%, higher than the 57.14% obtained with pseudo amino acid composition (PseAAC) features. With the CNN predictor, the best overall accuracy of F-Dipe on dataset S rises to 62.3%.

Lei Wang, Dong Wang, Yaou Zhao, Yuehui Chen
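
F-Dipe builds on dipeptide composition. The Python sketch below computes only the classic 400-dimensional dipeptide composition, which is the frequency of each ordered pair of adjacent standard amino acids. The pseudo components and the "focus information" that F-Dipe adds are not described in the abstract, so they are not reproduced. Skipping non-standard residues is an assumption of this sketch.

```python
from itertools import product

# The 20 standard amino acids; pairs containing any other symbol (X, B, U, ...) are skipped.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]
DIPEPTIDE_INDEX = {dp: i for i, dp in enumerate(DIPEPTIDES)}

def dipeptide_composition(sequence: str) -> list:
    """Return the 400-dimensional dipeptide composition of a protein sequence."""
    seq = sequence.upper()
    counts = [0] * len(DIPEPTIDES)
    total = 0
    for first, second in zip(seq, seq[1:]):
        idx = DIPEPTIDE_INDEX.get(first + second)
        if idx is not None:
            counts[idx] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)  # sequence too short or only non-standard residues
    return [c / total for c in counts]

if __name__ == "__main__":
    v = dipeptide_composition("ACAC")  # pairs AC, CA, AC
    print(v[DIPEPTIDE_INDEX["AC"]], v[DIPEPTIDE_INDEX["CA"]])
```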

Improved Convolutional Neural Networks for Identifying Subcellular Localization of Gram-Negative Bacterial Proteins

Prediction of the subcellular localization of Gram-negative bacterial proteins plays a vital role in the development of antibacterial drugs. Computational approaches have made remarkable progress in bacterial protein subcellular localization, but shortcomings remain. Recently, deep learning has received significant attention in bioinformatics, and a key step in predicting subcellular localization is developing a powerful predictor. Therefore, improved convolutional neural networks (ICNN) are used to improve the performance of multi-site prediction. First, features are extracted with the amphiphilic pseudo amino acid composition method (Ampseaac). Then, ICNN is developed to identify the subcellular localization of Gram-negative bacterial proteins and is compared with the multi-label k-nearest neighbor algorithm (MLKNN). The best overall accuracy of Ampseaac features with the ICNN predictor is 65.25%, better than the 58.58% of the MLKNN predictor.

Lei Wang, Dong Wang, Yaou Zhao, Yuehui Chen

Emotion Recognition from Noisy Mandarin Speech Preprocessed by Compressed Sensing

Noisy speech emotion recognition is significant for Artificial Intelligence (AI) and Human-Computer Interaction (HCI). In this paper, Compressed Sensing (CS) theory is adopted in the preprocessing procedure to remove the noise added to the samples of a Mandarin emotional speech corpus. A novel binary tree structure is used to design the multi-class classifier. Acoustic features are selected to build a feature subset with better emotional discriminability. The recognition accuracies and corresponding confusion matrices of the original, noisy and reconstructed speech samples are compared. The recognition performance on the reconstructed samples is better than on the noise-contaminated samples and similar to that on the original samples. The experimental results show that Compressed Sensing is feasible and effective as a preprocessing method for noisy speech emotion recognition.

Xiaoqing Jiang, Dapeng He, Xinghai Yang, Lingyin Wang

A Novel Adaptive Beamforming with Combinational Algorithm in Wireless Communications

A novel combinational adaptive beamforming algorithm is proposed for wireless communication applications. The main advantage of the LMS (least mean squares) algorithm is its simplicity, but its drawback is a relatively slow convergence rate. The RLS (recursive least squares) algorithm converges an order of magnitude faster than LMS, but at the cost of increased computational complexity. Considering the characteristics of these two classic adaptive algorithms, this paper investigates a combinational algorithm that combines their merits while avoiding their defects. Simulation results show that the algorithm performs comparably to the above algorithms and converges faster than the LMS algorithm.

Yue Zhao, Bo Ai, Yiru Liu
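
The abstract does not give the paper's combination rule, so the Python sketch below shows only the two baseline updates it builds on: standard LMS and exponentially weighted RLS. It uses a real-valued system-identification toy in which the unknown weights w_true are hypothetical. In a beamformer, u would instead be the complex array snapshot and conjugates would appear in the updates.

```python
import numpy as np

def lms(x, d, order, mu):
    """Least-mean-squares adaptive filter; returns the weight vector after every update."""
    w = np.zeros(order)
    history = []
    for n in range(order - 1, len(x)):
        u = x[n - order + 1:n + 1][::-1]  # most recent sample first
        e = d[n] - w @ u                  # a priori error
        w = w + mu * e * u
        history.append(w.copy())
    return np.array(history)

def rls(x, d, order, lam=0.99, delta=100.0):
    """Recursive least squares with forgetting factor lam; P estimates the inverse correlation."""
    w = np.zeros(order)
    P = delta * np.eye(order)
    history = []
    for n in range(order - 1, len(x)):
        u = x[n - order + 1:n + 1][::-1]
        Pu = P @ u
        k = Pu / (lam + u @ Pu)           # gain vector
        e = d[n] - w @ u                  # a priori error
        w = w + k * e
        P = (P - np.outer(k, Pu)) / lam   # P is symmetric, so u^T P = (P u)^T
        history.append(w.copy())
    return np.array(history)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal(2000)
    w_true = np.array([0.5, -0.3, 0.2, 0.1])  # hypothetical unknown system
    d = np.convolve(x, w_true)[:len(x)] + 0.01 * rng.standard_normal(len(x))
    w_lms, w_rls = lms(x, d, 4, mu=0.01), rls(x, d, 4)
    # After 50 updates RLS is already close to w_true, LMS is still converging.
    print(np.linalg.norm(w_lms[50] - w_true), np.linalg.norm(w_rls[50] - w_true))
```

The per-sample cost of RLS grows with the square of the filter order, while LMS grows linearly. This is the trade-off the abstract refers to.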

Learning Bayesian Networks Structure Based Part Mutual Information for Reconstructing Gene Regulatory Networks

As a high-precision correlation measure, Part Mutual Information (PMI) is introduced here into a Bayesian Network (BN) structure learning algorithm for the first time. Whereas general search-and-score algorithms start from an empty network without edges, our training algorithm initializes the network structure as an undirected network. In other words, the initial network already identifies which genes are related to each other, so the subsequent algorithm only needs to determine the direction of the edges. We use the classic K2 algorithm with the Bayesian Dirichlet equivalence (BDe) scoring function to search for edge directions. To test the proposed method, we carried out experiments on two networks: a simulated gene regulatory network and the SOS DNA repair network of the bacterium E. coli. A comparison of different methods on the SOS DNA repair network shows that the proposed method is effective.

Qingfei Meng, Yuehui Chen, Dong Wang, Qingfang Meng
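
The Python sketch below evaluates PMI for discrete variables using the usual definition:

PMI(X;Y|Z) = sum p(x,y,z) log[ p(x,y|z) / (p*(x|z) p*(y|z)) ]

where p*(x|z) = sum_y p(x|z,y) p(y) and p*(y|z) = sum_x p(y|z,x) p(x).

Estimating the joint table from expression data (for example, by discretizing genes) and thresholding PMI to obtain the undirected skeleton are hypothetical steps that are not shown here. The abstract does not say how the paper estimates PMI.

```python
import numpy as np

def part_mutual_information(p_xyz: np.ndarray) -> float:
    """PMI(X;Y|Z) in nats for a discrete joint probability table p_xyz[x, y, z]."""
    p = p_xyz / p_xyz.sum()
    p_x = p.sum(axis=(1, 2))
    p_y = p.sum(axis=(0, 2))
    p_z = p.sum(axis=(0, 1))
    p_xz = p.sum(axis=1)  # shape (X, Z)
    p_yz = p.sum(axis=0)  # shape (Y, Z)
    # Conditionals p(x | z, y) and p(y | z, x); entries with a zero denominator are set to 0.
    p_x_given_yz = np.divide(p, p_yz[None, :, :], out=np.zeros_like(p),
                             where=p_yz[None, :, :] > 0)
    p_y_given_xz = np.divide(p, p_xz[:, None, :], out=np.zeros_like(p),
                             where=p_xz[:, None, :] > 0)
    # p*(x | z) = sum_y p(x | z, y) p(y)   and   p*(y | z) = sum_x p(y | z, x) p(x)
    p_star_x_z = np.einsum("xyz,y->xz", p_x_given_yz, p_y)
    p_star_y_z = np.einsum("xyz,x->yz", p_y_given_xz, p_x)
    total = 0.0
    for x, y, z in zip(*np.nonzero(p)):
        p_xy_given_z = p[x, y, z] / p_z[z]
        total += p[x, y, z] * np.log(p_xy_given_z / (p_star_x_z[x, z] * p_star_y_z[y, z]))
    return float(total)

if __name__ == "__main__":
    # Y is a copy of a fair binary X, and Z is independent of both: PMI = log 2.
    table = np.zeros((2, 2, 2))
    for v in (0, 1):
        for z in (0, 1):
            table[v, v, z] = 0.25
    print(part_mutual_information(table), np.log(2))
```

Because p*(x|z) p*(y|z) sums to one over (x, y) for every z, PMI is a weighted Kullback-Leibler divergence and is therefore non-negative.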

Bilateral Filtering NIN Network for Image Classification

A novel deep architecture for classification tasks, the bilateral filter NIN, is proposed in this paper. The input image pixels are reconstructed with a bilateral filter, and the result is fed into a multi-path convolutional neural network. The network has two input paths, the original image and the reconstructed image, which are independent of and complementary to each other. This reduces the loss of foreground object texture and shape information during feature extraction from complex background images. A softmax classifier is then used to classify the extracted features. Experiments are conducted on the CIFAR-100 dataset, in which some objects' features gradually disappear after passing through a series of convolution and average pooling layers. The results show that, compared with NIN (Network in Network), classification accuracy increases by 0.6% on CIFAR-10 and by 0.27% on CIFAR-100.

Jiwen Dong, Yunxing Gao, Hengjian Li, Tianmei Guo
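
The bilateral filtering step that produces the second input path can be sketched as follows. This is a brute-force grayscale bilateral filter in Python with NumPy: each output pixel is a neighbourhood average weighted by both spatial distance and intensity difference, which smooths flat regions while preserving edges. The radius and sigma values are hypothetical because the abstract does not give the paper's parameters. For colour CIFAR images, the filter would be applied per channel or with a colour range kernel.

```python
import numpy as np

def bilateral_filter(image, radius=2, sigma_space=2.0, sigma_range=25.0):
    """Edge-preserving smoothing of a 2D grayscale image (brute force over a square window)."""
    img = image.astype(np.float64)
    padded = np.pad(img, radius, mode="edge")
    h, w = img.shape
    out = np.zeros_like(img)
    norm = np.zeros_like(img)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            shifted = padded[radius + dy:radius + dy + h, radius + dx:radius + dx + w]
            # Spatial weight depends on the offset; range weight on the intensity difference.
            spatial = np.exp(-(dx * dx + dy * dy) / (2.0 * sigma_space ** 2))
            range_w = np.exp(-((shifted - img) ** 2) / (2.0 * sigma_range ** 2))
            weight = spatial * range_w
            out += weight * shifted
            norm += weight  # the centre offset contributes weight 1, so norm >= 1
    return out / norm

if __name__ == "__main__":
    step = np.zeros((8, 8))
    step[:, 4:] = 200.0
    # With a small range sigma, pixels across the edge get negligible weight.
    print(np.abs(bilateral_filter(step, sigma_range=10.0) - step).max())
```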

A High and Efficient Sparse and Compressed Sensing-Based Security Approach for Biometric Protection

In this paper, we propose a highly efficient sparse-coding and compressive-sensing security algorithm for biometric protection, based on the Dual-Tree Complex Wavelet Transform (DT-CWT) and a Hadamard measurement matrix. First, the DT-CWT transforms the image into the frequency domain, and chaotic systems are used to encrypt the measurement matrices. Noise shaping is also applied to the DT-CWT coefficients to represent the image sparsely. Then, compressive sensing is used to improve the compression rate of the encrypted images and to reduce the storage space they occupy. Next, to improve the algorithm's ability to handle contaminated images, we exploit the robustness of double random phase encoding based on a 4f optical system as a secondary encryption. For image decryption, we use the OMP (orthogonal matching pursuit) algorithm. The proposed algorithm achieves a PSNR of 37.9863 dB, an ERROR of 0.0245 and an NC of 0.9977.

Changzhi Yu, Hengjian Li, Ziru Zhao, Jiwen Dong
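
The OMP step used for decryption can be sketched generically. At each iteration, OMP picks the measurement-matrix column most correlated with the residual, re-fits all selected columns by least squares, and updates the residual.

The Python sketch below makes several simplifying assumptions. It uses a Gaussian random matrix in place of the paper's chaotically encrypted Hadamard matrix, omits the DT-CWT sparsifying transform, and recovers a hypothetical 3-sparse vector.

```python
import numpy as np

def omp(Phi: np.ndarray, y: np.ndarray, sparsity: int, tol: float = 1e-10) -> np.ndarray:
    """Orthogonal matching pursuit: recover a sparse x with y ~= Phi @ x."""
    residual = y.astype(np.float64).copy()
    support = []
    coef = np.zeros(0)
    for _ in range(sparsity):
        # Select the column most correlated with the current residual.
        j = int(np.argmax(np.abs(Phi.T @ residual)))
        if j not in support:
            support.append(j)
        # Least-squares fit on all selected columns, then update the residual.
        coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        residual = y - Phi[:, support] @ coef
        if np.linalg.norm(residual) < tol:
            break
    x_hat = np.zeros(Phi.shape[1])
    x_hat[support] = coef
    return x_hat

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    m, n = 40, 64
    Phi = rng.standard_normal((m, n)) / np.sqrt(m)  # stand-in measurement matrix
    x = np.zeros(n)
    x[[3, 17, 42]] = [1.5, -2.0, 1.0]               # hypothetical sparse coefficients
    print(np.nonzero(omp(Phi, Phi @ x, sparsity=3))[0])
```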

Robust Real-Time Head Detection by Grayscale Template Matching Based on Depth Images

Head detection on color images has been an active research topic in the computer vision community. Recently, depth sensors have made a new type of data available that shows good invariance to illumination changes. Head detection on depth images is considerably simpler, because background subtraction and segmentation are no longer critical issues. In this paper, a robust head detection algorithm is proposed. First, a grayscale template is employed for better modeling and precise detection of the human head, and a statistical analysis of the correlation coefficients is presented from which the optimal threshold is derived. Second, candidate head regions are further examined by seed point selection based on a novel feature that takes both correlation and local standard deviation into account. Finally, the head area is obtained by region growing, and computational efficiency issues are discussed. To test the validity of the proposed algorithm, we constructed a Microsoft Kinect depth database of 670 images that includes extreme conditions such as complex backgrounds and 180° rotation. Experimental results show that the proposed algorithm achieves robust real-time head detection.

Yun-Xia Liu, Yang Yang, Min Li
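
The first stage, grayscale template matching by correlation coefficient, can be sketched as a sliding-window Pearson correlation between the template and each image patch.

The Python sketch below has the following limitations:
- The threshold is a hypothetical parameter; the paper derives its threshold statistically.
- The seed-point feature and the region-growing stage are not shown.
- A random array stands in for a depth image.

```python
import numpy as np

def correlation_map(image: np.ndarray, template: np.ndarray) -> np.ndarray:
    """Correlation coefficient between the template and every same-sized image patch."""
    img = image.astype(np.float64)
    t = template.astype(np.float64)
    t = t - t.mean()
    t_norm = np.sqrt((t ** 2).sum())
    th, tw = t.shape
    out_h, out_w = img.shape[0] - th + 1, img.shape[1] - tw + 1
    scores = np.zeros((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            p = img[r:r + th, c:c + tw]
            p = p - p.mean()
            denom = np.sqrt((p ** 2).sum()) * t_norm
            # Flat patches (zero variance) get a score of 0 instead of dividing by zero.
            scores[r, c] = (p * t).sum() / denom if denom > 0 else 0.0
    return scores

def detect(image, template, threshold):
    """Top-left corners of candidate head regions whose correlation reaches the threshold."""
    scores = correlation_map(image, template)
    rows, cols = np.nonzero(scores >= threshold)
    return list(zip(rows.tolist(), cols.tolist())), scores

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    depth = rng.random((20, 20))            # stand-in for a depth image
    head_template = depth[5:10, 7:12].copy()
    hits, _ = detect(depth, head_template, threshold=0.999)
    print(hits)
```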

A Data Stream Clustering Algorithm Based on Density and Extended Grid

Building on the traditional grid-density clustering algorithm, this paper proposes a data stream clustering algorithm based on density and extended grid (DEGDS). The algorithm combines the advantages of grid clustering and density clustering. It overcomes the defect of manually set clustering parameters and finds clusters of arbitrary shape. It uses the local density of each sample point and its distance from the other sample points to determine the number of cluster centers in the grid. Cluster centers are therefore determined automatically, which avoids the influence of poorly chosen initial centroids on the clustering results. The data are partitioned with the Spark parallel framework to parallelize the algorithm. For data points that fall outside the grid, the grid is extended, which effectively expands the clustering and preserves its accuracy. Density estimation combined with grid-boundary connectivity is introduced to merge grids, saving memory. An attenuation factor is used to update grid density incrementally, reflecting the evolution of the spatial data stream. Experimental results show that, compared with traditional clustering algorithms, DEGDS substantially improves accuracy and efficiency and can be used effectively for large-scale data clustering.

Zheng Hua, Tao Du, Shouning Qu, Guodong Mou
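
The attenuation-factor update of grid density can be sketched with the common decayed-density rule from grid-based stream clustering. A cell's density is multiplied by lam^(t - t_last) since its last update, and each arriving point adds 1.

The cell size and the decay factor lam in the Python sketch below are hypothetical, as the abstract gives no values. The sketch does not cover DEGDS's grid extension, grid merging or Spark partitioning.

```python
class DecayedGrid:
    """Grid cells whose densities fade by a factor `decay` per time step (lazy updates)."""

    def __init__(self, cell_size: float, decay: float):
        self.cell_size = cell_size
        self.decay = decay        # attenuation factor, 0 < decay < 1
        self.density = {}         # cell -> density at its last update
        self.last_update = {}     # cell -> time step of its last update

    def cell_of(self, point):
        # Floor division maps each coordinate to an integer cell index (negatives included).
        return tuple(int(c // self.cell_size) for c in point)

    def density_at(self, cell, t: int) -> float:
        """Current density of a cell, decayed from its last update to time t."""
        if cell not in self.last_update:
            return 0.0
        return self.density[cell] * self.decay ** (t - self.last_update[cell])

    def add(self, point, t: int):
        """Insert a point arriving at time t: decay the stored density, then add 1."""
        cell = self.cell_of(point)
        self.density[cell] = self.density_at(cell, t) + 1.0
        self.last_update[cell] = t
        return cell

if __name__ == "__main__":
    grid = DecayedGrid(cell_size=1.0, decay=0.5)
    c = grid.add((0.2, 0.3), t=0)
    print(c, grid.density_at(c, 2))   # (0, 0) 0.25
    grid.add((0.7, 0.9), t=2)
    print(grid.density_at(c, 2))      # 1.25
```

Updating lazily, only when a point arrives, avoids touching every cell at every time step. This is one reason decayed densities suit unbounded streams.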

Credit Risk Assessment Based on Long Short-Term Memory Model

With the continuous expansion of the Chinese credit market, a large number of P2P (person-to-person lending in Internet finance) platforms have emerged and developed. Most P2P platforms in China evaluate the credit risk of loan applicants with data mining methods. As an emerging data mining tool, artificial neural networks offer better classification capability. Improving the assessment of applicant risk can effectively reduce the overdue rate. This paper therefore presents a credit risk evaluation model based on the Long Short-Term Memory (LSTM) model. The model is built from sample data of overdue and non-overdue credits provided by Hengxin Investment Consulting Co., Ltd. in Jinan. In trials, the model classifies overdue credits with higher accuracy.

Yishen Zhang, Dong Wang, Yuehui Chen, Huijie Shang, Qi Tian
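
The LSTM at the core of the model updates a cell state through input, forget and output gates. The Python sketch below shows an untrained forward pass with the standard gate equations, followed by a logistic output for an overdue probability.

The following are assumptions:
- Each applicant is represented as a sequence of feature vectors, such as periodic repayment records.
- The weight shapes and the output layer are chosen for illustration only.
The abstract does not say how the paper arranges its sample data as sequences.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step. W: (4H, D), U: (4H, H), b: (4H,); gate order is i, f, g, o."""
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0:H])              # input gate
    f = sigmoid(z[H:2 * H])          # forget gate
    g = np.tanh(z[2 * H:3 * H])      # candidate cell state
    o = sigmoid(z[3 * H:4 * H])      # output gate
    c = f * c_prev + i * g           # new cell state
    h = o * np.tanh(c)               # new hidden state
    return h, c

def overdue_probability(sequence, params, w_out, b_out):
    """Run the LSTM over one applicant's (hypothetical) feature sequence, then a logistic output."""
    W, U, b = params
    H = U.shape[1]
    h, c = np.zeros(H), np.zeros(H)
    for x in sequence:
        h, c = lstm_step(x, h, c, W, U, b)
    return float(sigmoid(w_out @ h + b_out))

if __name__ == "__main__":
    D, H = 3, 4
    rng = np.random.default_rng(0)
    params = (rng.normal(size=(4 * H, D)), rng.normal(size=(4 * H, H)), rng.normal(size=4 * H))
    seq = [rng.normal(size=D) for _ in range(6)]  # six hypothetical time steps
    print(overdue_probability(seq, params, rng.normal(size=H), 0.0))
```

In practice, the weights would be learned by backpropagation through time with a binary cross-entropy loss on the overdue labels.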

The Feature Extraction Method of EEG Signals Based on the Loop Coefficient of Transition Network

Highly accurate automatic detection of epileptic EEG has important clinical research significance. Combining nonlinear time series analysis with complex network theory makes it possible to analyze time series through the statistical characteristics of complex networks. In this paper, a feature extraction method for EEG signals based on the transition network is proposed. The epileptic EEG data are transformed into transition networks, and the loop coefficient is extracted as the feature for classifying epileptic EEG signals. Experimental results show that single-feature classification based on the extracted feature reaches an accuracy of up to 98.5%, indicating that this single transition-network feature achieves very high classification accuracy.

Mingmin Liu, Qingfang Meng, Qiang Zhang, Hanyong Zhang, Dong Wang

Safety Inter-vehicle Policy Based on the Longitudinal Dynamics Behaviors

Analysis of the safe inter-vehicle distance plays an important role in driver assistance systems, which can warn drivers in time. To give drivers a reasonable and timely warning of an impending collision, this paper proposes safety inter-vehicle policies between the host vehicle and the preceding vehicle based on longitudinal dynamics. First, by analyzing the driving force and the resistance generated by the road surface, a basic safety inter-vehicle distance is designed that takes the surface friction coefficient and air visibility into account. Then, considering the driving state of the preceding vehicle, including hard braking to a complete stop and braking with constant deceleration, the safety inter-vehicle policies are derived from the basic safety distance. This distance is composed of the sliding distance, duration distance and deceleration distance. Finally, a comparison with classic safety distances illustrates the effectiveness and flexibility of the proposed policies.

Xiao-Fang Zhong, Ning Yuan, Shi-Yuan Han, Yue-Hui Chen, Dong Wang
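
For comparison with the classic safety distances mentioned above, the Python sketch below implements a common textbook kinematic model. This is not the paper's visibility-adjusted policy. The model adds three terms: the distance covered during the driver's reaction time, the host vehicle's braking distance at the friction-limited deceleration mu g, and a minimum standstill gap d_min. It then subtracts the preceding vehicle's braking distance. The reaction time, friction coefficient and d_min values are hypothetical. The reaction and braking terms correspond only loosely to the paper's duration and deceleration distances.

```python
G = 9.81  # gravitational acceleration, m/s^2

def braking_distance(speed: float, deceleration: float) -> float:
    """Distance (m) to stop from `speed` (m/s) at a constant `deceleration` (m/s^2)."""
    return speed ** 2 / (2.0 * deceleration)

def safety_distance(v_host, v_prec, reaction_time, friction, d_min=2.0, a_prec=None):
    """Classic kinematic safe gap (m); a_prec=None means the leader brakes at the surface limit."""
    a_host = friction * G  # assumed maximum host deceleration on this road surface
    a_p = a_host if a_prec is None else a_prec
    reaction = v_host * reaction_time
    gap = reaction + braking_distance(v_host, a_host) - braking_distance(v_prec, a_p)
    return max(gap, 0.0) + d_min

if __name__ == "__main__":
    # Host at 20 m/s behind a stopped vehicle, 1 s reaction time, dry asphalt (mu ~ 0.8).
    print(round(safety_distance(20.0, 0.0, 1.0, 0.8), 2))
```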

Global Adaptive and Local Scheduling Control for Smart Isolated Intersection Based on Real-Time Phase Saturability

Linking real-time phase saturability directly to traffic signal control, this paper proposes two mechanisms for real-time traffic signal control systems with multiple phases: a global adaptive control scheme for the traffic light loop and a local scheduling control strategy for phase green time. First, using the real-time phase saturability of each phase as a time-varying weight factor, an elastic scheduling model is designed to describe the competition among the phases in a traffic light loop. Then the green time scheduling problem in a traffic light loop is formulated as a trade-off optimization between the green time and the real-time phase saturability of each phase. The green time allocated for the next traffic loop is obtained by solving a quadratic programming problem that minimizes the sum of the squared deviations between the green time and the maximum allowable green time of each phase. If the total real-time traffic load exceeds the allocable maximum or falls below the allocable minimum, proportional global adaptive control schemes are triggered to rearrange the traffic light loop time. Under different traffic flow conditions, the proposed adaptive control schemes and scheduling strategies are shown to be effective in comparison with the classic average allocation method.

Shi-Yuan Han, Fan Ping, Qian Zhang, Yue-Hui Chen, Jin Zhou, Dong Wang
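
A simplified version of the green-time quadratic program has a closed-form solution. The version considered here minimizes sum_i w_i (g_i - G_i)^2 subject to sum_i g_i = C, where G_i is the maximum allowable green time of phase i, w_i is a saturability-derived weight and C is the green time available in the loop. A Lagrange multiplier gives:

g_i = G_i - (sum_j G_j - C) (1/w_i) / sum_j (1/w_j)

In this solution, highly weighted (more saturated) phases give up less time. The Python sketch below relies on several assumptions:
- The exact weighting of the paper's model is not stated in the abstract.
- The lower and upper bounds on each g_i are dropped.
- The shortfall sum_j G_j - C is taken to be non-negative.

```python
import numpy as np

def allocate_green_time(max_green, weights, cycle_green):
    """Closed-form minimiser of sum w_i (g_i - G_i)^2 subject to sum g_i = cycle_green (no bounds)."""
    G = np.asarray(max_green, dtype=float)
    inv_w = 1.0 / np.asarray(weights, dtype=float)
    shortfall = G.sum() - cycle_green  # assumed >= 0: total demand exceeds available green time
    # Each phase gives up a share of the shortfall proportional to 1 / w_i.
    return G - shortfall * inv_w / inv_w.sum()

if __name__ == "__main__":
    # Three phases with hypothetical maxima (s) and weights; phase 0 is the most saturated.
    print(allocate_green_time([40.0, 30.0, 30.0], [2.0, 1.0, 1.0], 90.0))  # [38. 26. 26.]
```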

Classifying DNA Microarray for Cancer Diagnosis via Method Based on Complex Networks

Classifying microarray expression data can improve the accuracy of cancer diagnosis. Various techniques, including Support Vector Machines (SVMs), Neuro-Fuzzy models (NF), K-Nearest Neighbor (KNN) and Neural Networks (NN), have been applied to analyze microarray expression data. In this investigation, a novel complex network classifier is proposed for this task. To build the classifier, we use a hybrid of Particle Swarm Optimization (PSO) and Genetic Programming (GP), in which GP searches for an optimal structure and PSO fine-tunes the parameters encoded in the classifier. The experimental results on the Leukemia and Colon data sets are comparable to state-of-the-art results.

Peng Wu, Likai Dong, Yuling Fan, Dong Wang
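
The PSO fine-tuning stage can be sketched with the standard global-best particle swarm. Each particle's velocity is pulled towards its own best position and the swarm's best position.

The Python sketch below minimizes a generic objective function. In the paper's setting, that objective would presumably be the classifier's error for a given parameter vector, but here it is a hypothetical stand-in. The inertia and acceleration constants are common textbook values, not the paper's, and the GP structure search is not shown.

```python
import numpy as np

def pso(objective, dim, bounds, n_particles=30, iters=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best particle swarm optimisation; returns (best position, best value)."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    pos = rng.uniform(lo, hi, size=(n_particles, dim))
    vel = np.zeros((n_particles, dim))
    pbest = pos.copy()
    pbest_val = np.array([objective(p) for p in pos])
    g = int(np.argmin(pbest_val))
    gbest, gbest_val = pbest[g].copy(), pbest_val[g]
    for _ in range(iters):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        # Inertia + cognitive pull (own best) + social pull (swarm best).
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lo, hi)
        vals = np.array([objective(p) for p in pos])
        improved = vals < pbest_val
        pbest[improved] = pos[improved]
        pbest_val[improved] = vals[improved]
        g = int(np.argmin(pbest_val))
        if pbest_val[g] < gbest_val:
            gbest, gbest_val = pbest[g].copy(), pbest_val[g]
    return gbest, float(gbest_val)

if __name__ == "__main__":
    # Stand-in objective: the sphere function, minimised at the origin.
    print(pso(lambda p: float(np.sum(p ** 2)), dim=3, bounds=(-5.0, 5.0)))
```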

Predicting Multisite Protein Sub-cellular Locations Based on Correlation Coefficient

With the development of proteomics and cell biology, protein subcellular location has become a hot topic in bioinformatics. More and more researchers have studied protein subcellular location, but mostly for single-site proteins. However, some proteins can reside in two or more subcellular locations, so attention should shift to multi-site protein subcellular location. In this article, we use the Virus-mPLoc data set and choose two effective feature extraction methods: pseudo amino acid composition and the correlation coefficient. These features are then fed into a multi-label k-nearest neighbor classifier to predict protein subcellular location. Experiments show that the method is reasonable, and its precision reaches 68.65% in the jackknife test.

Peng Wu, Dong Wang, Xiao-Fang Zhong, Qing Zhao
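
The multi-label k-nearest neighbour classifier (ML-kNN) used here treats each location label separately. It combines a smoothed prior with the likelihood of seeing j neighbours that carry the label, both estimated from the training set, and predicts the label when the posterior exceeds 0.5.

The Python sketch below follows that standard rule with Euclidean distance and smoothing constant s. The toy 2D clusters stand in for the pseudo amino acid composition and correlation coefficient features, which are not reproduced here.

```python
import numpy as np

class MLKNN:
    """Multi-label k-nearest neighbour classifier (standard ML-kNN rule, Euclidean distance)."""

    def __init__(self, k=3, s=1.0):
        self.k, self.s = k, s

    def _neighbour_counts(self, X_query, exclude_self=False):
        d = np.linalg.norm(X_query[:, None, :] - self.X[None, :, :], axis=2)
        if exclude_self:
            np.fill_diagonal(d, np.inf)  # a training point is not its own neighbour
        nn = np.argsort(d, axis=1)[:, :self.k]
        return self.Y[nn].sum(axis=1)    # (n_query, n_labels): neighbours carrying each label

    def fit(self, X, Y):
        self.X, self.Y = np.asarray(X, dtype=float), np.asarray(Y, dtype=int)
        m, q = self.Y.shape
        k, s = self.k, self.s
        self.prior1 = (s + self.Y.sum(axis=0)) / (2 * s + m)
        counts = self._neighbour_counts(self.X, exclude_self=True)
        c1 = np.zeros((q, k + 1))  # c1[l, j]: instances WITH label l having j such neighbours
        c0 = np.zeros((q, k + 1))  # c0[l, j]: instances WITHOUT label l having j such neighbours
        for i in range(m):
            for l in range(q):
                (c1 if self.Y[i, l] else c0)[l, counts[i, l]] += 1
        self.like1 = (s + c1) / (s * (k + 1) + c1.sum(axis=1, keepdims=True))
        self.like0 = (s + c0) / (s * (k + 1) + c0.sum(axis=1, keepdims=True))
        return self

    def predict_proba(self, X):
        counts = self._neighbour_counts(np.asarray(X, dtype=float))
        labels = np.arange(self.Y.shape[1])
        p1 = self.prior1 * self.like1[labels, counts]
        p0 = (1.0 - self.prior1) * self.like0[labels, counts]
        return p1 / (p1 + p0)

    def predict(self, X):
        return (self.predict_proba(X) > 0.5).astype(int)

if __name__ == "__main__":
    square = [(0, 0), (0, 1), (1, 0), (1, 1)]
    X = [(x, y) for x, y in square] + [(x + 10, y + 10) for x, y in square] \
        + [(x, y + 10) for x, y in square]
    Y = [[1, 0]] * 4 + [[0, 1]] * 4 + [[1, 1]] * 4  # the third cluster is multi-site
    model = MLKNN(k=3).fit(X, Y)
    print(model.predict([(0.1, 0.1), (10.0, 10.2), (0.2, 9.9)]))
```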

Using a Hierarchical Classification Model to Predict Protein Tertiary Structure

Accurately predicting protein tertiary structure helps in understanding the functions of proteins. In this study, a hierarchical classification method based on the flexible neural tree is proposed to predict the structures, with flexible neural trees used as the classifiers at each tier because of their excellent performance. Three types of features are used to classify the structures: the tripeptide composition after dimension reduction, the pseudo amino acid composition, and the position information of amino acid residues. The 640 data set is used to evaluate the method. The experimental results suggest that our method outperforms several representative approaches to predicting protein tertiary structure.

Peng Wu, Dong Wang, Xiao-Fang Zhong, Fanliang Kong

The Study on Grade Categorization Model of Question Based on on-Line Test Data

To tackle the blindness of randomly choosing questions for exercises and tests in on-line learning systems, this paper clusters questions with K-means using various feature subsets and parameters. For test data from the ACM Online Judge system, the mean temporal fluctuation of time consumption and the repeat submission rate are used as features to realize question categorization and automatic recommendation. The experimental results suggest that the proposed method is simple but effective. With it, an on-line test platform can provide functions such as individualized teaching, intelligent question selection, teaching guidance, automatic paper construction and paper difficulty prediction.

YuLing Fan, Tao Xu, Likai Dong, Dong Wang
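
The clustering step can be sketched with plain K-means (Lloyd's algorithm) over one feature vector per question.

The Python sketch below makes the following assumptions:
- Each question is a hypothetical two-dimensional feature vector: mean temporal fluctuation of time consumption and repeat submission rate.
- Each cluster is treated as a difficulty grade.
- The number of clusters and the random initialisation are illustrative.
In practice the features would usually be standardised first.

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Lloyd's K-means: returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # k distinct points as initial centers
    for _ in range(iters):
        # Assign every point to its nearest center.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Move each center to the mean of its points; keep it in place if its cluster is empty.
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                                for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

if __name__ == "__main__":
    # Hypothetical per-question features: (time-fluctuation mean, repeat submission rate).
    questions = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
    labels, _ = kmeans(questions, k=2)
    print(labels)
```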

2D Human Parsing with Deep Skin Model and Part-Based Model Inference

Human parsing plays an important role in action understanding, clothing recommendation, human-computer interaction and other applications. However, variations in human pose, clothing and viewpoint, together with cluttered backgrounds, make the segmentation and pose estimation of body parts more difficult. In this paper, a human parsing framework is proposed that combines a deep skin model with part-based model inference. First, a deep skin model is trained with deep belief networks; it is used to reduce the pose search space and improve the efficiency of model inference. Second, a pictorial structure model parses the human body more accurately using fusion maps of skin detection and HOG-based part detectors. The experimental results demonstrate that fusing skin detection improves the detection and pose estimation of human body parts, especially for parts such as the head, arms and legs.

Tao Xu, Zhiquan Feng, Likai Dong, Xiaohui Yang

Terrain Visualization Based on Viewpoint Movement

In real-time rendering of large-scale terrain, frame coherence is often used to optimize terrain visualization algorithms, reducing the computation per frame and improving efficiency. However, frame coherence optimization assumes that the viewpoint moves only slightly between two consecutive frames. Under that assumption, the terrain nodes to display in the next frame can be obtained by modifying only part of the currently displayed nodes. In practical applications, however, two consecutive frames can differ greatly because the viewpoint moves discontinuously or quickly. This paper takes jumping viewpoint movement into account and uses a viewpoint-movement threshold to optimize the terrain visualization algorithm, improving the effectiveness and generality of visualization algorithms based on frame coherence. The experimental results show that the visualization algorithm revised with the viewpoint-movement threshold effectively reduces the computation per frame and improves efficiency.

Ping Yu, Songjiang Wang

Distributed Processing of Continuous Range Queries Over Moving Objects

With the widespread use of wireless networks and mobile devices, the scale of spatio-temporal data is increasing dramatically, and many real-world applications can be formulated as processing continuous queries over moving objects. Most existing work on this problem focuses on centralized search algorithms for range queries over a limited volume of objects, and these approaches scale poorly across a cluster of servers. In addition, existing approaches seldom handle the case in which the locations of objects and queries change simultaneously. To address this challenge, we propose a distributed grid index and a distributed incremental search approach for concurrent continuous range queries over a massive number of moving objects. The distributed grid index can be deployed on a distributed computing framework to support real-time maintenance of moving objects. Further, we fully account for objects and queries moving at the same time. We put forward a parallel search approach based on the publish/subscribe mechanism that incrementally computes the results of each continuous range query on a cluster of servers. Finally, we conduct extensive experiments to evaluate the performance of our proposal.

Jin Zhou, Hao Teng, Ziqiang Yu, Dong Wang, Jiaqi Wang
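
The grid index at the heart of this approach can be sketched on a single machine. Each object is hashed to a fixed-size cell, a location update moves the object between cells, and a range query inspects only the cells overlapping the query rectangle before filtering exact positions.

The Python sketch below does not show the distributed aspects: partitioning cells across servers, publish/subscribe delivery and the incremental result computation. The cell size is a hypothetical parameter.

```python
from collections import defaultdict

class GridIndex:
    """Uniform grid over 2D positions supporting moves and rectangular range queries."""

    def __init__(self, cell_size: float):
        self.cell_size = cell_size
        self.cells = defaultdict(set)  # cell -> ids of objects currently inside it
        self.positions = {}            # id -> (x, y)

    def _cell(self, x, y):
        return (int(x // self.cell_size), int(y // self.cell_size))

    def update(self, obj_id, x, y):
        """Insert a new object or move an existing one to (x, y)."""
        old = self.positions.get(obj_id)
        if old is not None:
            self.cells[self._cell(*old)].discard(obj_id)
        self.positions[obj_id] = (x, y)
        self.cells[self._cell(x, y)].add(obj_id)

    def range_query(self, x_min, y_min, x_max, y_max):
        """Ids of objects inside the closed rectangle [x_min, x_max] x [y_min, y_max]."""
        cx0, cy0 = self._cell(x_min, y_min)
        cx1, cy1 = self._cell(x_max, y_max)
        result = set()
        for cx in range(cx0, cx1 + 1):
            for cy in range(cy0, cy1 + 1):
                for obj_id in self.cells.get((cx, cy), ()):  # .get does not create empty cells
                    x, y = self.positions[obj_id]
                    if x_min <= x <= x_max and y_min <= y <= y_max:
                        result.add(obj_id)
        return result

if __name__ == "__main__":
    index = GridIndex(cell_size=10.0)
    index.update("a", 1, 1)
    index.update("b", 15, 15)
    index.update("c", 25, 5)
    before = index.range_query(0, 0, 20, 20)
    index.update("a", 50, 50)  # object "a" leaves the query window
    after = index.range_query(0, 0, 20, 20)
    print(before, after, "removed:", before - after, "added:", after - before)
```

For a continuous query, the set differences between consecutive results give the incremental changes that would be pushed to the query's subscriber.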

Identity Authentication Technology of Mobile Terminal Based on Cloud Face Recognition

Face recognition on mobile terminals plays an important role in identity authentication technology. However, the limited performance of mobile devices leads to problems such as long detection times and low recognition rates. This paper presents a mobile terminal identity authentication technology based on cloud face recognition. The whole system consists of a cloud server, a check-in end and a checked-in end. The checked-in end performs simple face detection and face image acquisition and sends the acquired face images to the cloud server for face recognition. A Wi-Fi hotspot on the mobile phone determines whether the sign-in end is near the check-in end, which provides distance authentication. The face recognition module, whose performance depends heavily on hardware, is deployed on the cloud server. The parts exchange information over the network, so the entire authentication process is efficient. Experimental results show that the proposed method effectively improves the speed and accuracy of face recognition on the mobile side, achieving an average accuracy of 92%.

Likai Dong, Zhikang Ma, Tao Xu, Dong Wang

Prediction and Analysis of Mature microRNA with Flexible Neural Tree Model

miRNAs are a class of small non-coding RNA molecules about 20–24 nucleotides long. They bind to mRNA through complementary base pairing to cleave or suppress it, and thereby regulate gene expression. The prediction of miRNA has therefore long been a hot topic in bioinformatics. In this paper, we adopt a new feature extraction method and combine it with the flexible neural tree (FNT) to predict miRNA. For comparison, we used the XUE dataset: the training set was used to train the classifier, which was then evaluated on the test set. Our final average accuracy of 93.7% is higher than that of XUE's triplet-SVM prediction method, so our method achieves better classification.

Rongbin Xu, Huijie Shang, Dong Wang, Gaoqiang Yu, Yunguang Lin

Backmatter
