
2018 | Book

Advances in Machine Learning and Data Science

Recent Achievements and Research Directives

Editors: Dr. Damodar Reddy Edla, Prof. Dr. Pawan Lingras, Dr. Venkatanareshbabu K.

Publisher: Springer Singapore

Book Series: Advances in Intelligent Systems and Computing

About this book

This volume, Advances in Machine Learning and Data Science: Recent Achievements and Research Directives, constitutes the proceedings of the First International Conference on Latest Advances in Machine Learning and Data Science (LAMDA 2017). The 37 regular papers presented here were carefully reviewed and selected from 123 submissions.

Many computer programs today exhibit useful learning behaviour and have commercial applications. The goal of machine learning is to develop computer programs that can learn from experience. Machine learning draws on knowledge from several disciplines, including statistics, information theory, artificial intelligence, computational complexity, cognitive science and biology. For problems such as handwriting recognition, algorithms based on machine learning outperform all other approaches. Machine learning and data science are closely interrelated: data science is an umbrella term for techniques that clean data and extract useful information from it. In data science, machine learning algorithms are frequently used to identify valuable knowledge in commercial databases containing records from different industries, financial transactions, medical records, and more.

The main objective of this book is to provide an overview of the latest advances in machine learning and data science, with solutions to problems in the fields of image, video, data and graph processing, pattern recognition, data structuring, data clustering, pattern mining, association rule-based approaches, feature extraction techniques, neural networks, bio-inspired learning and various machine learning algorithms.

Table of Contents

Frontmatter
Optimization of Adaptive Resonance Theory Neural Network Using Particle Swarm Optimization Technique

Computers have advanced and their computational capability has grown over several decades of use, but with the growth in dependence on and use of these systems, concerns over risk and security issues in networks have also risen. In this paper, we propose an approach that uses particle swarm optimization (PSO) to optimize ART. Adaptive resonance theory is one of the most well-known unsupervised machine-learning neural networks and can efficiently handle high-dimensional datasets. PSO, on the other hand, is a swarm intelligence-based algorithm that is efficient for nonlinear optimization problems and easy to implement. The method is based on anomaly detection, so it can also detect unknown attack types. PSO is used to optimize the vigilance parameter of ART-1 and to classify network data as attack or normal. The KDD '99 (knowledge discovery and data mining) dataset has been used for this purpose.
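
As a minimal illustration of the optimization step described above, the sketch below tunes a single scalar parameter (standing in for the ART-1 vigilance parameter) with a basic particle swarm optimizer. The fitness function, bounds, and PSO constants are illustrative assumptions, not the authors' actual configuration.

```python
import numpy as np

def pso_tune_scalar(fitness, lo, hi, n_particles=20, n_iters=50,
                    w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimise `fitness` over a scalar in [lo, hi] with basic PSO."""
    rng = np.random.default_rng(seed)
    pos = rng.uniform(lo, hi, n_particles)          # particle positions
    vel = np.zeros(n_particles)                     # particle velocities
    pbest = pos.copy()                              # personal bests
    pbest_val = np.array([fitness(p) for p in pos])
    gbest = pbest[pbest_val.argmin()]               # global best
    for _ in range(n_iters):
        r1, r2 = rng.random(n_particles), rng.random(n_particles)
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lo, hi)
        vals = np.array([fitness(p) for p in pos])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
        gbest = pbest[pbest_val.argmin()]
    return gbest

# Hypothetical fitness: detection error of an ART-1-style classifier as a
# function of its vigilance parameter (stubbed here for illustration only).
error_given_vigilance = lambda rho: (rho - 0.62) ** 2
best_rho = pso_tune_scalar(error_given_vigilance, 0.0, 1.0)
print("selected vigilance:", round(float(best_rho), 3))
```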

Khushboo Satpute, Rishik Kumar
Accelerating Airline Delay Prediction-Based P-CUDA Computing Environment

Machine learning techniques have enabled machines to achieve human-like thinking and learning abilities. The sudden surge in the rate of data production has opened enormous research opportunities in machine learning and led to new and improved techniques for challenging, higher-level tasks. However, this growth in data size has introduced a new challenge: processing such huge chunks of a dataset in the limited time available. To deal with such problems, this paper presents a parallel method for solving and interpreting ML problems so as to achieve the required efficiency within the available time. We use CUDA, a GPU-based approach, to modify and accelerate the training and testing phases of machine learning problems. We also demonstrate the efficiency achieved by predicting airline delays with both the sequential and the CUDA-based parallel approach. Experimental results show that the proposed parallel CUDA approach outperforms the sequential one in terms of execution time.

Dharavath Ramesh, Neeraj Patidar, Teja Vunnam, Gaurav Kumar
IDPC-XML: Integrated Data Provenance Capture in XML

In the contemporary world, data provenance is an acute issue on the Web due to its openness and the ease of copying and combining interlinked data from different database sources. Data provenance refers to the lineage of data and its movement between databases. Scientists and enterprises use their own analytical tools to process provenance data. Workflow management systems are currently popular in scientific domains because of the standardization of data formats and analyses. Using graph visualizations, scientists can easily view the data provenance associated with a scientific workflow, understand the methodology, and validate the results. In this paper, we present a PROV-DM-based tool for collecting provenance data in an XML file and visualizing it as a directed graph. We also propose an approach named IDPC-XML for processing and managing the internal data using the XML file. The tool collects data provenance obtrusively on a local system using a self-generated log and stores the provenance data in XML format, which can be visualized as a directed graph to understand the convergence. Relevant case studies of IDPC-XML are discussed and further research scope is pinpointed.

Dharavath Ramesh, Himangshu Biswas, Vijay Kumar Vallamdas
Learning to Classify Marathi Questions and Identify Answer Type Using Machine Learning Technique

Question Answering (QA) is one of the budding fields of artificial intelligence. QA is a type of information retrieval in which, given a set of documents, a QA system attempts to find the correct answer to a question posed in natural language. Question classification (QC), a part of a QA system, categorizes each question: it predicts the entity type of the answer to a given natural-language question. QC is a crucial step in a QA system because it supports important downstream decisions; for example, it narrows the possible answer candidates, so that only answers matching the question class need to be considered. This research takes the first step toward developing a QC system for an English–Marathi QA system. The system analyzes the user's question and deduces the expected Answer Type (AType). For this purpose, a dataset of 1000 questions from Kaun Banega Crorepati (KBC) was scraped and manually translated into Marathi. Currently, the translation approach achieves 73.5% for the coarse-grained classes and 47.5% for the fine-grained classes, while the direct approach achieves 56.5% and 30.5%, respectively. Experiments to improve these results are ongoing.
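
The question classification step can be illustrated with a small TF-IDF plus linear SVM pipeline. The toy questions, coarse answer-type labels, and model choice below are assumptions for illustration only; they do not reproduce the authors' Marathi KBC dataset or their classifier.

```python
# Minimal question-classification sketch (scikit-learn) over a generic
# coarse answer-type taxonomy; the data here is toy English text.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

questions = [
    "Who wrote the national anthem?",
    "Which city is the capital of Maharashtra?",
    "In which year did India gain independence?",
    "Who is the current president?",
    "Which river flows through Pune?",
    "In which year was the constitution adopted?",
]
answer_types = ["PERSON", "LOCATION", "DATE", "PERSON", "LOCATION", "DATE"]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
clf.fit(questions, answer_types)
print(clf.predict(["Which city has the largest population?"]))  # predicted AType
```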

Sneha Kamble, S. Baskar
A Dynamic Clustering Algorithm for Context Change Detection in Sensor-Based Data Stream System

Sensor-based monitoring systems are growing enormously, which leads to the generation of real-time sensor data on a large scale. Classification and clustering of this data is a challenging task within limited memory and time constraints. The overall distribution of the data changes over time, which makes the task even more difficult. This paper proposes a dynamic clustering algorithm to find and detect the different contexts in a sensor-based system. It mines dynamically changing sensor streams for the system's different contexts and can be used both to detect the current context and to predict the coming context of a sensor-based system. The algorithm is able to find context states of different lengths in an online and unsupervised manner, which plays a vital role in identifying the behavior of a sensor-based system. Experimental results on real-world high-dimensional datasets justify the effectiveness of the proposed clustering algorithm. Further, a discussion of how the proposed clustering algorithm works in a sensor-based system is provided, which will be helpful for domain experts.

Nitesh Funde, Meera Dhabu, Umesh Balande
Predicting High Blood Pressure Using Decision Tree-Based Algorithm

High blood pressure, also called hypertension, is a condition that develops in the human body, knowingly or unknowingly, for varied biological and psychological reasons. If the high blood pressure state is sustained over a long period, the person may suffer a heart attack, brain stroke, or kidney disease. This paper uses the decision tree-based J48 algorithm to predict whether a person is prone to high blood pressure (HBP). In our experimental analysis, we consider biological parameters such as age, obesity level, and total blood cholesterol level. We use a real-world dataset of 1045 diagnostic records of patients aged between 18 and 65, collected from the medical diagnostic center Doctor C, Hyderabad. 66% of the records are used to train the model, and the remaining 34% are used to test it. Our results show 88.45% accuracy.
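
A minimal sketch of this prediction setup is given below, assuming synthetic age/obesity/cholesterol features and scikit-learn's CART-based DecisionTreeClassifier as a stand-in for Weka's J48 (both are decision-tree learners, but they are not identical). The 66/34 split mirrors the one described in the abstract; the data and labelling rule are fabricated for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(42)
n = 1045  # same record count as the abstract, but the data here is synthetic
age = rng.integers(18, 66, n)
obesity = rng.uniform(15, 45, n)          # BMI-like obesity level (assumed)
cholesterol = rng.uniform(120, 300, n)    # total blood cholesterol, mg/dL (assumed)
X = np.column_stack([age, obesity, cholesterol])
# Hypothetical labelling rule, only to create a learnable HBP target.
y = ((0.02 * age + 0.05 * obesity + 0.01 * cholesterol) > 4.0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=0.66, random_state=0)
model = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_tr, y_tr)
print("test accuracy:", round(accuracy_score(y_te, model.predict(X_te)), 4))
```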

Satyanarayana Nimmala, Y. Ramadevi, Srinivas Naik Nenavath, Ramalingaswamy Cheruku
Design of Low-Power Area-Efficient Shift Register Using Transmission Gate

The shift register is a sequential logic circuit that stores digital data and is a basic building block in VLSI circuits. This paper proposes a low-power, area-efficient shift register using transmission gates. The shift register uses a small number of pulsed clock signals by grouping the latches into several sub-shift registers and using additional temporary storage latches. A continuous flow of pulse signals from the input side causes unnecessary signal activity, which increases power and delay; to overcome this, a clock gating technique is used. The area and power consumption are reduced by replacing latches with transmission gates. The area, power, and transistor count have been compared for designs using several latch and flip-flop stages. The technique solves the timing problem between pulsed latches by using multiple non-overlapping delayed pulsed clock signals instead of the conventional single pulsed clock signal. Clock gating keeps power consumption and delay as low as possible, and the latches are replaced by transmission gates. The static sense-amplifier shared pulse latch (SSASPL), the smallest latch, has been selected, along with the power PC-style flip-flop (PPCFF), for calculating power, area, and delay. The analysis is carried out in the Tanner EDA industry-standard design environment using 180 nm technology, and simulation results are presented. A four-bit shift register using transmission gates in the Tanner tools at VDD = 3.3 V consumes 1.063 μW.

Akash S. Band, Vishal D. Jaiswal
Prediction and Analysis of Liver Patient Data Using Linear Regression Technique

In the current scenario, it is very difficult for doctors to diagnose liver patients, and some kind of automated support based on machine intelligence is needed to help diagnose in advance, so that doctors can start treatment sooner and save time. Machine intelligence is a way to predict liver-related problems; in this study, linear regression is used to make this prediction more accurately. Albumin levels are highly relevant for diagnosing these kinds of liver problems. The proposed model worked efficiently on the 583 observations provided as well as on new datasets. The total average accuracy achieved by the proposed model was 89.34%, which is higher than the 84.22% reported in the earlier work of Wold et al. (SIAM J Sci Stat Comput, 5(3), 735–743, 1984, [1]).
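
A sketch of the regression-based prediction idea appears below, under stated assumptions: synthetic albumin- and bilirubin-like features, a 0/1 liver-disease label, and ordinary least squares thresholded at 0.5. The real study's 583-record patient data is not reproduced here.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n = 583  # same size as the dataset mentioned in the abstract; data is synthetic
albumin = rng.uniform(2.0, 5.5, n)          # serum albumin (g/dL), assumed range
bilirubin = rng.uniform(0.3, 8.0, n)        # total bilirubin (mg/dL), assumed range
X = np.column_stack([albumin, bilirubin])
# Hypothetical rule: low albumin and high bilirubin indicate liver disease.
y = ((albumin < 3.5) & (bilirubin > 2.0)).astype(float)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
reg = LinearRegression().fit(X_tr, y_tr)
pred = (reg.predict(X_te) >= 0.5).astype(float)   # threshold the regression output
print("accuracy:", round(float((pred == y_te).mean()), 4))
```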

Deepankar Garg, Akhilesh Kumar Sharma
Image Manipulation Detection Using Harris Corner and ANMS

With the wide availability of media editing software, the authenticity and reliability of digital images have become important. Region manipulation is a simple and effective method for forging digital images, so the ability to identify image manipulation is an active research issue, and copy–move forgery detection (CMFD) is a main domain in image authentication. In copy–move forgery, one region is simply copied and pasted over another region in the same image to manipulate it. In this paper, we propose a method based on Harris corners and adaptive non-maximal suppression (ANMS). First, the input image is taken, the Harris corner detection algorithm is used to detect interest points, and ANMS is adopted to control the number of Harris points in the image. This yields an appropriate number of interest points for images of different sizes and ensures that the manipulated region can be found in manageable time. For each extracted interest point, SIFT is used to compute descriptors. The obtained descriptors are then matched using nearest-neighbour matching with outlier rejection, and RANSAC is used to find the best set of matches and identify the manipulated regions. Experimental results show robustness against different transformations and post-processing operations.

Choudhary Shyam Prakash, Sushila Maheshkar, Vikas Maheshkar
Spatial Co-location Pattern Mining Using Delaunay Triangulation

Spatial data mining is the process of finding interesting patterns that may implicitly exist in a spatial database. Finding the subsets of features that are frequently found together in the same location is called co-location pattern discovery. Earlier methods for finding co-location patterns focus on converting neighbourhood relations to item sets; once item sets are obtained, any pattern-mining method can be applied. The criteria used to measure the strength of co-location patterns are the participation ratio and the participation index. In this paper, a Delaunay triangulation approach is proposed for mining co-location patterns. The Delaunay triangulation exactly represents the closest neighbourhood structure of the features, which is a major concern in finding co-location patterns. The results show that this approach achieves good performance compared to earlier methodologies.
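
The neighbourhood construction and the participation index can be sketched as follows, assuming point features with (x, y) coordinates and a feature label: edges of the Delaunay triangulation define neighbour relations, and the participation index of a feature pair is the minimum, over its two features, of the fraction of that feature's instances taking part in a neighbouring pair. The data below is toy.

```python
import numpy as np
from scipy.spatial import Delaunay
from collections import defaultdict

# Toy spatial features: coordinates plus a feature type per point.
rng = np.random.default_rng(0)
points = rng.uniform(0, 10, size=(30, 2))
types = rng.choice(["A", "B", "C"], size=30)

tri = Delaunay(points)
edges = set()
for simplex in tri.simplices:            # each triangle contributes three edges
    for i in range(3):
        a, b = sorted((simplex[i], simplex[(i + 1) % 3]))
        edges.add((a, b))

# Instances of each feature type participating in each co-located pair.
participants = defaultdict(lambda: defaultdict(set))
for a, b in edges:
    fa, fb = types[a], types[b]
    if fa != fb:
        pair = tuple(sorted((fa, fb)))
        participants[pair][fa].add(a)
        participants[pair][fb].add(b)

counts = {f: int((types == f).sum()) for f in np.unique(types)}
for pair, members in participants.items():
    ratios = [len(members[f]) / counts[f] for f in pair]   # participation ratios
    print(pair, "participation index =", round(min(ratios), 2))
```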

G. Kiran Kumar, Ilaiah Kavati, Koppula Srinivas Rao, Ramalingaswamy Cheruku
Review on RBFNN Design Approaches: A Case Study on Diabetes Data

Radial Basis Function Neural Networks (RBFNNs) are a powerful machine learning technique because they require non-iterative training. However, the hidden layer of an RBFNN grows with the size of the dataset, which increases network complexity, training time, and testing time. It is desirable to design an RBFNN that balances simplicity and accuracy. In the literature, many approaches have been proposed for reducing the number of neurons in the RBFNN hidden layer. In this paper, a comprehensive survey of hidden layer reduction techniques is performed with respect to the Pima Indians Diabetes (PID) dataset.

Ramalingaswamy Cheruku, Diwakar Tripathi, Y. Narasimha Reddy, Sathya Prakash Racharla
Keyphrase and Relation Extraction from Scientific Publications

This paper presents a detailed approach for extracting keyphrases and their relations from scientific publications such as research papers using conditional random fields (CRF). A keyphrase is a word or set of words that describes the close relationship of content and context in a particular document (Sharan, International conference on advances in computing communications and informatics (ICACCI), 2014) [1]. Keyphrases may be the topics of the document and represent its key logic. Automatic keyphrase extraction plays a major role in automatic systems such as independent summarization, query or topic generation, question answering, search engines, information retrieval, and document classification. The relationships between the keyphrases are also extracted; two types of relations are considered, synonyms and hyponyms. The results show that our proposed system outperforms existing systems.

R. C. Anju, Sree Harsha Ramesh, P. C. Rafeeque
Mixing and Entrainment Characteristics of Jet Control with Crosswire

This paper studies the effect of passive control on an elliptical jet at different nozzle pressure ratios (NPRs). The experiment is carried out for three different configurations at NPRs of two, four, five, and six, and the results are captured and compared. A rectangular crosswire is used as the passive control and tested at a Mach number of two, with the crosswire running along the major axis of the elliptical jet exit. The pitot pressure decay and the pressure profiles are plotted for various nozzle expansions. The crosswire is placed at three different positions, ¼, ½, and ¾, to alter the shock wave and promote mixing of the jet. The shock waves are captured using numerical simulations. Due to the introduction of passive control at the exit of the issuing jet, the shock wave weakens effectively, which stimulates the mixing of the jet by providing a shorter core length. Mixing efficiency is found to be superior when the crosswire is placed at the ½ position rather than at ¼ or ¾. In addition, a notable change in the axis switching of the jets is observed.

S. Manigandan, K. Vijayaraja, G. Durga Revanth, A. V. S. C. Anudeep
GCV-Based Regularized Extreme Learning Machine for Facial Expression Recognition

The extreme learning machine (ELM) with a single-layer feed-forward network (SLFN) has attracted overwhelming attention. The structure of ELM has to be optimized through the incorporation of regularization to obtain good results, and Tikhonov regularization is frequently used. Regularization improves generalization performance compared with traditional ELM. The regularization parameter is usually estimated through heuristic approaches or empirical analysis based on prior experience. When such a choice is not possible, the generalized cross-validation (GCV) method is one of the most popular choices for obtaining an optimal regularization parameter. In this work, a new method of facial expression recognition is introduced in which histogram of oriented gradients (HOG) feature extraction and GCV-based regularized ELM are applied. Experimental results on the facial expression database JAFFE demonstrate promising performance that outperforms two other classifiers, namely the support vector machine (SVM) and k-nearest neighbor (KNN).
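
A compact numpy sketch of the regularization-parameter selection follows, assuming a random-hidden-layer ELM with a Tikhonov (ridge) solution for the output weights and the standard GCV criterion GCV(λ) = n‖y − S_λ y‖² / (n − tr(S_λ))², where S_λ = H(HᵀH + λI)⁻¹Hᵀ. The data, hidden-layer size, and λ grid are illustrative, not those of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))                      # toy inputs
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)   # toy targets

# Random hidden layer of an ELM: fixed random weights, sigmoid activation.
W = rng.normal(size=(8, 50))
b = rng.normal(size=50)
H = 1.0 / (1.0 + np.exp(-(X @ W + b)))             # hidden-layer output matrix

def gcv_score(H, y, lam):
    """Generalized cross-validation score for the ridge-regularized solution."""
    n = H.shape[0]
    A = H.T @ H + lam * np.eye(H.shape[1])
    S = H @ np.linalg.solve(A, H.T)                # hat matrix S_lambda
    resid = y - S @ y
    return n * float(resid @ resid) / (n - np.trace(S)) ** 2

lambdas = np.logspace(-6, 2, 30)
best_lam = min(lambdas, key=lambda lam: gcv_score(H, y, lam))
beta = np.linalg.solve(H.T @ H + best_lam * np.eye(H.shape[1]), H.T @ y)
print("selected lambda:", float(best_lam),
      "train MSE:", float(np.mean((H @ beta - y) ** 2)))
```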

Shraddha Naik, Ravi Prasad K. Jagannath
Prediction of Social Dimensions in a Heterogeneous Social Network

Advancements in communication and computing technologies allow people located geographically apart to meet on a common platform and share information with each other. Social networking sites play an important role in this respect. A lot of information can be inferred from such networks if the data is analyzed appropriately using a relevant data mining method. The proposed work concentrates on leveraging the connection information of the nodes in a social network to predict the social dimensions of new nodes joining the network. In this work, an edge clustering algorithm and a multilabel classification algorithm are proposed to predict the social dimensions of nodes joining an existing social network. The results of the proposed algorithms are found to be satisfactory.

Aiswarya, Radhika M. Pai
Game Theory-Based Defense Mechanisms of Cyber Warfare

The threat faced by wireless network users depends not only on their own security stance but also on the security-related actions of their opponents. As this interdependence continues to grow in scope, devising an efficient security solution has become a challenge for security researchers and practitioners. We explore the potential applicability of game theory to model the strategic interactions between these agents. In this paper, the interaction between the attacker and the defender is modeled as both a static and a dynamic game, and the optimal strategies for the players are obtained by computing the Nash equilibrium. Our goal is to refine the key insights to illustrate the current state of game theory, concentrating on areas relevant to security analysis in cyber warfare.
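
A minimal sketch of the static-game analysis under assumed payoffs: a small attacker–defender bimatrix game and a brute-force search for pure-strategy Nash equilibria, where each player's action is a best response to the other's. The payoff numbers are illustrative and are not taken from the paper.

```python
import numpy as np

# Rows: defender actions (monitor, idle); columns: attacker actions (attack, wait).
# Payoffs are split into one matrix per player (assumed values).
defender = np.array([[ 3,  2],
                     [-5,  0]])
attacker = np.array([[-4,  0],
                     [ 5,  1]])

def pure_nash(defender, attacker):
    """Return all (row, col) pairs where both actions are mutual best responses."""
    eq = []
    rows, cols = defender.shape
    for r in range(rows):
        for c in range(cols):
            if (defender[r, c] >= defender[:, c].max() and
                    attacker[r, c] >= attacker[r, :].max()):
                eq.append((r, c))
    return eq

# With these payoffs the unique pure equilibrium is (monitor, wait): given
# monitoring, the attacker prefers to wait, and monitoring remains optimal.
print("pure-strategy Nash equilibria:", pure_nash(defender, attacker))
```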

Monica Ravishankar, D. Vijay Rao, C. R. S. Kumar
Challenges Inherent in Building an Intelligent Paradigm for Tumor Detection Using Machine Learning Algorithms

Machine learning is at the heart of the big data revolution sweeping the world today. It is the science of getting computers to learn without being explicitly programmed, as most technological systems are moving toward being operated by intelligent machines capable of making human-like decisions and automatically solving human tasks with accurate results. Artificial intelligence is at the heart of every major technological system in the world today. This paper presents the challenges faced in developing a model that yields excellent results and surveys different machine learning techniques. We also give a broad view of current techniques for brain tumor detection in computer-aided diagnosis and present a method for brain tumor detection based on the k-nearest neighbor algorithm, in which a model is trained with different values of k and appropriate distance metrics are used to compute distances between pixels.
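
A minimal sketch of the k-NN step is given below under clear assumptions: generic numeric feature vectors stand in for the image-derived features, and scikit-learn's KNeighborsClassifier is swept over a few values of k and two distance metrics, as the abstract describes. Nothing here reproduces the paper's actual data or features.

```python
import numpy as np
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
# Toy two-class data standing in for tumor / non-tumor feature vectors.
X = np.vstack([rng.normal(0.0, 1.0, (100, 5)), rng.normal(1.5, 1.0, (100, 5))])
y = np.array([0] * 100 + [1] * 100)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

best = None
for k in (1, 3, 5, 7, 9):
    for metric in ("euclidean", "manhattan"):
        score = cross_val_score(KNeighborsClassifier(n_neighbors=k, metric=metric),
                                X_tr, y_tr, cv=5).mean()
        if best is None or score > best[0]:
            best = (score, k, metric)

score, k, metric = best
model = KNeighborsClassifier(n_neighbors=k, metric=metric).fit(X_tr, y_tr)
print(f"best k={k}, metric={metric}, test accuracy={model.score(X_te, y_te):.3f}")
```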

A. S. Shinde, V. V. Desai, M. N. Chavan
Segmentation Techniques for Computer-Aided Diagnosis of Glaucoma: A Review

Glaucoma is an eye disease in which the optic nerve head (ONH) is damaged, leading to irreversible loss of vision. Vision loss due to glaucoma can be prevented only if it is detected at an early stage. Early diagnosis of glaucoma is possible by measuring the level of intra-ocular pressure (IOP) and the amount of neuro-retinal rim (NRR) area loss. The diagnosis accuracy depends on the experience and domain knowledge of the ophthalmologist. Hence, automated extraction of features from the retinal fundus images can play a major role for screening of glaucoma. The main aim of this paper is to review the different segmentation algorithms used to develop a computer-aided diagnostic (CAD) system for the detection of glaucoma from fundus images, and additionally, the future work is also highlighted.

Sumaiya Pathan, Preetham Kumar, Radhika M. Pai
Performance Analysis of Information Retrieval Models on Word Pair Index Structure

This paper analyzes the performance of the word pair index structure for various information retrieval models. The word pair index structure is highly efficient for solving contextual queries and is a precision-enhancing structure. The selection of the information retrieval model is important because it directly influences the outcome of the information retrieval system. It is found that, compared with the traditional inverted index structure, the word pair index structure increases precision by 18% and recall by 8%; the mean average precision is increased by 26%, and R-precision by 20%.
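
A minimal sketch of a word pair index versus a standard inverted index follows, under the assumption that "word pair" means indexing consecutive token pairs so that contextual (phrase-like) queries can be answered directly. The documents and tokenization are toy.

```python
from collections import defaultdict

docs = {
    1: "machine learning improves information retrieval",
    2: "retrieval of information with machine learning models",
    3: "learning machine parts catalogue",
}

inverted = defaultdict(set)    # single term   -> doc ids
word_pair = defaultdict(set)   # adjacent pair -> doc ids
for doc_id, text in docs.items():
    tokens = text.lower().split()
    for t in tokens:
        inverted[t].add(doc_id)
    for a, b in zip(tokens, tokens[1:]):
        word_pair[(a, b)].add(doc_id)

# Contextual query "machine learning": the word pair index keeps only documents
# where the two terms appear adjacently, improving precision over a plain
# term intersection (document 3 contains both terms but not the phrase).
print("inverted index match :", inverted["machine"] & inverted["learning"])
print("word pair index match:", word_pair[("machine", "learning")])
```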

N. Karthika, B. Janet
Fast Fingerprint Retrieval Using Minutiae Neighbor Structure

This paper proposes a novel fingerprint identification system using minutiae neighborhood structure. First, we construct the nearest neighborhood for each minutia in the fingerprint. In the next step, we extract the features such as rotation invariant distances and orientation differences from the neighborhood structure. Then, we use these features to compute the index keys for each fingerprint. During identification of a query, a nearest neighbor algorithm is used to retrieve the best matches. Further, this approach enrolls the new fingerprints dynamically. This approach has been experimented on different benchmark Fingerprint Verification Competition (FVC) databases and the results are promising.

Ilaiah Kavati, G. Kiran Kumar, Koppula Srinivas Rao
Key Leader Analysis in Scientific Collaboration Network Using H-Type Hybrid Measures

In the research community, most research work is done by groups of researchers, and the evaluation of an individual's scientific impact is based on either citation-based metrics or centrality measures of the social network. Both types of measures have their own role in scientific evaluation: centrality measures are based on the number of collaborators and their impact, whereas citation-based metrics are based on the citation counts of the articles published by the individual. Evaluating the scientific impact of an individual therefore calls for a hybrid approach combining citation-based indices with centrality measures from social network analysis. In this article, we discuss several h-type hybrid measures, which combine citation-based indices with centrality-based measures for scientific evaluation, and identify the prominent leaders in a scientific collaboration network.
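
A small sketch of the general idea of combining a citation-based index with a collaboration-network centrality, on toy data: the standard h-index from an author's citation counts, and degree centrality from a co-authorship graph built with networkx. The specific h-type hybrid measures discussed in the article are not reproduced; the combined score below is purely illustrative.

```python
import networkx as nx

def h_index(citations):
    """Largest h such that the author has h papers with at least h citations each."""
    cits = sorted(citations, reverse=True)
    return sum(1 for i, c in enumerate(cits, start=1) if c >= i)

# Toy citation records and co-authorship edges.
citations = {"A": [10, 8, 5, 4, 3], "B": [25, 2, 1], "C": [6, 6, 6, 6]}
coauthors = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "D"), ("C", "D")]

G = nx.Graph(coauthors)
centrality = nx.degree_centrality(G)
for author, cits in citations.items():
    # Illustrative hybrid score: h-index weighted by network centrality.
    print(author, "h-index:", h_index(cits),
          "degree centrality:", round(centrality[author], 2),
          "hybrid:", round(h_index(cits) * centrality[author], 2))
```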

Anand Bihari, Sudhakar Tripathi
A Graph-Based Method for Clustering of Gene Expression Data with Detection of Functionally Inactive Genes and Noise

Noise present in gene expression data creates problems for many clustering algorithms, and some non-functional genes may be present in the data that should not be part of any cluster. One solution to this problem first removes the functionally inactive genes or noise and then clusters the remaining genes. Based on this solution, a graph-based clustering algorithm is proposed in this article that first identifies the functionally inactive genes or noise and then clusters the remaining genes of the gene expression data. The proposed method is applied to a yeast cell cycle dataset, and the results show that it performs well in identifying highly co-expressed gene clusters in the presence of functionally inactive genes and noise.

Girish Chandra, Akshay Deepak, Sudhakar Tripathi
OTAWE-Optimized Topic-Adaptive Word Expansion for Cross Domain Sentiment Classification on Tweets

The enormous growth of Internet usage and of social interactions and activities on social networking sites results in users adding their opinions about products. An automated system, called a sentiment classifier, is required to extract sentiments and opinions from social media data. A classifier trained on the labeled tweets of one domain may not efficiently classify tweets from another domain; this is a basic problem with tweets, as Twitter data is very diverse. Therefore, cross-domain sentiment classification is required. In this paper, we propose a semi-supervised domain-adaptive sentiment classifier with an Optimized Topic-Adaptive Word Expansion (OTAWE) model on tweets. Initially, the classifier is trained on common sentiment words and mixed labeled tweets from various topics. Then, the OTAWE algorithm selects more reliable unlabeled tweets from a particular domain and updates domain-adaptive words in every iteration. OTAWE outperforms existing domain-adaptive algorithms as it saves the feature weights after every iteration, which ensures that moderate sentiment words are not missed and avoids the inclusion of weak sentiment words.

Savitha Mathapati, Ayesha Nafeesa, S. H. Manjula, K. R. Venugopal
DCaP—Data Confidentiality and Privacy in Cloud Computing: Strategies and Challenges

Cloud computing is a revolutionary technology for individual users because it provides on-demand services through the sharing of resources. It avoids in-house infrastructure construction as well as the cost of infrastructure and its maintenance. Hence, individual users and organizations are moving towards the cloud to outsource their data and applications. However, many are worried about the privacy and confidentiality of their data before moving it to the cloud. The cloud service provider has to ensure data security against unauthorized access inside and outside the cloud computing infrastructure. This paper therefore identifies and describes possible threats to data confidentiality and privacy. The authors also present an anatomy and summary of general observations on the available mechanisms to counter these threats.

Basappa B. Kodada, Demian Antony D’Mello
Design and Development of a Knowledge-Based System for Diagnosing Diseases in Banana Plants

Farmers usually find it difficult to identify and treat the various diseases of banana plants (BPs) because doing so demands a wide spectrum of tacit knowledge. This situation motivated the authors to design and develop a technology-assisted knowledge-based (KB) system to help farmers diagnose and treat diseases in BPs. As a preliminary step towards building the KB, a set of images of BP diseases was taken from the manual published by the Vegetable and Fruit Promotion Council Keralam (VFPCK). These images were used to collect data from agricultural experts and experienced farmers about the symptoms and remedies of various BP diseases. The data was collected from the participants through semi-structured interviews and then analysed to design the KB system. Since the diagnosis of diseases is a subjective process, an inter-rater reliability check was performed on the data using Cohen's Kappa method. Using this data, a KB system was designed and developed as a mobile app named 'Ban-Dis'. An initial usability study has been conducted on this prototype among a few farmers, and their feedback has been recorded. The results are promising and warrant further enhancements to the system; as indicated by the farmers, the KB system would be more beneficial if its interface were in the vernacular language.

M. B. Santosh Kumar, V. G. Renumol, Kannan Balakrishnan
A Review on Methods Applied on P300-Based Lie Detectors

Deceit identification, or lie detection, has been a topic of study for the past few decades. Detecting lies is not only a legal issue but also a moral and ethical one. Various lie detectors, such as the polygraph, have been developed that measure body temperature, heart rate, pulse rate, blood pressure, etc., to determine whether a person is telling the truth. However, these polygraph tests give only indirect and incomplete knowledge of deception, so directly measuring the mental activity of the subject using a brain–computer interface (BCI) was adopted, which identifies the subject's mental state and detects lies. These lie detectors use the P300 component of the event-related potential (ERP) generated during a mental task and acquired using electroencephalography (EEG). In this paper we present a survey of state-of-the-art techniques applied for analyzing and classifying innocent and guilty subjects.

Annushree Bablani, Diwakar Tripathi
Implementation of Spectral Subtraction Using Sub-band Filtering in DSP C6748 Processor for Enhancing Speech Signal

Implementing novel algorithms on a Digital Signal Processing (DSP) processor to extract the speech signal from a noisy signal is always of immense interest. Speech signals are usually complex and require processing in short frames, so DSP processors are widely used to process speech signals in mobile phones; the performance of these devices in noisy conditions is very good compared with traditional processors. The speech signal is degraded by echo or background noise, and as a result there is a need for a digital voice processor for human–machine interfaces. The chief objective of speech enhancement algorithms is to improve the performance of voice communication devices by boosting speech quality and increasing the intelligibility of the voice signal. Popular speech enhancement algorithms rely on frequency-domain approaches to estimate the spectral density of the noise. This paper proposes a method in which the frequency components of the noisy speech signal are filtered out using multiband filters developed on the C6748 DSP processor. Experimental results demonstrate an improved signal-to-noise ratio (SNR) with fewer computations.
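
A minimal numpy sketch of the frame-based spectral subtraction idea is shown below, assuming the noise magnitude spectrum is estimated from an initial noise-only segment and subtracted from each frame's magnitude spectrum with half-wave rectification. The DSP-processor and multiband filter-bank implementation from the paper is not reproduced.

```python
import numpy as np

def spectral_subtraction(noisy, frame_len=256, noise_frames=10):
    """Basic magnitude spectral subtraction with 50% overlapping Hann frames."""
    hop = frame_len // 2
    win = np.hanning(frame_len)
    frames = [noisy[i:i + frame_len] * win
              for i in range(0, len(noisy) - frame_len, hop)]
    spectra = [np.fft.rfft(f) for f in frames]
    # Noise magnitude estimated from the first few (assumed noise-only) frames.
    noise_mag = np.mean([np.abs(s) for s in spectra[:noise_frames]], axis=0)

    out = np.zeros(len(noisy))
    for idx, s in enumerate(spectra):
        mag = np.maximum(np.abs(s) - noise_mag, 0.0)    # half-wave rectification
        clean = np.fft.irfft(mag * np.exp(1j * np.angle(s)), n=frame_len)
        out[idx * hop: idx * hop + frame_len] += clean * win   # overlap-add
    return out

# Demo: a 1 kHz tone in white noise at an assumed 8 kHz sampling rate.
fs = 8000
t = np.arange(fs) / fs
noise = 0.3 * np.random.default_rng(0).normal(size=fs)
noisy = np.concatenate([noise[:2048], np.sin(2 * np.pi * 1000 * t) + noise])
enhanced = spectral_subtraction(noisy)
print("input power:", round(float(np.mean(noisy ** 2)), 3),
      "output power:", round(float(np.mean(enhanced ** 2)), 3))
```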

U. Purushotham, K. Suresh
In-silico Analysis of LncRNA-mRNA Target Prediction

Long noncoding RNAs (lncRNAs) constitute a class of noncoding RNAs that are versatile molecules and perform various regulatory functions. Hence, identifying their target mRNAs is an important step in predicting the functions of these molecules. Current lncRNA target prediction tools are not efficient enough to identify lncRNA-mRNA interactions accurately, and their reliability is an issue, as interaction site detection is often inaccurate. In this paper, our aim is to predict lncRNA-mRNA interactions efficiently by incorporating the sequence, structure, and energy-based features of the lncRNAs and mRNAs. A brief study of existing tools for RNA-RNA interaction helped us understand the different binding sites, and after compiling the tools, we modified the algorithms to detect the accessible sites and their energies for each interacting RNA sequence. The RNAstructure tool is then used to obtain the hybrid interaction structure for the accessible lncRNA and mRNA sites. Our target prediction tool is found to give better accuracy than the existing tools once the sequence, structure, and energy features are included.

Deepanjali Sharma, Gaurav Meena
Energy Aware GSA-Based Load Balancing Method in Cloud Computing Environment

The cloud computing environment is based on a pay-as-you-use model and enables the hosting of prevalent applications from scientific, consumer, and enterprise domains. Cloud data centers consume huge amounts of electrical energy, contributing to high operational costs for the organization. Therefore, we need energy-efficient cloud computing solutions that not only minimize operational costs but also ensure the performance of the system. In this paper, we define a framework for energy-efficient cloud computing based on a nature-inspired meta-heuristic, the gravitational search algorithm, an optimization technique based on Newton's gravitational theory. The proposed energy-aware virtual machine consolidation provisions data center resources to client applications and improves the energy efficiency of the data center with negotiated quality of service. We validate our approach through a performance evaluation study, and the results of the proposed method show improvement in terms of load balancing and energy consumption under a dynamic workload environment.

Vijayakumar Polepally, K. Shahu Chtrapati
Relative Performance Evaluation of Ensemble Classification with Feature Reduction in Credit Scoring Datasets

Extensive research has been done on feature selection and data classification, but it is not clear which feature selection approach results in better classification performance on which dataset. A comparative performance analysis is therefore required to test classification performance on a dataset together with a feature selection approach. The main aim of this work is to use various feature selection approaches and classifiers to evaluate the performance of each classifier combined with each feature selection approach; the results are compared in terms of accuracy and G-measure. As shown in many studies, ensemble classifiers perform better than individual base classifiers. Further, five heterogeneous classifiers are aggregated within four ensemble frameworks, namely majority voting and weighted voting in both single and multiple layers, and the results are compared in terms of accuracy, sensitivity, specificity, and G-measure on the Australian credit scoring and German loan approval datasets obtained from the UCI repository.
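
A brief scikit-learn sketch of the majority-voting and weighted-voting ensemble idea over heterogeneous base classifiers is given below, with a synthetic stand-in for the credit-scoring data. The five specific base learners, the multi-layer frameworks, and the feature selection steps of the study are not reproduced.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB

# Synthetic binary data standing in for a credit-scoring dataset.
X, y = make_classification(n_samples=600, n_features=15, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

base = [("lr", LogisticRegression(max_iter=1000)),
        ("dt", DecisionTreeClassifier(max_depth=5, random_state=0)),
        ("nb", GaussianNB())]

majority = VotingClassifier(base, voting="hard").fit(X_tr, y_tr)
weighted = VotingClassifier(base, voting="soft", weights=[2, 1, 1]).fit(X_tr, y_tr)
print("majority voting accuracy:", round(majority.score(X_te, y_te), 3))
print("weighted voting accuracy:", round(weighted.score(X_te, y_te), 3))
```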

Diwakar Tripathi, Ramalingaswamy Cheruku, Annushree Bablani
Family-Based Algorithm for Recovering from Node Failure in WSN

The wireless sensor network (WSN) is an emerging technology. A sensor node runs on battery power, and if the power drains, the node may fail. Node failure in wireless sensor networks is a significant phenomenon that affects the performance of the entire network, and recovering the network from node failure is a challenging mission. Existing papers have proposed a number of techniques for detecting node failure and recovering the network from it. The proposed work presents an innovative family-based solution to node failure in WSNs. In a family there are n persons; if one of them falls sick, another member of the same family takes over their responsibilities until they recover. In the same way, the proposed family-tree-based algorithm is organized in three levels of hierarchy and determines who takes over a task until the low-power node has recovered. The work is simulated using Network Simulator 2 (NS2) and compared with two existing algorithms, the LeDiR algorithm and the MLeDiR algorithm. The simulation results show that the proposed method performs better than the existing methods in terms of delivery ratio, end-to-end delay, and dropping ratio.

Rajkumar Krishnan, Ganeshkumar Perumal
Classification-Based Clustering Approach with Localized Sensor Nodes in Heterogeneous WSN (CCL)

In a wireless sensor network, random and dense deployment of sensor nodes makes it difficult for the sink node to determine their locations without GPS, and equipping all sensor nodes with GPS increases the deployment cost. Energy is another constraint in a wireless sensor network during data forwarding. In this paper, the proposed protocol CCL applies a modified version of the DV-hop technique to detect the locations of sensor nodes without using GPS. Event-based clustering is designed to save node energy, with classification performed using a support vector machine. Packets are forwarded to the sink node by a greedy forwarding technique, and packet loss is reduced through an anti-void approach called the twin rolling ball technique. Simulation results show that the performance of CCL is improved compared with LEACH, HEED, EEHC, DV-hop, and advanced DV-hop.

Ditipriya Sinha, Ayan Kumar Das, Rina Kumari, Suraj Kumar
Multimodal Biometric Authentication System Using Local Hand Features

In this work, a hand-based multimodal biometric system is presented using score-level fusion of hand geometry and local palmprint features. Initially, a palm ROI of fixed size is cropped on the basis of the finger base points. However, these images are not well aligned, which reduces matching accuracy; to better align them, an L-K tracking-based palm image alignment method is presented. The poor-contrast ROI image is then enhanced using a novel fractional G-L filter, and local keypoints of the aligned ROI images are extracted using the Block–SIFT descriptor. Secondly, a set of novel geometrical features is computed from the palmar region of the hand image. The highly uncorrelated features are then selected from the palm and hand geometry using Dia-FLD. For robust classification, a high-performance linear SVM is used. Finally, a score-level fusion rule is employed, and the combined approach shows increased performance in terms of Correct Recognition Rate (99.34%), Equal Error Rate (2.16%), and Computation Time (2084 ms). The proposed system has been tested on the largest publicly available contact-based and contactless databases, the Bosphorus hand database and the CASIA and IITD palmprint databases, to validate the results.

Gaurav Jaswal, Amit Kaul, Ravinder Nath
Automatic Semantic Segmentation for Change Detection in Remote Sensing Images

Change detection (CD) focuses on extracting change information from multispectral remote sensing images of the same geographical location for environmental monitoring, natural disaster evaluation, urban studies, and deforestation monitoring. While capturing Landsat imagery, data-missing issues may occur, such as cloud occlusion and camera sensor and aperture artifacts, and existing machine learning approaches do not provide significant results. This paper proposes a DeepLab dilated convolutional neural network (DL-DCNN) for semantic segmentation with the goal of producing a change map for earth observation applications. Experimental results reveal that the proposed change detection approach provides improved accuracy compared with existing algorithms and maps semantic objects within the predefined classes as change or no change.

Tejashree Kulkarni, N Venugopal
A Model for Determining Personality by Analyzing Off-line Handwriting

Handwriting analysis is the scientific method of determining, understanding, or predicting the personality or behavior of a writer. Graphology, or graph analysis, is the scientific name for handwriting analysis. Handwriting is often called brain writing or mind writing, since it is a system of studying the frozen graphic structures that have been generated in the brain and placed on paper in a printed or cursive style. Many things can be revealed from handwriting, such as anger, morality, fears, past experiences, hidden talents, and mental problems, and handwriting differs from person to person. People face various psychological problems, and teenagers also face many mental problems; criminals can be identified using handwriting analysis, and counselors and mentors can use it as a tool for advising clients. The proposed work contains three main steps: image preprocessing, identification of handwriting features, and mapping of the identified features to personality traits. Image preprocessing is the stage in which the handwriting sample is translated into a format that can be processed easily and efficiently in the later steps; it involves noise removal, grayscale conversion, thresholding, and image morphology. In feature identification, handwriting features are extracted; three features are extracted, namely left margin, right margin, and word spacing. Lastly, the extracted features are mapped to personality using a rule-based technique, and the personality of the writer with respect to the three handwriting features is displayed. The proposed system can predict the personality of the person with 90% accuracy.

Vasundhara Bhade, Trupti Baraskar
Wavelength-Convertible Optical Switch Based on Cross-Gain Modulation Effect of SOA

All-optical switching based on wavelength conversion using cross-gain modulation (XGM) effect of semiconductor optical amplifier (SOA) has been proposed and demonstrated for 10 Gbps NRZ modulated data signals. Error-free operation is successfully achieved for converted signal with Q-factor of >28.96 at optimum input probe power of −8 dBm. The proposed simple and cost-effective structure of optical switch can be utilized for future ultra-fast optical switching circuit and to expand the optical network.

Sukhbir Singh, Surinder Singh
Contrast Enhancement Algorithm for IR Thermograms Using Optimal Temperature Thresholding and Contrast Stretching

IR thermography is a noninvasive, non-contact radiometric technique that creates 2D thermal images based on infrared radiation. Usually these are gray-level images with poor color contrast. Various pseudo-coloring algorithms are available to transform these images into RGB space, but contrast enhancement is still required for better visualization of thermograms. In this study, a non-training contrast enhancement algorithm is proposed for IR thermograms. Contrast enhancement in the proposed methodology is achieved by: (i) eliminating the background interference using optimal temperature thresholding and (ii) color enhancement using decorrelation contrast stretching. The performance of the proposed methodology has been evaluated based on variations in entropy values, and the increasing trend in entropy values indicates the contrast enhancement achieved by this method.

Jaspreet Singh, Ajat Shatru Arora
Data Deduplication and Fine-Grained Auditing on Big Data in Cloud Storage

Computing convenience and flexibility are made available by cloud servers through the distribution of innumerable resources over cyberspace, and the most heavily used on-demand service in the cloud is data storage. An immense amount of data is produced in day-to-day life through the national information infrastructure, and handling such prodigious data on demand is a challenging chore for current data storage systems. The data explosion and the colossal growth in redundant data highlight the importance of data deduplication (dedupe). In the proposed scheme, source-based deduplication is used to eliminate duplicate data: before backup, the client checks for unique data in a local or remote index, which lowers network bandwidth with fast and low computation overhead. Firstly, in source-based deduplication the data are stored in physical memory, and the fragments of the data are cuckoo hashed before being stored in their physical memory. Secondly, the correctness and security of cloud data are a prime concern; this is achieved by signing each data block before sending it to the server. The proposed scheme guarantees data integrity by fine-grained auditing, using the Boneh–Lynn–Shacham (BLS) algorithm, one of the secured algorithms, for the signing process. Homomorphic authentication with a random masking technique is used to attain privacy-preserving public auditing.
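
A simplified sketch of client-side (source-based) deduplication appears below, using a plain SHA-256 chunk index in place of the paper's cuckoo-hash index and omitting the BLS signing and auditing steps entirely; the chunk size and data are illustrative.

```python
import hashlib

CHUNK_SIZE = 4          # tiny chunks for illustration; real systems use kilobytes
index = {}              # chunk fingerprint -> chunk bytes (the "remote" store)

def backup(data: bytes):
    """Upload only chunks whose fingerprints are not already in the index."""
    sent = 0
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        fp = hashlib.sha256(chunk).hexdigest()
        if fp not in index:          # duplicate chunks are skipped at the source
            index[fp] = chunk
            sent += len(chunk)
    return sent

print("first backup sent bytes :", backup(b"ABCDABCDEFGH"))
print("second backup sent bytes:", backup(b"ABCDEFGHIJKL"))
print("unique chunks stored    :", len(index))
```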

RN. Karthika, C. Valliyammai, D. Abisha
Retraction Note to: In-silico Analysis of LncRNA-mRNA Target Prediction
Deepanjali Sharma, Gaurav Meena
Backmatter
Metadata
Title
Advances in Machine Learning and Data Science
Editors
Dr. Damodar Reddy Edla
Prof. Dr. Pawan Lingras
Dr. Venkatanareshbabu K.
Copyright Year
2018
Publisher
Springer Singapore
Electronic ISBN
978-981-10-8569-7
Print ISBN
978-981-10-8568-0
DOI
https://doi.org/10.1007/978-981-10-8569-7
