
2019 | Book

Computational Intelligence in Data Mining

Proceedings of the International Conference on CIDM 2017

Edited by: Prof. Dr. Himansu Sekhar Behera, Dr. Janmenjoy Nayak, Dr. Bighnaraj Naik, Prof. Dr. Ajith Abraham

Publisher: Springer Singapore

Book series: Advances in Intelligent Systems and Computing


About this book

The International Conference on “Computational Intelligence in Data Mining” (ICCIDM), after three successful editions, has reached its fourth edition with high aspirations. The best papers selected from the conference were reviewed and compiled to form this volume. The proceedings discuss the latest solutions, scientific results, and methods for solving intriguing problems in the fields of data mining, computational intelligence, big data analytics, and soft computing. The volume offers a preview of the strengths and weaknesses of trending applications and research findings in computational intelligence, data mining, and related fields.

Table of Contents

Frontmatter
BER Performance Analysis of Image Transmission Using OFDM Technique in Different Channel Conditions Using Various Modulation Techniques

In the modern digital era, we clearly depend heavily on mobile phones and smartphones, which in turn depend on data and information to prove their worth. Meeting this increased demand for data rates requires new technology, since older technologies such as GPRS and EDGE are beyond their capacity. Orthogonal frequency division multiplexing (OFDM) fills this role, as it increases the data rate within the same fixed bandwidth. In this paper, we present a comparison of data handling under different modulation schemes, expressed as bit error rate (BER) versus signal-to-noise ratio (SNR) curves, for non-OFDM and OFDM channels under different fading environments.

Arun Agarwal, Binayak Satish Kumar, Kabita Agarwal
On Understanding the Release Patterns of Open Source Java Projects

Release length is of great significance to companies as well as researchers, as it provides deeper insight into the rules and practices followed by applications. It has been observed that many Open Source projects follow agile practices of parallel development and Rapid Releases (RR), but very few studies to date have analyzed the release patterns of these Open Source projects. This paper analyzes ten Open Source Java projects (Apache Software Foundation) comprising 718 releases to study the evolution of release lengths. The results of the study show that: (1) eight out of ten datasets followed RR models; (2) none of these datasets followed RR models since their first release; (3) the average release length was found to be four months for major versions and one month for minor versions (exceptions removed); and (4) there exists a negative correlation between the number of contributors and release length.

Arvinder Kaur, Vidhi Vig
A Study of High-Dimensional Data Imputation Using Additive LASSO Regression Model

With the rapid growth of computational domains, fields such as bioinformatics, finance, engineering, biometrics, and neuroimaging emphasize the necessity of analyzing high-dimensional data. Many real-world datasets may contain hundreds or thousands of features. The common problem in most knowledge-based classification tasks is the quality and quantity of data. In particular, many high-dimensional data samples contain missing or unknown attribute values, incomplete feature vectors, and uncertain or vague data, all of which must be handled carefully. Due to the presence of a large share of missing values in the datasets, refined multiple imputation methods are required to estimate the missing values so that a fair and more consistent analysis can be achieved. In this paper, three multiple imputation (MI) methods (mean imputation, predictive mean imputation, and imputation by additive LASSO) are employed in the cloud. Results show that imputation by additive LASSO is the preferred MI method.

K. Lavanya, L. S. S. Reddy, B. Eswara Reddy
An Efficient Multi-keyword Text Search Over Outsourced Encrypted Cloud Data with Ranked Results

Cloud computing offers efficient deployment options that motivate large enterprises to outsource data to the cloud. However, outsourcing sensitive information may compromise the privacy of the data. To enable keyword-based search over encrypted data, we propose a multi-keyword search scheme on a tree-based encrypted index data structure to retrieve information from encrypted cloud data. In this model, the document collection is clustered using a hierarchical k-means method. A vector space model is used to create encrypted index and query vectors, and a depth-first search algorithm is proposed for an efficient search mechanism. The results are ranked based on the relevance score between the encrypted index and query vectors. Rigorous experiments show the performance and efficiency of the proposed methods.

Prabhat Keshari Samantaray, Navjeet Kaur Randhawa, Swarna Lata Pati
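In vector-space retrieval schemes of this kind, ranking typically reduces to a similarity score between index and query vectors. The sketch below uses plain cosine similarity over hypothetical document vectors, ignoring the paper's encryption layer, to illustrate the ranking step:

```python
import math

def relevance(index_vec, query_vec):
    """Cosine similarity between a document index vector and a query vector."""
    dot = sum(a * b for a, b in zip(index_vec, query_vec))
    norm = (math.sqrt(sum(a * a for a in index_vec))
            * math.sqrt(sum(b * b for b in query_vec)))
    return dot / norm if norm else 0.0

# Rank (hypothetical) documents by descending relevance to the query.
docs = {"d1": [1, 0, 1], "d2": [0, 1, 1], "d3": [1, 1, 0]}
query = [1, 0, 1]
ranked = sorted(docs, key=lambda d: relevance(docs[d], query), reverse=True)
```

In the actual scheme, the same score would be computed over encrypted vectors while traversing the index tree depth-first.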
Robust Estimation of IIR System’s Parameter Using Adaptive Particle Swarm Optimization Algorithm

This paper introduces a novel method for robust parameter estimation of IIR systems. When the training signal contains strong outliers, the conventional squared-error-based cost function fails to provide the desired performance. Thus, a computationally efficient robust Huber cost function is used here. Since the error surface of an IIR system contains local minima, gradient-based algorithms cannot be used. Therefore, the parameters of the IIR system are estimated using an adaptive particle swarm optimization algorithm with the Huber cost function. The simulation results show that the proposed algorithm provides better performance than the Wilcoxon-norm-based robust algorithm and the conventional squared-error-based PSO algorithm.

Meera Dash, Trilochan Panigrahi, Renu Sharma
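The Huber cost combines a quadratic penalty for small errors with a linear penalty for large ones, which is what blunts the influence of outliers. A minimal sketch (the threshold `delta` is a free parameter, not a value from the paper):

```python
def huber(e, delta=1.0):
    """Huber cost: quadratic for |e| <= delta, linear beyond it."""
    a = abs(e)
    return 0.5 * e * e if a <= delta else delta * (a - 0.5 * delta)

def huber_cost(errors, delta=1.0):
    """Average Huber cost over a batch of error samples."""
    return sum(huber(e, delta) for e in errors) / len(errors)

# A single outlier inflates the squared-error cost far more than the Huber cost.
errors = [0.1, -0.2, 0.05, 10.0]
mse = sum(e * e for e in errors) / len(errors)
robust = huber_cost(errors)
```

Minimizing this cost with a population-based optimizer such as PSO avoids both the outlier sensitivity of squared error and the local-minima issue of gradient descent.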
A Shallow Parser-based Hindi to Odia Machine Translation System

This paper describes a Hindi to Odia machine translation system developed using a popular open-source platform called Apertium. With a population of over 1.27 billion, 18 officially recognized languages, 30 regional languages, and over 2000 dialects, the multilingual society of India needs well-developed ICT tools so that citizens can easily exchange and share information and knowledge. Though Hindi is the national language of India, many people in Odisha are unable to understand information written in Hindi. In this scenario, a suitable Hindi to Odia machine translation system will help people understand and use Hindi more productively. We chose the Apertium platform for several reasons. Its shallow-parser-level transfer modules make it well suited to building machine translation systems between closely related language pairs such as Hindi and Odia. The use of FSTs in all modules makes it much faster than other shallow-parser-based platforms. It is also free, open-source software available under the GPL license. In this paper, we also discuss the linguistic and computational challenges in building linguistic resources for both Hindi and Odia. In particular, the use of the TAM (Tense, Aspect, and Modality) concept in the transfer module is a unique approach to building transfer rules between Hindi and Odia on the Apertium platform. This work can easily be extended to develop MT systems for other Indian language pairs.

Jyotirmayee Rautaray, Asutosh Hota, Sai Sankar Gochhayat
Coherent Method for Determining the Initial Cluster Center

Much research on clustering of objects now focuses on finding near-optimal cluster centers and obtaining the best possible clusters into which the objects fall, so that the desired expectations are met. This is because a bad selection of cluster centers may drag a data point far away from its actual cluster, resulting in deficient clustering. Hence, we have concentrated on determining near-optimal cluster centers and on positioning the data in their real clusters. We have explored three kinds of clustering techniques, viz. K-Means, FEKM-based, and TLBO-based clustering, applied to quite a few datasets. The analysis considers two factors, namely cluster validation and average quantization error. Dunn's index, the Davies–Bouldin index, the silhouette coefficient, and the C index were used for quantitative evaluation of the clustering results. As anticipated, almost all validity indices show more promising outcomes for FEKM- and TLBO-based clustering than for K-Means, indicating superior cluster formation. Further tests confirm that FEKM- and TLBO-based clustering have smaller quantization error than K-Means.

Bikram Keshari Mishra, Amiya Kumar Rath
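The average quantization error used as the second evaluation factor is simply the mean distance from each point to its nearest cluster center; lower values indicate tighter clusters. A minimal sketch with made-up 2-D points:

```python
import math

def avg_quantization_error(points, centers):
    """Mean Euclidean distance from each point to its nearest center."""
    return sum(min(math.dist(p, c) for c in centers) for p in points) / len(points)

points = [(0, 0), (0, 1), (10, 0), (10, 1)]
# Well-placed centers give a small error; a single poorly placed one does not.
good = avg_quantization_error(points, [(0, 0.5), (10, 0.5)])
bad = avg_quantization_error(points, [(5, 0.5)])
```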
Digital Image Watermarking Using (2, 2) Visual Cryptography with DWT-SVD Based Watermarking

It has become an urgent necessity to protect digital multimedia content for document owners and service providers. Watermarking is a technique that helps to attain copyright protection. The literature includes various methods for embedding information into multimedia elements such as images, audio, and video. In this paper, we review the DWT-SVD watermarking technique for image watermarking. We propose a new algorithm for image watermarking using visual cryptography that generates two shares with DWT-SVD. The scheme is highly secure and robust to image processing attacks. The paper also gives a step-by-step account of the implementation of the proposed algorithm and discusses future prospects.

Kamal Nayan Kaur, Divya, Ishu Gupta, Ashutosh Kumar Singh
Modeling of Nexa-1.2kW Proton Exchange Membrane Fuel Cell Power Supply Using Swarm Intelligence

A heuristic, swarm-intelligence-based simulator design for the Nexa-1.2kW Ballard proton exchange membrane fuel cell (PEMFC) is presented. The parameters of the Nexa-1.2kW PEMFC simulator are determined using the particle swarm optimization (PSO) algorithm, and the results of the simulator are experimentally verified. Further, a discrete PI-controlled SEPIC converter is used to interconnect the fuel cell to a load. The fuel cell simulator, converter integration, and its control are implemented in the MATLAB/SIMULINK environment. Finally, the effect of load variation and stack temperature on the fuel cell power conditioning unit is investigated. A rise in stack temperature results in a slight reduction in cell current and a considerable rise in the terminal voltage of the fuel cell.

Tata Venkat Dixit, Anamika Yadav, Shubhrata Gupta
Survey of Different Load Balancing Approach-Based Algorithms in Cloud Computing: A Comprehensive Review

The Internet has become a basic necessity of day-to-day activity and has had a great impact in modernizing the digital world. Consequently, cloud computing is one of the most promising recent technical advancements. It is widely adopted by different communities for its abundant opportunities, providing services and resources on an ad hoc basis. Still, it has numerous issues related to resource provisioning, security, real-time data access, event content dissemination, server consolidation, and virtual machine migration. These issues must be addressed and resolved to provide a better quality of service in this computing paradigm. Load balancing is one of the vexing issues in the cloud platform. It ensures reliability and availability in this computing environment and increases the efficiency of the system by distributing the workload equally among competing processes. The primary goal of load balancing is to minimize response time and cost while maximizing throughput. In past decades, researchers have proposed different methodologies to resolve this issue; however, various load balancing parameters are yet to be optimized. This survey paper presents a comprehensive and comparative study of various load balancing algorithms. The study also portrays the merits and demerits of the state-of-the-art schemes, which may prompt researchers toward further improvement of load balancing algorithms.

Arunima Hota, Subasish Mohapatra, Subhadarshini Mohanty
Analysis of Credit Card Fraud Detection Using Fusion Classifiers

Credit card fraud detection is a critical problem regularly faced by online vendors in the financial marketplace. The rapid growth of modern technologies enables fraud and causes heavy financial losses for many financial sectors. Various data mining and soft-computing-based classification algorithms have been used by researchers and play an essential role in fraud detection. In this paper, we analyze ensemble classifiers such as Bagging, Random Forest, Classification via Regression, and Voting, and compare them with effective single classifiers such as K-NN, Naïve Bayes, SVM, RBF Classifier, MLP, and Decision Tree. The algorithms are evaluated on three different datasets, treated with SMOTE to deal with the class imbalance problem. The comparison is based on evaluation metrics such as accuracy, precision, true positive rate (recall), and false positive rate.

Priyanka Kumari, Smita Prava Mishra
An Efficient Swarm-Based Multicast Routing Technique—Review

Multicast routing is emerging as a popular communication format for networks in which a sender sends the same data packet to multiple nodes simultaneously. To support this, it is important to construct a minimal-cost multicast tree for every communication session. However, because of the dynamic and unpredictable environment of the network, multicast routing becomes a combinatorial problem of locating the best path between a source node and a destination node with minimum distance, delay, and congestion. To overcome this, various multicast protocols have been proposed. Recently, swarm and evolutionary techniques such as ant colony optimization (ACO), particle swarm optimization (PSO), artificial bee colony (ABC), and genetic algorithms (GA) have been adopted by researchers for multicast routing; of these, ACO and GA are the most popular. This paper presents a review of existing multicast routing techniques along with their advantages and limitations.

Priyanka Kumari, Sudip Kumar Sahana
A New Howard–Crandall–Douglas Algorithm for the American Option Problem in Computational Finance

The unavailability of a closed-form formula for the American option price means that the price needs to be approximated by numerical techniques. The valuation problem can be formulated either as a linear complementarity problem or a free-boundary value problem. Both approaches require a discretisation of the associated partial differential equation, and it is common to employ standard second-order finite difference approximations. This work develops a new procedure for the linear complementarity formulation. Howard’s algorithm is used to solve the discrete problem obtained through a higher-order Crandall–Douglas discretisation. Speed and error comparisons indicate that this approach is more efficient than the procedures for solving the free-boundary value problem.

Nawdha Thakoor, Dhiren Kumar Behera, Désiré Yannick Tangman, Muddun Bhuruth
Graph Anonymization Using Hierarchical Clustering

Privacy-preserving data publication for social networks is an emerging trend that addresses the dual concerns of information privacy and utility. Privacy preservation is essential because social networks are an abundant source of information for studying the behavior of social entities, and a social network disseminates its information through its social graph. Anonymization of the social graph is therefore essential in data publication to preserve the privacy of the participating entities. In this paper, we propose a hierarchical clustering-based approach for k-degree anonymity. The attack model focuses on the identity disclosure problem. Unlike the approach discussed in Liu and Terzi (Proceedings of ACM SIGMOD, 2008, [1]), our approach generates a k-degree anonymous sequence along with the k value. The Havel–Hakimi algorithm is used to check whether the sequence is graphic. Subsequently, the construction phase takes place with the help of edge addition operations.

Debasis Mohapatra, Manas Ranjan Patra
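The Havel–Hakimi test mentioned above decides whether a degree sequence is graphic, i.e., realizable by a simple graph: repeatedly remove the largest degree d and decrement the next d largest degrees. A minimal sketch:

```python
def is_graphic(seq):
    """Havel-Hakimi test: can this degree sequence be realized by a simple graph?"""
    seq = sorted(seq, reverse=True)
    while seq:
        d = seq.pop(0)
        if d == 0:
            return True          # all remaining degrees are zero
        if d > len(seq):
            return False         # not enough vertices left to connect to
        for i in range(d):       # connect to the d next-largest degrees
            seq[i] -= 1
            if seq[i] < 0:
                return False
        seq.sort(reverse=True)
    return True
```

In the anonymization pipeline, a candidate k-anonymous degree sequence that fails this test must be adjusted before graph construction can proceed.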
Reevaluation of Ball-Race Conformity Effect on Rolling Element Bearing Life Using PSO

Long fatigue life is one of the most decisive criteria in the design of rolling element bearings. However, the lifetime of a bearing depends on several factors, such as fatigue, lubrication, and thermal characteristics. In the present work, the design goals, specifically the dynamic load capacity, life factors, and bearing life, have been optimized using an algorithm based on particle swarm optimization (PSO). Here, the life factors represent reliability, materials, processing, and operating conditions. From reliability concepts, a strict series system is considered to represent the total bearing system. A convergence study has been performed to ensure the optimal point in the design. The optimal design outcome shows the effectiveness and efficiency of the algorithm.

S. N. Panda, S. Panda, D. S. Khamari, P. Mishra, A. K. Pattanaik
Static Cost-Effective Analysis of a Shifted Completely Connected Network

Computational power challenges have increased in the contemporary era, motivating the scientific community to find alternatives to current systems. Systems using conventional computing power have become unable to cope with grand computing problems. Therefore, building systems with special characteristics has become the main concern of research in this area. As a result, multiprocessor systems were developed to manipulate computing tasks in parallel and concurrently, leading to massively parallel computers (MPCs), which have spread widely as an adopted solution for complex computing challenges. The structure of the underlying interconnection network of these systems plays the main role in improving overall performance and in controlling system cost. Thus, many network topologies have been presented in the search for an optimal one. In this paper, we present the architecture of a new hierarchical interconnection network (HIN) called the shifted completely connected network (SCCN). This network has been described previously, and its static network performance has been evaluated in earlier studies. The main focus of this paper is to analyze the static cost-effectiveness parameter of SCCN, which can be calculated from the relation between the static parameters.

Mohammed N. M. Ali, M. M. Hafizur Rahman, Dhiren K. Behera, Yasushi Inoguchi
Dynamic Notifications in Smart Cities for Disaster Management

A smart city is shaped by citizens using technology, enabled by the support of the city's government. Diverse data are collected on a regular basis by satellites, wireless and remote sensors, national meteorological and geological departments, NGOs, and various other international, government, and private bodies before, during, and after a disaster. Data analytics can leverage such data deposits and produce insights, which can then be transformed into enhanced services. Disasters are sudden and calamitous events that can cause severe and pervasive negative impacts on society and huge human losses. The proposed system is based on a disaster management scenario for avoiding such impacts. The system provides alerts to people living in a particular area as well as in nearby areas, based on social media activity during the disaster. It tries to help society by collecting the data and messages spread by people suffering from the disaster, or by people who know about its occurrence, and using only the information they reveal. The system sends an alert message to a particular area, or to the people in that area, so that they can save their lives, their time, and possibly other living and non-living things, depending on the type of disaster.

Sampada Chaudhari, Amol Bhagat, Nitesh Tarbani, Mahendra Pund
Application of Classification Techniques for Prediction and Analysis of Crime in India

Due to the dramatic increase in crime rates, human capacity to analyze the massive volume of crime data is diminishing. The application of data mining techniques can therefore be beneficial for gaining insight into crime patterns, helping law enforcement prevent crime through proper crime prevention strategies. The present work collects crime records for kidnapping, murder, rape, and dowry death, and analyzes crime trends in Indian states and union territories by applying various classification techniques. The prediction rates shown in this work make analyzing crime much easier, and the effectiveness of these techniques is evaluated by accuracy, precision, recall, and F-measure. The work also presents a comparative study of the different classification algorithms used.

Priyanka Das, Asit Kumar Das
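The evaluation metrics listed above all derive from the confusion-matrix counts; a minimal sketch, using hypothetical counts:

```python
def metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F-measure from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, precision, recall, f1

# Illustrative counts: 8 true positives, 2 false positives,
# 2 false negatives, 88 true negatives.
acc, prec, rec, f1 = metrics(8, 2, 2, 88)
```

Note how accuracy (0.96 here) can look strong even when precision and recall on the minority class are much lower, which is why all four metrics are reported.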
Improving Accuracy of Classification Based on C4.5 Decision Tree Algorithm Using Big Data Analytics

C4.5 is a decision tree algorithm and a broadly used classification technique. The era of big data poses many challenges, such as the size, time, and cost of building a decision tree. The aim of decision tree construction is to boost accuracy on the training data. Predictive modeling requires splitting the training datasets, for which MATLAB is a good choice; a decision tree also eases the analysis of heterogeneous data. In this paper, C4.5 is implemented in MATLAB on four different datasets, producing a confusion matrix of target and output classes, and the features of the datasets are then compared. The main objective of this research is to boost classification accuracy and to reduce the time needed to build a classification model. We reduce the input space using the Bhattacharyya distance (BD). The proposed method shows better performance on the data files: with the help of BD, the improved C4.5 outperforms the original C4.5 in every test case.

Bhavna Rawal, Ruchi Agarwal
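The Bhattacharyya distance used here for input-space reduction scores how well a feature separates two classes; for univariate Gaussian class distributions it has a closed form. A minimal sketch (the means and variances are illustrative, not from the paper):

```python
import math

def bhattacharyya_gauss(m1, v1, m2, v2):
    """Bhattacharyya distance between two univariate Gaussians
    (mean m, variance v); larger means the feature separates classes better."""
    return (0.25 * math.log(0.25 * (v1 / v2 + v2 / v1 + 2.0))
            + 0.25 * (m1 - m2) ** 2 / (v1 + v2))

# A feature whose class means are far apart scores higher and would be kept.
weak = bhattacharyya_gauss(0.0, 1.0, 0.5, 1.0)
strong = bhattacharyya_gauss(0.0, 1.0, 3.0, 1.0)
```

Ranking features by this score and keeping only the top-scoring ones shrinks the input space before the tree is built.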
Comparative Study of MPPT Control of Grid-Tied PV Generation by Intelligent Techniques

A grid-tied photovoltaic (PV) system with a boost converter is considered for study here. Maximum power point tracking (MPPT) control of the duty cycle of the boost converter is achieved by intelligent techniques such as grey wolf optimization (GWO) and moth-flame optimization (MFO), and compared with the perturb and observe (P&O) method. The proposed MFO approach reduces the ripples in power, voltage, and current, and delivers better efficiency under different configurations compared with the latest literature on similar approaches.

S. Behera, D. Meher, S. Poddar
Stability Analysis in RECS-Integrated Multi-area AGC System with SOS Algorithm Based Fuzzy Controller

This paper addresses the coordination between generation and demand of electric power, termed automatic generation control (AGC). A wind energy conversion system (WECS)-based doubly fed induction generator (DFIG), integrated with two equal areas of conventional thermal generation, is proposed. A fuzzy Proportional-Integral-Derivative (fuzzy-PID) controller is used to stabilize deviations in frequency (∆f) and tie-line power (∆Ptie). The gains of the fuzzy-PID and DFIG controllers are tuned optimally using a multi-objective optimization technique called the symbiotic organism search (SOS) algorithm. In addition, the dynamic response and accuracy of the system under study are investigated using the integral of time multiplied absolute error (ITAE). The performance of the fuzzy-PID controller is compared with conventional PID, PI, and fuzzy-PI controllers in terms of settling time and peak overshoot. Finally, it is observed experimentally that the proposed SOS-optimized fuzzy-PID controller gives superior dynamic and robust performance compared to the other controllers under various operating conditions.

Prakash Chandra Sahu, Ramesh Chandra Prusty, Sidhartha Panda
Log-Based Reward Field Function for Deep-Q-Learning for Online Mobile Robot Navigation

Path planning is one of the major challenges in designing a mobile robot. In this paper, we implement Deep Q-Learning algorithms for the autonomous navigation task of a wheeled mobile robot, and propose a log-based reward field function to incorporate into them. The performance of the proposed algorithm is verified in both simulated and physical environments. Finally, the robot's obstacle avoidance performance is measured using hit-rate metrics.

Arun Kumar Sah, Prases K. Mohanty, Vikas Kumar, Animesh Chhotray
Integrated Design for Assembly Approach Using Ant Colony Optimization Algorithm for Optimal Assembly Sequence Planning

To reduce assembly effort and cost, researchers are motivated to reduce the number of parts by applying the design for assembly (DFA) concept. The existing literature offers no generalized method for obtaining an optimum assembly sequence that incorporates the DFA concept; even when DFA is applied separately, it demands highly skilled user intervention to obtain an optimum assembly sequence. As assembly sequence planning (ASP) is an NP-hard, multi-objective optimization problem, it requires considerable computational time and a huge search space. In this paper, an attempt is made to combine the DFA concept with the ASP problem to obtain an optimum assembly sequence. The ant colony optimization (ACO) algorithm is used to combine DFA and ASP, with the number of directional changes as the fitness function, to obtain optimum feasible assembly sequences. Generally, a product with ‘N’ parts involves N − 1 assembly levels, which are reduced by applying the DFA concept. Afterwards, an optimum assembly sequence can be obtained for the reduced levels of assembly using different assembly predicates.

G. Bala Murali, B. B. V. L. Deepak, B. B. Biswal, Bijaya Kumar Khamari
Design and Performance Evaluation of Fractional Order PID Controller for Heat Flow System Using Particle Swarm Optimization

The purpose of this paper is to apply a nature-inspired algorithm called Particle Swarm Optimization (PSO) to the design of a fractional order proportional-integral-derivative (FOPID) controller for a heat flow system. The PSO algorithm serves as a design tool for obtaining the optimal values of the controller parameters. For the optimization, different performance indices are considered: IAE (Integral Absolute Error), ISE (Integral Squared Error), ITAE (Integral Time Absolute Error), and ITSE (Integral Time Squared Error). All simulations are carried out in the Simulink/MATLAB environment. The proposed method shows better results in both the transient and frequency domains compared with other published works.

Rosy Pradhan, Susmita Pradhan, Bibhuti Bhusan Pati
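Controller tuning with PSO, as used here for the FOPID gains, treats the performance index as a black-box cost over the gain space. The sketch below minimizes a toy quadratic cost standing in for the actual ISE/ITAE evaluation, which would require simulating the plant; all parameter values are illustrative defaults:

```python
import random

def pso(cost, dim, bounds, n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5):
    """Minimal particle swarm optimization minimizing `cost` over a box."""
    lo, hi = bounds
    pos = [[random.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                 # per-particle best positions
    pbest_val = [cost(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # global best
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            v = cost(pos[i])
            if v < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], v
                if v < gbest_val:
                    gbest, gbest_val = pos[i][:], v
    return gbest, gbest_val

# Tune two "gains" against a toy cost with known optimum at (1, -2).
random.seed(0)
best, val = pso(lambda x: (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2,
                dim=2, bounds=(-5, 5))
```

In the paper's setting, `cost` would run the heat flow system simulation and return the chosen index (IAE, ISE, ITAE, or ITSE).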
Land Records Data Mining: A Developmental Tool for Government of Odisha

‘Data mining’ is the method of extracting valuable information from large data sets; it may also be called knowledge mining from data. Nowadays, Data Analytics and Business Intelligence focus on exploring useful information from databases created for different purposes. One such database, created for the Land Records System of Odisha, is ‘Bhulekh.’ The data on landed property are safeguarded by the government in the Revenue and Disaster Management department. These data are very sensitive, voluminous, and quite unstructured in nature. The regional language ‘Odia’ is used for preparation of the Record of Rights (RoR), so the Bhulekh database of Odisha contains data in the Odia language. The Government of India, at the national level, has taken steps to provide better service in the land records area to the public through its Digital India Land Records Management Programme (DILRMP), earlier known as the National Land Records Modernisation Programme (NLRMP). With support from the Government of India, the Government of Odisha started computerizing its land records. The Bhulekh database created for the purpose contains 1.47 crore Khatiyans, 3.23 crore tenants, and 5.47 crore plots for 51,681 villages of Odisha. Besides the textual data, it also contains cadastral maps in another database known as ‘BhuNaksha.’ Bhulekh and BhuNaksha are linked for spatial and non-spatial data integration, to better serve citizens; this makes the data easily accessible from any corner of the globe. This paper discusses how a data mining approach is used on Bhulekh for the socioeconomic development of society. Further, this helps the Government to take decisions, better manage government lands, and resolve issues in time.

Pabitrananda Patnaik, Subhashree Pattnaik, Prashant Kumar Pramanik
Piecewise Modeling of ECG Signals Using Chebyshev Polynomials

An electrocardiogram (ECG) signal measures the electrical activity of the heart and is used to diagnose cardiac-related issues. The morphology of these signals is affected by artifacts during acquisition and transmission, which prevents accurate diagnosis. A typical ECG monitoring device also generates a massive volume of digital data, requiring huge memory and large bandwidth, so there is a need to compress these signals effectively. In this paper, an efficient piecewise model to compress ECG signals is proposed. The model performs three successive steps: denoising, segmentation, and approximation. Preprocessing is done through a total variation denoising technique to reduce noise, while a bottom-up time-series approach divides the signals into various segments. The individual segments are then approximated using Chebyshev polynomials. The proposed model is compared with other compression models in terms of maximum error, root mean square error, percentage root mean difference, and normalized percentage root mean difference, showing significant improvements in performance parameters.

Om Prakash Yadav, Shashwati Ray
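Approximating each segment with a Chebyshev series, as in the model above, amounts to fitting coefficients at Chebyshev nodes and evaluating via the Clenshaw recurrence; compression comes from storing only a few coefficients per segment. A minimal sketch (segment boundaries and degree are illustrative):

```python
import math

def cheb_fit(f, a, b, n):
    """Coefficients of the degree-n Chebyshev interpolant of f on [a, b]."""
    N = n + 1
    # Chebyshev nodes in [-1, 1], mapped into [a, b]
    t = [math.cos(math.pi * (k + 0.5) / N) for k in range(N)]
    y = [f(0.5 * (b - a) * tk + 0.5 * (b + a)) for tk in t]
    c = []
    for j in range(N):  # discrete cosine transform of the samples
        s = sum(y[k] * math.cos(math.pi * j * (k + 0.5) / N) for k in range(N))
        c.append(2.0 * s / N)
    c[0] /= 2.0
    return c

def cheb_eval(c, a, b, x):
    """Evaluate the Chebyshev series c at x using the Clenshaw recurrence."""
    t = (2.0 * x - a - b) / (b - a)
    b1 = b2 = 0.0
    for cj in reversed(c[1:]):
        b1, b2 = 2.0 * t * b1 - b2 + cj, b1
    return t * b1 - b2 + c[0]
```

For an ECG segment, `f` would be replaced by the denoised samples (interpolated), and only the coefficient list `c` per segment would be stored or transmitted.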
Analysis of Supplementary Excitation Controller for Hydel Power System GT Dynamic Using Metaheuristic Techniques

Power systems experience different types of disturbances, such as switching, transients, and load variations, which affect their stability and efficiency. These disturbances cause unacceptable low-frequency oscillations, which decrease the power transfer capability of the transmission line and destabilize the mechanical shaft load. To suppress low-frequency oscillations, a common solution is the Power System Stabilizer (PSS). A Proportional-Integral-Derivative (PID) controller has the ability to minimize both settling time and maximum overshoot. In this paper, the design of a PID-based PSS and different techniques for tuning the PID-PSS controller are proposed. The parameters of the PID-PSS have been tuned by Genetic Algorithm (GA), Ant Colony Optimization (ACO), and Firefly Algorithm (FFA) based optimization techniques. The results indicate that the performance of the FFA-based PID-PSS controller is much better than that of the GA- and ACO-based PID-PSS controllers.

Mahesh Singh, R. N. Patel, D. D. Neema
Farthest SMOTE: A Modified SMOTE Approach

The class imbalance problem comprises an uneven distribution of data/instances across classes, which poses a challenge to the performance of classification models. Traditional classification algorithms produce a high accuracy rate for majority classes and a low accuracy rate for minority classes. The study of such problems is called class imbalance learning. Various methods used in imbalance learning applications modify the distribution of the original dataset by some mechanism in order to obtain a relatively balanced dataset. Most of the techniques proposed in the literature, such as SMOTE and ADASYN, use an oversampling approach to handle class imbalance. This paper presents a modified SMOTE approach, Farthest SMOTE (FSMOTE), to solve the imbalance problem. FSMOTE generates synthetic samples along the line joining a minority sample and its 'k' farthest minority-class neighbors. Further, the FSMOTE approach is evaluated on seven real-world datasets.
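The generation rule described in the abstract can be sketched directly (an illustrative reading of FSMOTE, not the authors' code; the random choices of base sample and neighbor are assumptions):

```python
import numpy as np

def farthest_smote(X_min, k=3, n_new=10, seed=0):
    """Generate n_new synthetic minority samples: pick a minority sample,
    choose one of its k FARTHEST minority-class neighbors (vanilla SMOTE
    uses the nearest), and interpolate on the line joining the two."""
    rng = np.random.default_rng(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        dists = np.linalg.norm(X_min - X_min[i], axis=1)
        far = np.argsort(dists)[-k:]          # indices of the k farthest
        j = rng.choice(far)
        gap = rng.random()                    # random point on the segment
        synthetic.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(synthetic)
```

Because each synthetic point is a convex combination of two existing minority samples, it always lies inside the minority class's bounding box.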

Anjana Gosain, Saanchi Sardana
DKFCM: Kernelized Approach to Density-Oriented Clustering

In this chapter, we propose a new clustering algorithm: density-oriented kernel-based FCM (DKFCM). It uses a kernelized approach for clustering after identifying outliers using a density-oriented approach. We have used two types of kernel functions for the implementation of DKFCM, the Gaussian function and the RBF function, and compared the results with other fuzzy clustering algorithms such as fuzzy C-means (FCM), kernel fuzzy C-means (KFCM), and density-oriented fuzzy C-means (DOFCM) to show the effectiveness of the proposed algorithm. We demonstrate the experimental performance of these algorithms on two standard datasets: DUNN and D15.

Anjana Gosain, Tusharika Singh
Comparative Study of Optimal Overcurrent Relay Coordination Using Metaheuristic Techniques

For the protection of power systems, Directional Overcurrent Relays (DOCRs) are generally used as an economical means of protection in sub-transmission and distribution systems. A relay is thus required to have a minimum operating time on the occurrence of a fault while keeping its operation coordinated with other relays. In this paper, a comparative analysis of optimal overcurrent relay coordination based on different metaheuristic techniques, namely Genetic Algorithm (GA), Particle Swarm Optimization (PSO), Cuckoo Search Algorithm (CSA), and Firefly Algorithm (FFA), is presented. The various heuristic techniques are executed on constrained systems with different numbers of relays, and the method that achieves the minimum operating time in the fewest iterations and least time is identified.

Shimpy Ralhan, Richa Goswami, Shashwati Ray
Prediction of Gold Price Movement Using Discretization Procedure

Accurate prediction of commodity prices using machine learning techniques is considered a significant challenge by researchers and investors alike. The main objective of the proposed work is to show that discretized features provide more accuracy than continuous features for predicting whether the gold price will move in a positive or negative direction. This work utilizes three techniques for measuring the performance of the discretization procedure: percentage of accuracy, receiver operating characteristics (ROC), and the area under the ROC curve (AUC).

Debanjan Banerjee, Arijit Ghosal, Imon Mukherjee
Language Discrimination from Speech Signal Using Perceptual and Physical Features

Humans are the most authoritative language identifiers in the province of speech recognition. They can determine within a split second whether the language of the speech they hear is known to them or not. This ability rests on basic discrimination among sound pattern characteristics in the frequency, time, and perceptual domains, which motivates the potent scheme introduced in the proposed work. The work addresses the identification of three widely spoken languages in India: English, Hindi, and Bengali. The scheme uses a well-known perceptual feature, pitch, along with physical features such as the zero-crossing rate (ZCR) of the audio signal. To make the feature set more effective, the proposed effort also adopts mel-frequency cepstral coefficients (MFCCs) and statistical textural features computed from the co-occurrence matrix of the MFCCs.

Ghazaala Yasmin, Ishani DasGupta, Asit K. Das
Constructing Fuzzy Type-I Decision Tree Using Fuzzy Type-II Ambiguity Measure from Fuzzy Type-II Datasets

Decision trees are among the most important data mining tools for both learning and reasoning from crisp datasets. In the case of a fuzzy dataset, a fuzzy decision tree must be established to extract fuzzy rules. This paper illustrates an approach to building a fuzzy type-I decision tree from a fuzzy type-II dataset using an ambiguity measure in fuzzy type-II form.

Mohamed A. Elashiri, Ahmed T. Shawky, Abdulah S. Almahayreh
Aspect-Level Sentiment Analysis on Hotel Reviews

Sentiment analysis is a part of natural language processing which extracts and analyzes opinions, sentiments, and emotions from written language. Every organization wants to know public and customer feedback about its products and services, as this provides very important information about how those products and services are performing in the market. Aspect-level sentiment analysis is a technique that finds and aggregates sentiment on entities mentioned within documents, or on aspects of those entities. This paper converts unstructured data into structured data using the Scrapy framework and its selector tool in Python; the Natural Language Toolkit (NLTK) is then used for tokenization and part-of-speech tagging. Next, the reviews are broken into single-line sentences and the list of aspects in each sentence is identified. Finally, we analyze the different aspects along with their scores, calculated by a sentiment score algorithm, for reviews collected from hotel Web sites.
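The final aggregation step, scoring each aspect from the sentences that mention it, can be sketched with a toy lexicon (the lexicon, aspect names, and scoring rule below are illustrative assumptions; the paper's pipeline uses NLTK tagging and its own sentiment score algorithm):

```python
# Toy polarity lexicon; the paper would derive scores from its own algorithm.
LEXICON = {"clean": 1, "friendly": 1, "great": 2,
           "dirty": -2, "rude": -2, "noisy": -1}

def aspect_scores(sentences, aspects):
    """Sum the polarities of opinion words in every sentence
    that mentions a given aspect term."""
    scores = {a: 0 for a in aspects}
    for sentence in sentences:
        words = sentence.lower().split()
        for aspect in aspects:
            if aspect in words:  # the sentence talks about this aspect
                scores[aspect] += sum(LEXICON.get(w, 0) for w in words)
    return scores
```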

Nibedita Panigrahi, T. Asha
Node Grouping and Link Segregation in Circular Layout with Edge Bundling

Every industry today produces a huge amount of data, which is analyzed and used for future predictions and business decisions. Networked data can be analyzed with node-link diagrams, whose different layouts reveal different trends in the data. Many of these layouts have complex algorithms, so the construction of alternative layouts is a research topic for many organizations and industries. Many real-world examples require grouping of nodes, separation of links, a simple layout, and abstract visuals of the data. This paper proposes a technique intended to meet these requirements for real data. The essence of the technique is the use of a simple circular layout with node grouping and link segregation. View-level abstraction is achieved through edge bundling and node abstraction; the edge-bundling algorithm also reduces clutter in the graph. These techniques allow viewing networked data with new trends emerging from node grouping and link segregation, and comparing data by focusing on different attributes at different levels of view (i.e., abstract and detailed).

Surbhi Dongaonkar, Vahida Attar
Fuzzy-Based Mobile Base Station Clustering Technique to Improve the Wireless Sensor Network Lifetime

A wireless sensor network is an emerging paradigm in the present era of computer communication technology. Sensor nodes are minute, lightweight, autonomously distributed over the network, and not rechargeable, so the energy consumption of a sensor node is a crucial constraint in a wireless sensor network. Sensor nodes are clustered to reduce communication overhead. This paper proposes a new fuzzy-based mobile base station clustering technique, which uses a fuzzy approach to the base station's movement to decrease the energy consumption of the sensor nodes and increase the lifetime of the network. The proposed work is implemented in MATLAB and comparatively reduces the energy consumption of the sensor nodes.

R. Sunitha, J. Chandrika
Hydropower Generation Optimization and Forecasting Using PSO

Deriving optimal operation rules for maximizing hydropower generation in a multi-purpose reservoir is relatively challenging among its various other purposes, such as irrigation and flood control. This paper addresses the optimal functioning of a multi-purpose reservoir for improving hydropower generation. Efficient bio-inspired optimization techniques are proposed for hydropower optimization and hydrological variable forecasting. A particle swarm optimization (PSO)-based methodology is proposed for maximal hydropower generation through optimal reservoir release policies for the Aliyar reservoir, located in the Coimbatore district of Tamil Nadu state in India. The reservoir release is also optimized by the Global Solver LINGO and compared with PSO, and the PSO-based model is found to be more powerful for hydropower maximization. To handle the uncertain behavior of hydrologic variables, an artificial neural network model is also applied to forecast reservoir inflow and hydropower generation. The optimal reservoir release patterns suggested in this work show that the Aliyar Mini Hydel Power Station has the potential to generate considerably more hydropower than the actual generation observed from the power plant over the past years.
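A minimal PSO loop for maximizing an objective might look like this (a generic sketch with assumed inertia and acceleration constants; the paper's actual reservoir-release objective, constraints, and parameter settings are not shown):

```python
import numpy as np

def pso_maximize(f, bounds, n_particles=30, iters=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Maximize f over the box [lo, hi]^d with a basic PSO.
    bounds is a (lo, hi) pair of d-dimensional arrays."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    d = len(lo)
    x = rng.uniform(lo, hi, (n_particles, d))       # positions
    v = np.zeros_like(x)                            # velocities
    pbest, pval = x.copy(), np.array([f(p) for p in x])
    g = pbest[np.argmax(pval)].copy()               # global best
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, d))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, lo, hi)                  # stay inside bounds
        val = np.array([f(p) for p in x])
        improved = val > pval
        pbest[improved], pval[improved] = x[improved], val[improved]
        g = pbest[np.argmax(pval)].copy()
    return g, float(pval.max())
```

For the reservoir problem, `f` would score a candidate release schedule by the hydropower it generates, subject to the paper's storage and demand constraints.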

D. Kiruthiga, T. Amudha
Automatic Identification and Classification of Microaneurysms, Exudates and Blood Vessel for Early Diabetic Retinopathy Recognition

Diabetic retinopathy (DR) is a vital concern that leads to blindness in adults around the world. In this paper, we propose a system for early identification and classification of retinal fundus images as DR or non-DR. Ophthalmic features such as blood vessels, microaneurysms, and exudates are extracted using morphological operations with a 2D median filter, multilevel histogram analysis, and intensity transformation, respectively. The proposed system is evaluated on the DIARETDB0 (130 images) and DIARETDB1 (89 images) fundus image datasets using artificial neural networks (ANNs). Result analysis is completed by calculating the mean, variance, standard deviation, and correlation. We trained the proposed system as a multilayer perceptron with back-propagation, and it achieved a sensitivity of 0.83 and specificity of 0.045 for DIARETDB0, and a sensitivity of 0.95 and specificity of 0.2 for DIARETDB1.

Vaibhav V. Kamble, Rajendra D. Kokate
Performance Analysis of Tree-Based Approaches for Pattern Mining

Extracting meaningful patterns from databases has become a significant field of research for the data mining community. Researchers have skillfully taken up this task, contributing a range of frequent and rare pattern mining techniques. The literature subdivides pattern mining techniques into two broad categories: level-wise and tree-based approaches. Studies illustrate that tree-based approaches often outperform the former. This paper aims to provide an empirical analysis of two well-known tree-based approaches in the field of frequent and rare pattern mining. Through this paper, an attempt is made to let researchers analyze the factors affecting the performance of the most widely accepted category of pattern mining techniques: the tree-based approaches.

Anindita Borah, Bhabesh Nath
Discovery of Variables Affecting Performance of Athlete Students Using Data Mining

Contemporary research in stress–performance analysis has given a lot of emphasis to the issues of college-going students. These studies discovered that social, emotional, and financial conditions at large affect the academic performance of students. Similarly, academic stress and sports performance have been associated with various factors belonging to personality attributes, cognitive competencies, concentration level, socioeconomic background, locality, etc. However, these factors were hidden, and no attempts had been made to discover them. In the present research work, these aspects were discovered using data mining techniques. We devised our own dataset for the work, and experiments were carried out on the SPSS platform.

Rahul Sarode, Aniket Muley, Parag Bhalchandra, Sinku Kumar Singh, Mahesh Joshi
Implementation of Non-restoring Reversible Divider Using a Quantum-Dot Cellular Automata

CMOS-based integrated circuits may scale down to the nanometer range, where the primary challenges are further downscaling of the device and high energy dissipation. Reversible logic dissipates no energy and loses no information. In this way, a state-of-the-art technology such as QCA is pushed toward high-speed computing with negligible energy dissipation at the physical level. This work targets the design of a non-restoring reversible divider circuit and its implementation in QCA. We utilize a few 2 × 2 FG and 4 × 4 HNG gates as building blocks and present a cost-efficient QCA implementation. The divider circuit synthesized with FG and HNG gates inherits many benefits, such as fewer garbage outputs and reduced quantum cost, and the QCA primitive count can be further reduced by using an efficient QCA layout scheme. Simulation investigations have been verified with QCADesigner. The proposed non-restoring divider is also compared, in terms of reversible metrics, with other existing works.

Ritesh Singh, Neeraj Kumar Misra, Bandan Bhoi
Depth Estimation of Non-rigid Shapes Based on Fibonacci Population Degeneration Particle Swarm Optimization

In this paper, we address the problem of recovering the shape and motion parameters of a non-rigid shape from 2D observations under an orthographic projection camera model. This problem is nonlinear in nature; gradient-based optimization algorithms may easily get stuck in local minima, while generic model fitting may result in an inexact shape. We propose a Fibonacci population degeneration particle swarm optimization (fpdPSO) algorithm and use it to estimate the shape and motion. We report shape estimation results on the face and shark datasets. The Pearson correlation coefficient is used to measure the accuracy of the depth estimation.
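The accuracy measure mentioned, the Pearson correlation between estimated and ground-truth depths, is simply:

```python
import numpy as np

def pearson_r(a, b):
    """Pearson correlation coefficient between two depth vectors;
    a value near 1 indicates accurate depth recovery (up to scale)."""
    a = np.asarray(a, dtype=float) - np.mean(a)
    b = np.asarray(b, dtype=float) - np.mean(b)
    return float((a @ b) / np.sqrt((a @ a) * (b @ b)))
```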

Kothapelli Punnam Chandar, Tirumala Satya Savithri
Connecting the Gap Between Formal and Informal Attributes Within Formal Learning with Data Mining Techniques

Formal and informal attributes correspond to two distinct forms of learning, distinguished on the basis of the learning content and by where, when, and how learning happens. Formal learning is traditional learning with official course work that must be completed in a specified time. This study aimed at evaluating the challenges that students face while working to achieve good grades in exams, using data mining techniques to identify those challenges. The data collection methods used in this study were qualitative, involving testing and comparison.

Shivanshi Goel, A. Sai Sabitha, Abhay Bansal
Multiple Linear Regression-Based Prediction Model to Detect Hexavalent Chromium in Drinking Water

This paper discusses the dependency between various water quality parameters (WQPs), namely pH, TDS, and conductivity, that are determined to estimate the presence of hexavalent chromium compounds in drinking water. A multiple linear regression (MLR)-based prediction model is proposed to estimate the above parameters. The changes in WQPs are analyzed under both instant and stable conditions. The deviation between the measured and estimated WQPs is computed and added as a correction factor in order to improve the detection accuracy.
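The MLR estimation step reduces to an ordinary least-squares fit; a sketch with synthetic coefficients (the actual predictors would be the paper's WQP measurements, which are not reproduced here):

```python
import numpy as np

def fit_mlr(X, y):
    """Least-squares fit of y ~ b0 + b1*x1 + ... + bk*xk.
    In the paper's setting, the columns of X would be WQPs
    such as pH, TDS, and conductivity."""
    A = np.column_stack([np.ones(len(X)), X])  # prepend intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_mlr(coef, X):
    """Apply the fitted coefficients to new observations."""
    return np.column_stack([np.ones(len(X)), X]) @ coef
```

The correction factor described in the abstract would then be the residual `y - predict_mlr(coef, X)` fed back into the estimate.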

K. Sri Dhivya Krishnan, P. T. V. Bhuvaneswari
Data Engineered Content Extraction Studies for Indian Web Pages

Recent innovations in the Internet and cellular communications have opened many interesting and exciting areas of social and research activity, and one of the basic driving forces behind this is the Web page, containing data in different forms. Data can be mobile- or Internet-based, online or off-line, and normally ranges in size from kilobytes to terabytes. In the Indian context, these data can be computer-generated, printed, or archived in different languages and dialects. The present study focuses on applying engineering principles to data so that a smart subset is used to generate content in a short period, making further development easier. After a brief overview of the complexities of Indian Web pages and current approaches in data mining, a basic pixel-based approach is developed, along with data reduction and abstraction, for use with classification approaches for content extraction. During data reduction, an engineering approach based on organizing and adapting the data into suitable inputs for classification is highlighted, and a case study is given for analysis.

Bhanu Prakash Kolla, Arun Raja Raman
Steganography Using FCS Points Clustering and Kekre’s Transform

Steganography is the process of concealing one form of data within the same or another form of data, using a cover medium to hide information within itself. In this paper, steganography is performed using the Fuzzy C Strange Points Clustering Algorithm and Kekre's Transform. The Fuzzy C Strange Points Clustering Algorithm is used to provide security and robustness, as it is found to give better-quality clusters. Kekre's Transform is performed on the image, and the secret message is hidden in the LSBs of the transform coefficients. Together, these provide better hiding capacity and successful retrieval of the secret information.

Terence Johnson, Susmita Golatkar, Imtiaz Khan, Vaishakhi Pilankar, Nehash Bhobe
Anomaly Detection in Phonocardiogram Employing Deep Learning

The phonocardiogram (PCG) is the recording of heart sounds and murmurs. PCG complements the electrocardiogram in the detection of heart diseases, especially in initial screenings, due to its simplicity and low cost. Detecting abnormal heart sounds algorithmically is important for remote health monitoring and other scenarios where an experienced physician is not available. While several studies exist, we explore the possibility of detecting anomalies in heart sounds and murmurs using deep learning algorithms on the well-known PhysioNet dataset. We performed experiments with various architectures such as RNN, LSTM, GRU, B-RNN, B-LSTM, and CNN, and achieved 80% accuracy with a three-layer CNN model on the raw signals without any preprocessing. To our knowledge, this is the highest reported accuracy obtained by analyzing raw PCG data.

V. G. Sujadevi, K. P. Soman, R. Vinayakumar, A. U. Prem Sankar
Secured Image Transmission Through Region-Based Steganography Using Chaotic Encryption

Information security is one of the challenging problems of today. To address it, a new algorithm providing two layers of security is proposed, combining region-based steganography with chaotic encryption. Region-based steganography is the technique of hiding information in certain regions of interest; among these, the edge region of an image is one of the most effective for data hiding. For encrypting the data, we use CNN because of its random nature, which makes it very challenging for hackers to discover the secret information. In the proposed work, the secret data are first encoded using the CNN, and then the scrambled message is embedded inside the edge region of the cover image using a matrix encoding scheme, which provides high security. The complete procedure is implemented in MATLAB, and the result analysis given shows the strength of the technique.

Shreela Dash, M. N. Das, Mamatarani Das
Technical Analysis of CNN-Based Face Recognition System—A Study

Face recognition is an essential security technology that has come under increasing scrutiny in recent years, in research as well as in industry. This study addresses various approaches for recognizing faces based on neural networks, adopting the convolutional neural network (CNN). The study covers different face alignment techniques, preprocessing techniques, and face image sizes. The paper explains the computational analysis of the face recognition system and emphasizes the accuracies and constraints of the images. The predominant face alignment approaches used are Dlib and the constrained local model (CLM). For training, the Tan–Triggs preprocessing technique is used on face images of size 96 × 96 and 64 × 64. The Face Recognition Grand Challenge (FRGC) dataset is used for the analysis, and the corresponding approaches produced accuracies ranging from 90 to 98.30%.

S. Sharma, Ananya Kalyanam, Sameera Shaik
Application of Search Group Algorithm for Automatic Generation Control of Interconnected Power System

A novel search group algorithm (SGA) technique with a PID controller is proposed for application to automatic generation control (AGC) of a multi-area interconnected power system. A reheat thermal power system with three unequal areas is considered, including nonlinearities such as GRC and GDB. The supremacy of the SGA-tuned PID controller is demonstrated through an empirical comparison with a recently published firefly algorithm (FA)-tuned PID controller for a similar interconnected multi-area power system. The simulation study confirms that the proposed SGA technique outperforms the FA technique for this system.

Dillip Khamari, Rabindra Kumar Sahu, Sidhartha Panda
A Heuristic Comparison of Optimization Algorithms for the Trajectory Planning of a 4-axis SCARA Robot Manipulator

This chapter presents four heuristic optimization techniques for computing and comparing inverse kinematics (IK) solutions: the firefly, bat, particle swarm optimization (PSO), and teaching–learning-based optimization (TLBO) algorithms. To execute the algorithms, an objective function defined as the Euclidean distance between two points in space has been used. For each method, the convergence of the optimal position trajectory towards a set target point is shown, and the best cost plots for all algorithms are presented. The position errors of the obtained trajectory in the X, Y, and Z directions are depicted for the different methods. To compare the outcomes of the IK solutions obtained by the algorithms, a four-DOF SCARA robot manipulator is considered for illustration.

Pradip Kumar Sahu, Gunji Balamurali, Golak Bihari Mahanta, Bibhuti Bhusan Biswal
A Computer-Aided Diagnosis System for Breast Cancer Using Deep Convolutional Neural Networks

Computer-aided diagnosis for breast cancer is increasingly sought after due to the exponential increase in mammograms being performed. In particular, the diagnosis and classification of mammary masses are of significant importance today. For this reason, numerous studies have been carried out in this field and many techniques have been suggested. This paper proposes a convolutional neural network (CNN) approach for the automatic detection of breast cancer using segmented data from the Digital Database for Screening Mammography (DDSM). We develop a network with a CNN architecture that avoids the traditional handcrafted feature extraction phase by performing feature extraction and classification together within the same neural network, thus providing an automatic diagnosis without user intervention. The proposed method offers better classification rates, allowing a more reliable diagnosis of breast cancer.

Nacer Eddine Benzebouchi, Nabiha Azizi, Khaled Ayadi
Indian Stock Market Prediction Using Machine Learning and Sentiment Analysis

The stock market is a highly volatile, non-deterministic system, with a vast number of factors influencing the direction of the trend on varying scales and at multiple layers. The Efficient Market Hypothesis (EMH) states that the market is unbeatable, which makes predicting an uptrend or downtrend a very challenging task. This research aims to combine multiple existing techniques into a more robust prediction model that can handle the various scenarios in which investment can be beneficial. Existing techniques such as sentiment analysis or neural networks alone can be too narrow in their approach and can lead to erroneous outcomes in varying scenarios. By combining both techniques, this prediction model can provide more accurate and flexible recommendations. Embedding technical indicators will guide the investor to minimize risk and reap better returns.

Ashish Pathak, Nisha P. Shetty
Exploring the Average Information Parameters over Lung Cancer for Analysis and Diagnosis

Lung cancer is a very common cause of death among people all over the world; hence, accurate detection of lung cancer increases a patient's chance of survival. A major problem with treatment is the time consumed by several physical diagnoses, which increases the possibility of death, so this method is essentially an approach to help physicians take more accurate decisions in this regard. This paper presents a method based on average information statistical parameters, using image processing, for lung cancer analysis. The basic aim is to help physicians take decisions regarding the possibility of lung cancer. Image averaging is a digital image processing technique mostly implemented to improve the quality of images degraded by random noise. The average information parameters are statistical parameters implemented for lung cancer analysis; in this paper, parameters such as entropy, standard deviation, mean, variance, and MSE are considered. The selection of average information parameters is based on the number of iterations carried out over the lung images by the algorithm. The paper also successfully rejects the null hypothesis by applying ANOVA. The images are microscopic lung images, and the algorithm is implemented in MATLAB.
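The listed statistical parameters are straightforward to compute; a sketch for an 8-bit grayscale image (the paper's MATLAB pipeline also includes the iterative image averaging, which is omitted here):

```python
import numpy as np

def average_information_params(img, ref=None):
    """Entropy, mean, variance, and standard deviation of an 8-bit
    grayscale image; MSE is computed only if a reference image is given."""
    hist = np.bincount(img.ravel(), minlength=256) / img.size
    p = hist[hist > 0]                       # drop empty bins before log2
    params = {
        "entropy": float(-(p * np.log2(p)).sum()),
        "mean": float(img.mean()),
        "variance": float(img.var()),
        "std": float(img.std()),
    }
    if ref is not None:
        params["mse"] = float(((img.astype(float) - ref.astype(float)) ** 2).mean())
    return params
```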

Vaishnaw G. Kale, Vandana B. Malode
A Hash-Based Approach for Document Retrieval by Utilizing Term Features

Digital data on servers increase with time, which has led various researchers to focus on this field. Several issues arise on the server side, such as data handling, security, and maintenance. In this paper, an approach for document retrieval is proposed which efficiently fetches documents according to the query given by the user. Hash-based indexing of the dataset documents is done by utilizing term features. To provide privacy for the terms, each term is identified by a unique number, and each document has its own hash index key for identification. Experiments were done on real and artificial datasets. The results show that the NDCG, precision, and recall of this work are better than those of previous work on datasets of different sizes.
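The indexing scheme described, unique numbers for terms and a hash key per document, can be sketched as follows (a schematic reading of the abstract; the hash function, key length, and conjunctive query rule are assumptions, not the paper's specification):

```python
import hashlib
from collections import defaultdict

def doc_key(text):
    """Short hash key identifying a document."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()[:12]

class HashIndex:
    def __init__(self):
        self.term_ids = {}                # term -> unique number (term privacy)
        self.inverted = defaultdict(set)  # term id -> set of document keys
        self.docs = {}                    # document key -> original text

    def add(self, text):
        key = doc_key(text)
        self.docs[key] = text
        for term in set(text.lower().split()):
            tid = self.term_ids.setdefault(term, len(self.term_ids))
            self.inverted[tid].add(key)
        return key

    def query(self, q):
        """Return documents containing every query term."""
        keys = None
        for term in q.lower().split():
            hits = self.inverted.get(self.term_ids.get(term), set())
            keys = hits if keys is None else keys & hits
        return [self.docs[k] for k in (keys or set())]
```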

Rajeev Kumar Gupta, Durga Patel, Ankit Bramhe
Transform Domain Mammogram Classification Using Optimum Multiresolution Wavelet Decomposition and Optimized Association Rule Mining

The authors propose a mammogram classification technique to classify breast tissues as benign or malignant. The mammogram is segmented to obtain the Region of Interest (ROI), and its 2D DWT is computed. A GLCM feature matrix is generated for every detail coefficient of the 2D DWT. The Optimum Feature Decomposition Algorithm (OFDA) is used to discretize and optimize the features. The authors propose an Optimum Decomposition Selection Algorithm (ODSA) to select the optimum decomposition from the nine multiresolution wavelet decompositions of the ROI, using the Euclidean distance between the feature matrices. Since a high-dimensional feature space may degrade the performance of the classifier, the proposed algorithm reduces the size of the feature matrix from [(N × 9) × F]T to [N × F]T, shrinking the search space by approximately 90%. From the optimized feature vector and optimized decomposition, a signature feature vector matrix consisting of the optimum decomposition and its optimum feature vector is generated to form a transactional database. Association rules are generated using the Apriori algorithm and optimized using a multiobjective Genetic Algorithm with adaptive crossover and mutation. The mammogram is classified using the Class Identification using Strength of Classification (CISCA) algorithm. The results are tested on two standard databases, MIAS and DDSM, and show that the proposed scheme has advantages in terms of accuracy and the computational complexity of the classifier.

Poonam Sonar, Udhav Bhosle
Noise Reduction in Electrocardiogram Signal Using Hybrid Methods of Empirical Mode Decomposition with Wavelet Transform and Non-local Means Algorithm

The electrocardiogram (ECG) signal helps physicians in the detection of cardiac-related diseases. Many noises, such as power line interference (PLI), baseline wander, electromyography (EMG) noise, and burst noise, contaminate the raw signal and corrupt the shape of the waveform, making detection faulty. In recent years, many signal processing methods have therefore been proposed for the effective removal of these noise artifacts. In this paper, two hybrid methods are proposed: empirical mode decomposition (EMD) with wavelet transform filtering, and EMD with non-local means (NLM). The results are analyzed with performance parameters such as signal-to-noise ratio (SNR), mean square error (MSE), and percent root mean square difference (PRD), and show better performance for the hybrid EMD-with-NLM technique.

Sarmila Garnaik, Nikhilesh Chandra Rout, Kabiraj Sethi
A Path-Oriented Test Data Generation Approach Hybridizing Genetic Algorithm and Artificial Immune System

Validating the correctness of software through tools has started gaining a wide foothold in the industry. A test data generator is one such tool, which automatically generates test data for software so as to attain maximum coverage. Researchers in the past have adopted different evolutionary algorithms to automatically generate a data set; one often-used procedure is the Genetic Algorithm (GA). Due to certain flaws present in this approach, we have redefined the coverage concerns in structural testing. In this paper, we explore properties of the immune system alongside the GA and propose a new hybrid algorithm, the GeMune algorithm, inspired by these biological backdrops. Experimental results certify that the new algorithm achieves better coverage than the use of the Genetic Algorithm alone for structural testing.

Gargi Bhattacharjee, Ashish Singh Saluja
To Enhance Web Response Time Using Agglomerative Clustering Technique for Web Navigation Recommendation

An organization needs to comprehend its customers' and clients' conduct, preferences, and future needs, which depend on their past behavior. Web usage mining is an active research topic in which customer and client sessions are grouped to understand these activities. This work examines the problem of mining and analyzing frequent patterns, focusing especially on reducing the number of rules using a closed pattern technique and on reducing the number of database scans using an agglomerative clustering strategy. A novel pattern mining method is introduced to tackle the problem through profile-based closed sequential pattern mining utilizing agglomerative clustering (PCSPAC). The proposed method is an improved version of Weblog mining techniques applied to online navigational pattern forecasting. In the proposed approach, we first store the Web data accessed by the user and then find the patterns; items with the same pattern are merged, and the closed frequent sets of Web pages are found. The main advantage of our approach is that when a user next requests the same item, only a partial database is searched rather than the whole one, and the number of clusters need not be supplied as input. Experimental results illustrate that the proposed approach reduces the search time with greater accuracy.

Shraddha Tiwari, Rajeev Kumar Gupta, Ramgopal Kashyap
Query-Optimized PageRank: A Novel Approach

This paper addresses a ranking model that uses the content of documents along with their link structure to obtain an efficient ranking scheme. The proposed model combines the advantages of TF-IDF and the PageRank algorithm. TF-IDF is a term-weighting scheme widely used to evaluate the importance of a term in a document by converting the textual representation of information into a vector space model. The PageRank algorithm uses hyperlinks (links between documents) to determine the importance of a Web document in the corpus. Combining the relevance of documents with their PageRank scores refines the retrieval results. The idea is to update the link structure based on each document's similarity score with the user query. Experimental results indicate that the performance of the proposed ranking technique is promising and that it can be considered a new direction in document ranking.
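To make the two ingredients concrete, here is a minimal sketch that scores documents by the product of query/TF-IDF cosine similarity and PageRank. The toy corpus, link graph, and the multiplicative combination are assumptions for illustration; the paper's actual scheme (updating the link structure by query similarity) is more involved.

```python
import math
from collections import Counter

def pagerank(links, n, d=0.85, iters=50):
    """Plain PageRank over an adjacency list {node: [out-links]}."""
    pr = [1.0 / n] * n
    for _ in range(iters):
        nxt = [(1.0 - d) / n] * n
        for u, outs in links.items():
            for v in outs:
                nxt[v] += d * pr[u] / len(outs)
        pr = nxt
    return pr

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

docs = ["deep learning models", "graph link analysis", "deep learning ranking"]
links = {0: [2], 1: [2], 2: [0]}          # hyperlinks between the documents
query = "deep learning"

n = len(docs)
df = Counter(t for d in docs for t in set(d.split()))
idf = {t: math.log(n / c) for t, c in df.items()}
vecs = [{t: f * idf[t] for t, f in Counter(d.split()).items()} for d in docs]
qvec = {t: idf[t] for t in query.split() if t in idf}

pr = pagerank(links, n)
scores = [cosine(qvec, v) * p for v, p in zip(vecs, pr)]   # content x links
ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
```

Documents 0 and 2 are equally relevant to the query, so document 2's higher PageRank (two in-links) breaks the tie, while the off-topic document 1 scores zero.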

Rajendra Kumar Roul, Jajati Keshari Sahoo, Kushagr Arora
Effects of Social Media on Social, Mental, and Physical Health Traits of Youngsters

The widespread use of social media has revolutionized the mode of communication in today’s world. It has become an unparalleled mode of interaction around the globe. With such deep penetration of this technology in people’s life, it has brought about many changes in their lifestyle including their health. The concept of health in today’s world not only refers to the physical fitness but also encompasses the mental and social well-being. With the advent of technology, most individuals are exposed to the Internet and social networking and spend a major portion of their time using these for performing one or the other activities. The constant and excessive use of social media has affected human life in multiple ways. This paper lucidly examines the impacts of social media on the physical, mental, and social health aspects of youngsters.

Gautami Tripathi, Mohd Abdul Ahad
Document Labeling Using Source-LDA Combined with Correlation Matrix

Topic modeling is one of the most applied and active research areas in information retrieval, and it has become increasingly important given the large and varied amount of data produced every second. In this paper, we address two major drawbacks of latent Dirichlet allocation (LDA): topic independence and unsupervised learning. To remove the first drawback, we use Wikipedia as a knowledge source in a semi-supervised model (Source-LDA) that generates predefined topic-word distributions. The second drawback is removed using a correlation matrix containing the cosine-similarity measure between all topics. The reason for using a semi-supervised LDA instead of a fully supervised model is to avoid overfitting the data to new labels. Experimental results show that Source-LDA combined with the correlation matrix performs better than both traditional LDA and Source-LDA alone.
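The correlation matrix the abstract describes can be sketched directly: pairwise cosine similarity between topic-word distributions. The 4-word vocabulary and topic vectors below are toy assumptions standing in for Source-LDA output.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def correlation_matrix(topic_word):
    """Pairwise cosine similarity between topic-word distributions."""
    k = len(topic_word)
    return [[cosine(topic_word[i], topic_word[j]) for j in range(k)]
            for i in range(k)]

# Toy topic-word distributions over a 4-word vocabulary (assumed data).
topics = [
    [0.70, 0.20, 0.05, 0.05],   # topic 0
    [0.60, 0.30, 0.05, 0.05],   # a closely related topic
    [0.05, 0.05, 0.20, 0.70],   # an unrelated topic
]
C = correlation_matrix(topics)
```

Related topics (0 and 1) get a similarity near 1, the unrelated pair near 0, which is the signal used to relax LDA's topic-independence assumption.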

Rajendra Kumar Roul, Jajati Keshari Sahoo
Diffusion Least Mean Square Algorithm for Identification of IIR System Present in Each Node of a Wireless Sensor Networks

Most real-world practical systems are inherently dynamic, and their characteristics are represented by transfer functions that are IIR in nature. In the literature, distributed estimation algorithms have been developed for stable FIR systems. In this paper, a distributed estimation technique is developed for identification of the IIR system present at each node of a wireless sensor network. Distributed parameter estimation is generally based on two modes of cooperation: incremental and diffusion. When the network topology changes, the diffusion mode of cooperation works well and shows robustness to link and node failures. Thus, an infinite impulse response diffusion least mean square (IIR DLMS) algorithm is introduced. In simulations, its performance is compared with the incremental version, the infinite impulse response incremental least mean square (IIR ILMS) algorithm. Superior performance of the proposed approach is reported for parameter estimation of two IIR systems under various noisy environments.
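A minimal sketch of the diffusion cooperation (combine-then-adapt) is given below for a 2-tap FIR system, which is simpler than the paper's IIR setting but shows the same mechanics: each node averages its neighbors' estimates and then runs one LMS update on its own noisy measurement. The topology, step size, and system coefficients are assumptions.

```python
import random

random.seed(1)
TRUE_W = [0.6, -0.3]             # unknown 2-tap system to identify (assumed)
NODES = 4
NEIGHBORS = {0: [0, 1], 1: [0, 1, 2], 2: [1, 2, 3], 3: [2, 3]}  # line topology
MU = 0.05                        # LMS step size

w = [[0.0, 0.0] for _ in range(NODES)]
for _ in range(2000):
    # Combine: each node averages its neighbors' current estimates.
    combined = []
    for k in range(NODES):
        nb = NEIGHBORS[k]
        combined.append([sum(w[j][i] for j in nb) / len(nb) for i in range(2)])
    # Adapt: each node runs one LMS update on its own noisy measurement.
    for k in range(NODES):
        u = [random.gauss(0, 1), random.gauss(0, 1)]        # regressor
        d = sum(a * b for a, b in zip(TRUE_W, u)) + random.gauss(0, 0.05)
        err = d - sum(a * b for a, b in zip(combined[k], u))
        w[k] = [wi + MU * err * ui for wi, ui in zip(combined[k], u)]
```

After a few thousand iterations every node's estimate converges close to the true coefficients, illustrating why diffusion tolerates node failure: no single node holds the estimate.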

Km Dimple, Dinesh Kumar Kotary, Satyasai Jagannath Nanda
Comparative Evaluation of Various Feature Weighting Methods on Movie Reviews

Sentiment analysis is a method of extracting subjective information from customer reviews. The analysis helps reveal consumer insights about a product, theme, or service. In the existing literature, methods such as BoW and TF-IDF are commonly employed for sentiment analysis, while deep learning methods are not explored much. We apply the Word2Vec feature weighting method to this problem and carry out experiments for sentiment analysis on the large IMDB movie-review dataset. We compare various feature weighting methods across different classifiers and determine the best combination. From the experimental results, we conclude that Word2Vec with SGD is the best combination for sentiment classification on the IMDB dataset. The results can serve as a baseline for future exploration of opinion mining on any textual data.
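The Word2Vec + SGD pipeline can be sketched as: average the word vectors of a review, then train a logistic-regression classifier with plain SGD. Training Word2Vec itself is out of scope here, so the tiny hand-written vectors below are assumed stand-ins for pretrained embeddings, and the reviews are toy data.

```python
import math

# Toy "pretrained" word vectors (assumed stand-ins for Word2Vec output).
VECS = {
    "great": [1.0, 0.2], "excellent": [0.9, 0.1], "loved": [0.8, 0.3],
    "boring": [-0.9, 0.2], "awful": [-1.0, 0.1], "dull": [-0.8, 0.3],
}

def doc_vector(text):
    """Average the vectors of known words -- the document feature."""
    words = [VECS[w] for w in text.split() if w in VECS]
    if not words:
        return [0.0, 0.0]
    return [sum(v[i] for v in words) / len(words) for i in range(2)]

train = [("great excellent movie", 1), ("loved great acting", 1),
         ("boring awful plot", 0), ("dull boring scenes", 0)]

# Logistic regression trained with plain SGD.
w, b = [0.0, 0.0], 0.0
for _ in range(200):
    for text, y in train:
        x = doc_vector(text)
        p = 1.0 / (1.0 + math.exp(-(w[0] * x[0] + w[1] * x[1] + b)))
        g = p - y                              # gradient of log loss
        w = [wi - 0.5 * g * xi for wi, xi in zip(w, x)]
        b -= 0.5 * g

def predict(text):
    x = doc_vector(text)
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0
```

On real data one would use learned embeddings and a held-out test split; the point of the sketch is only the feature-averaging step that makes Word2Vec usable with a linear SGD classifier.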

S. Sivakumar, R. Rajalakshmi
Dynamic ELD with Valve-Point Effects Using Biogeography-Based Optimization Algorithm

This paper applies the biogeography-based optimization (BBO) algorithm to the dynamic economic load dispatch problem (DELDP) of dispatchable units, considering valve-point loading effects. BBO is a biogeography-inspired optimization algorithm in which mathematical equations describe how species arise, migrate between habitats (islands), and go extinct; the algorithm proceeds in two steps, migration and mutation. The proposed method computes an economical schedule of units that satisfies the load demand and ramp rate limits during operation while minimizing the total production cost, with the BBO search technique determining the globally optimal dispatch solution. Constraints such as load balance, operating limits, valve-point loading, ramp rates, and network loss coefficients are incorporated. The nonlinear nature of the generators in the test system is considered to illustrate the effectiveness of the proposed method, and its robustness is validated by comparison with previously developed techniques from the literature.
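A minimal BBO sketch for a 2-unit dispatch is shown below: habitats are candidate dispatches, migration copies solution features from good habitats (high emigration rate) into poor ones (high immigration rate), and mutation resamples within the unit limits. The cost coefficients, valve-point ripple, penalty weight, and demand are toy assumptions, and ramp-rate and loss constraints are omitted for brevity.

```python
import math
import random

random.seed(3)
DEMAND = 300.0
LIMITS = [(50.0, 200.0), (50.0, 200.0)]       # unit operating limits (MW)

def cost(p):
    """Quadratic fuel cost with a valve-point |sin| ripple, plus a
    penalty enforcing the load-balance constraint (toy coefficients)."""
    c = (0.002 * p[0] ** 2 + 7 * p[0] + 120
         + 30 * abs(math.sin(0.08 * (LIMITS[0][0] - p[0])))
         + 0.003 * p[1] ** 2 + 6 * p[1] + 100
         + 40 * abs(math.sin(0.09 * (LIMITS[1][0] - p[1]))))
    return c + 1000.0 * abs(p[0] + p[1] - DEMAND)

N, GENS = 20, 200
pop = [[random.uniform(lo, hi) for lo, hi in LIMITS] for _ in range(N)]
for _ in range(GENS):
    pop.sort(key=cost)                         # best habitat first
    lam = [i / (N - 1) for i in range(N)]      # immigration rate (best: 0)
    mu = [1 - l for l in lam]                  # emigration rate (best: 1)
    new = [pop[0][:]]                          # elitism: keep the best
    for i in range(1, N):
        h = pop[i][:]
        for d in range(len(h)):
            if random.random() < lam[i]:       # immigrate this feature
                j = random.choices(range(N), weights=mu)[0]
                h[d] = pop[j][d]
            if random.random() < 0.05:         # mutation
                lo, hi = LIMITS[d]
                h[d] = random.uniform(lo, hi)
        new.append(h)
    pop = new
best = min(pop, key=cost)
```

The penalty term drives the best habitat toward load balance while the fuel terms pick the cheaper split; a real DELD solver would also enforce ramp rates across dispatch intervals.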

A. K. Barisal, Soudamini Behera, D. K. Lal
A Survey on Teaching–Learning-Based Optimization Algorithm: Short Journey from 2011 to 2017

Since the early days of optimization, two families of methods have been prominent: evolutionary algorithms and swarm intelligence algorithms. Both are population-based metaheuristics and are used to solve many real-world complex computing problems. However, recent research on multi-objective optimization reveals that these earlier metaheuristics can struggle with multi-dimensional problems because of pitfalls such as the tuning of control parameters, their probabilistic nature, and their algorithm-specific parameters. Against this background, a new population-based metaheuristic, the teaching–learning-based optimization (TLBO) algorithm, was developed by R. V. Rao in 2011. Since its inception, TLBO has crossed many milestones compared to other recently developed metaheuristics through its use in diverse engineering problem domains. In this paper, a survey of TLBO and its variants is conducted, along with a discussion of its range of applications from 2011 to 2017.
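TLBO's appeal is that, beyond population size and generation count, it has no algorithm-specific parameters. Its two phases can be sketched as follows on a sphere function, which here is only a stand-in objective.

```python
import random

random.seed(7)

def sphere(x):                       # stand-in objective to minimize
    return sum(v * v for v in x)

DIM, N, GENS, LO, HI = 3, 15, 100, -5.0, 5.0
pop = [[random.uniform(LO, HI) for _ in range(DIM)] for _ in range(N)]
for _ in range(GENS):
    # Teacher phase: pull every learner toward the best solution.
    teacher = min(pop, key=sphere)
    mean = [sum(x[d] for x in pop) / N for d in range(DIM)]
    for i in range(N):
        tf = random.choice([1, 2])             # teaching factor
        cand = [pop[i][d] + random.random() * (teacher[d] - tf * mean[d])
                for d in range(DIM)]
        cand = [min(max(v, LO), HI) for v in cand]
        if sphere(cand) < sphere(pop[i]):      # greedy acceptance
            pop[i] = cand
    # Learner phase: each learner interacts with a random peer.
    for i in range(N):
        j = random.randrange(N)
        if j == i:
            continue
        sign = 1.0 if sphere(pop[i]) < sphere(pop[j]) else -1.0
        cand = [pop[i][d] + sign * random.random() * (pop[i][d] - pop[j][d])
                for d in range(DIM)]
        cand = [min(max(v, LO), HI) for v in cand]
        if sphere(cand) < sphere(pop[i]):
            pop[i] = cand
best = min(pop, key=sphere)
```

Note the absence of crossover rates, inertia weights, or pheromone parameters; greedy acceptance in both phases makes the best cost monotonically non-increasing.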

Janmenjoy Nayak, Bighnaraj Naik, G. T. Chandrasekhar, H. S. Behera
Predicting Users’ Preferences for Movie Recommender System Using Restricted Boltzmann Machine

A recommender system is a crucial component of e-commerce platforms, enabling them to produce accurate recommendations for individual users. Collaborative filtering is considered a successful technique for recommender systems; it uses rating scores to find the most similar users/items for recommending items. In this work, in order to exploit user rating information, a model has been developed that uses a Restricted Boltzmann Machine (RBM) to learn deep representations and predict missing ratings or preferences. The experiment is done on the MovieLens benchmark dataset, comparing against Pearson correlation and average-prediction algorithms. Experimental results demonstrate the ability of the RBM to predict users' preferences.

Dayal Kumar Behera, Madhabananda Das, Subhra Swetanisha
Comparative Analysis of DTC Induction Motor Drives with Firefly, Harmony Search, and Ant Colony Algorithms

In industries, induction motor drives are popular due to their brushless structure, low cost, low maintenance, and robust performance. Direct Torque Control (DTC) drives have gained importance due to their fast dynamic response and simple control structure. To enhance the speed control performance of DTC drives, this paper implements different optimization techniques to tune the speed PI controller. The simulation is carried out using MATLAB, and the results of the genetic algorithm, ant colony, harmony search, and firefly optimization processes are compared for different speeds of the DTC drive with respect to peak overshoot and settling time.

Naveen Goel, R. N. Patel, Saji Chacko
DNA Gene Expression Analysis on Diffuse Large B-Cell Lymphoma (DLBCL) Based on Filter Selection Method with Supervised Classification Method

The exponential growth of DNA datasets in scientific repositories has encouraged interdisciplinary research across ecology, computer science, and bioinformatics. As prior experimental studies demonstrate, many techniques are useful for better classification of cancer from DNA gene expression data. The major challenge for gene selection methods is extracting, at low computational cost, the informative genes that contribute to classification from DNA microarray datasets. In this paper, a combination of Spearman's correlation (SC) and filter-based feature selection (FS) methods is proposed. We present an extensive comparison of the effect of combining Spearman's correlation with the FS methods Relief-F, Joint Mutual Information (JMI), and max-relevance min-redundancy (MRMR). To measure classification performance, four diverse supervised classifiers, namely K-nearest neighbor (K-NN), support vector machines (SVM), naïve Bayes (NB), and decision tree (DT), have been used on the DLBCL dataset. The results demonstrate that Spearman's correlation combined with MRMR performs better than the other combinations.
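The Spearman's correlation filter at the heart of the proposed combination ranks each gene by the rank correlation between its expression values and the class labels. A self-contained sketch (with tie-aware ranking, needed because class labels are heavily tied) on toy expression data:

```python
def rank(values):
    """Average ranks (1-based); ties share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho = Pearson correlation of the rank vectors."""
    rx, ry = rank(x), rank(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = (sum((a - mx) ** 2 for a in rx)
           * sum((b - my) ** 2 for b in ry)) ** 0.5
    return num / den if den else 0.0

# Toy expression values (assumed): gene_a tracks the label, gene_b does not.
labels = [0, 0, 0, 1, 1, 1]
gene_a = [0.1, 0.2, 0.3, 0.8, 0.9, 1.0]
gene_b = [0.5, 0.9, 0.1, 0.4, 0.8, 0.2]
```

Genes scoring near ±1 are kept as informative; scores near 0 mark candidates for removal before the MRMR redundancy check.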

Alok Kumar Shukla, Pradeep Singh, Manu Vardhan
Categorizing Text Data Using Deep Learning: A Novel Approach

With the large number of Internet users on the Web, there is a need to improve text classification, an important and well-studied area of machine learning. To work with text data and increase classifier efficiency, the choice of quality features is of paramount importance. This study emphasizes two important aspects of text classification: it proposes a new feature selection technique named Combined Cohesion Separation and Silhouette Coefficient (CCSS) to find a feature set that captures the crux of the terms in the corpus without deteriorating the outcome, and it discusses the underlying architecture and importance of deep learning in text classification. Four benchmark datasets are used for the experimental work. The empirical results of the proposed approach using deep learning are more promising than those of the other established classifiers.
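The silhouette-coefficient ingredient of CCSS is the standard cluster-quality measure sketched below; the abstract does not specify how CCSS combines it with cohesion and separation, so only the silhouette itself is shown, on toy 2-D points standing in for term vectors.

```python
import math

def dist(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def silhouette(points, labels):
    """Mean silhouette s(i) = (b - a) / max(a, b): a is the mean
    intra-cluster distance of point i, b the mean distance to the
    nearest other cluster (each cluster must have >= 2 members)."""
    n = len(points)
    groups = {}
    for i, l in enumerate(labels):
        groups.setdefault(l, []).append(i)
    total = 0.0
    for i in range(n):
        same = [j for j in groups[labels[i]] if j != i]
        a = sum(dist(points[i], points[j]) for j in same) / len(same)
        b = min(sum(dist(points[i], points[j]) for j in idx) / len(idx)
                for l, idx in groups.items() if l != labels[i])
        total += (b - a) / max(a, b)
    return total / n

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
good = silhouette(pts, [0, 0, 0, 1, 1, 1])   # compact, well separated
bad = silhouette(pts, [0, 1, 0, 1, 0, 1])    # labels ignore the geometry
```

A feature set that yields a high silhouette produces well-separated document clusters, which is the signal CCSS exploits during selection.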

Rajendra Kumar Roul, Sanjay Kumar Sahay
An Approach to Detect Patterns (Sub-graphs) with Edge Weight in Graph Using Graph Mining Techniques

The task of detecting a pattern or sub-graph in a large graph has applications in areas such as biology, computer vision, computer-aided design, electronics, intelligence analysis, and social networks, so graph-based pattern detection spans a wide range of research fields. Since the characteristics and application requirements of graphs vary, graph-based detection is not a single problem but a set of graph-related problems. This paper proposes a new approach for detecting sub-graphs or patterns in a weighted graph using an edge-weight detection method based on graph mining techniques. The edge-weight method is proposed because most real graphs are weighted. The paper presents an algorithm named EdWePat, which detects patterns or sub-graphs by edge weight rather than node value.
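The abstract does not give EdWePat's internals, so the following is only a brute-force sketch of the underlying idea: find node mappings under which every pattern edge appears in the host graph with the same weight. Graphs are encoded as `frozenset({u, v}) -> weight` dictionaries (an assumed representation).

```python
from itertools import permutations

def find_pattern(graph, pattern):
    """Brute-force search for injective node mappings under which every
    pattern edge exists in the graph with an identical weight."""
    g_nodes = {n for e in graph for n in e}
    p_nodes = sorted({n for e in pattern for n in e})
    matches = []
    for cand in permutations(sorted(g_nodes), len(p_nodes)):
        m = dict(zip(p_nodes, cand))
        ok = all(graph.get(frozenset({m[u], m[v]})) == w
                 for (u, v), w in ((tuple(e), w) for e, w in pattern.items()))
        if ok:
            matches.append(m)
    return matches

graph = {frozenset({"A", "B"}): 2, frozenset({"B", "C"}): 3,
         frozenset({"C", "D"}): 2, frozenset({"A", "C"}): 5}
pattern = {frozenset({"x", "y"}): 2, frozenset({"y", "z"}): 3}
matches = find_pattern(graph, pattern)
```

The weight-2/weight-3 path occurs twice in the host graph (A-B-C and D-C-B); a practical miner would prune the search rather than enumerate all permutations.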

Bapuji Rao, Sarojananda Mishra
Comparative Performance Analysis of Adaptive Tuned PID Controller for Multi-machine Power System Network

In this paper, the Genetic Algorithm (GA) and Bacterial Foraging Optimization (BFO) are used to tune the parameters of Proportional–Integral–Derivative (PID) controllers in a multi-machine power system network. Both are popular evolutionary algorithms commonly used for PID tuning. The proposed approach is easy to implement and has superior features: the computational techniques enhance system performance, and the convergence characteristics obtained are stable. The system with BFO-PID and GA-PID controllers is modeled in MATLAB. Compared with a conventional PID controller, the BFO-PID and GA-PID controllers perform better, improving speed, loop response stability, and steady-state error while also minimizing rise time. The simulation results show that the controller developed using the BFO algorithm achieves a faster response than the GA-based one.

Mahesh Singh, Aparajita Agrawal, Shimpy Ralhan, Rajkumar Jhapte
A Competent Algorithm for Enhancing Low-Quality Finger Vein Images Using Fuzzy Theory

Soft computing methods, and fuzzy theoretic approaches in particular, are widely known for their ability to tackle the uncertainty and vagueness inherent in image processing problems. This paper puts forward a distinctive enhancement algorithm for finger vein biometric images based on interval type-2 fuzzy sets. Finger vein biometrics is one of the latest reliable biometric systems, exploiting the uniqueness of individuals' finger vein patterns. Low contrast, blur, and noise often lower the quality of captured finger vein images. For efficient enhancement of the finger vein images, an interval type-2 fuzzy set is presented in this work, and the Einstein T-conorm is suggested for type reduction by combining the upper and lower membership functions. The performance of the proposed algorithm is assessed by estimating the linear index of fuzziness and entropy. Experiments on different vein pattern images show that the recommended method outperforms the existing methods.
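The type-reduction step can be sketched as follows. The `mu**alpha` / `mu**(1/alpha)` construction of the upper and lower memberships is one common way to build an interval type-2 footprint of uncertainty and is an assumption here (the paper's exact membership functions may differ); the Einstein T-conorm S(a, b) = (a + b)/(1 + ab) combining them follows the abstract.

```python
def enhance(pixels, alpha=0.9):
    """Interval type-2 style enhancement sketch: blur each normalized
    intensity's membership into an upper (mu**alpha) and lower
    (mu**(1/alpha)) bound, then type-reduce with the Einstein T-conorm
    S(a, b) = (a + b) / (1 + a*b). alpha is an illustrative parameter."""
    lo, hi = min(pixels), max(pixels)
    out = []
    for p in pixels:
        mu = (p - lo) / (hi - lo)                 # primary membership
        upper, lower = mu ** alpha, mu ** (1.0 / alpha)
        s = (upper + lower) / (1.0 + upper * lower)
        out.append(round(s * 255))
    return out

dark = [40, 50, 60, 70, 80]                       # low-contrast input (toy)
enhanced = enhance(dark)
```

The mapping is monotone, stretches the narrow input range to the full 0-255 scale, and lifts mid-tones, which is the contrast-enhancement effect measured in the paper via the index of fuzziness and entropy.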

Rose Bindu Joseph, Devarasan Ezhilmaran
An Adaptive Fuzzy Filter-Based Hybrid ARIMA-HONN Model for Time Series Forecasting

In this paper, a linear ARIMA model and a nonlinear HONN, specifically a pi-sigma neural network (PSNN), are integrated into a hybrid ARIMA-HONN model for time series forecasting. Assuming the time series to be a sum of low-volatility and high-volatility components, the series is first decomposed into its constituent components using an adaptive fuzzy filter. The low-volatility and high-volatility components are then modeled with the ARIMA and HONN models, respectively, and the final prediction is obtained by combining the ARIMA and HONN predictions. The proposed ARIMA-HONN model, ETS, ARIMA, ANN, and two existing hybrid ARIMA-ANN models were simulated on benchmark real-world datasets (lynx, sunspot, temperature, passenger, and unemployment). Simulation results indicate the superiority of the proposed model over its counterparts on the datasets used.
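For readers unfamiliar with the HONN side, a pi-sigma unit computes several linear sums of the input and multiplies them, which injects higher-order input interactions without higher-order weights. A forward-pass sketch with illustrative (assumed) weights:

```python
import math

def pi_sigma_forward(x, weights, biases):
    """Forward pass of a pi-sigma higher-order unit: each summing unit
    computes w.x + b, and the output multiplies the sums together
    before a sigmoid squashing."""
    product = 1.0
    for w, b in zip(weights, biases):
        product *= sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-product))

# Illustrative (assumed) weights for a 2-input, 2-summing-unit PSNN.
y = pi_sigma_forward([0.5, -0.2], [[1.0, 2.0], [0.5, -1.0]], [0.1, 0.2])
```

In the hybrid model a trained PSNN of this form forecasts the high-volatility component, and its output is added to the ARIMA forecast of the low-volatility component.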

Sibarama Panigrahi, H. S. Behera
An Evolutionary Algorithm-Based Text Categorization Technique

In general, most organizations generate unstructured data from which extracting meaningful information is a difficult task. Preprocessing unstructured data before mining helps improve the efficiency of mining algorithms. In this paper, text data is first preprocessed using tokenization, stop-word removal, and stemming, and a bag-of-words is identified to characterize the text dataset. Next, a genetic algorithm based on the improved strength Pareto evolutionary algorithm is applied to determine a more compact set of informative words for clustering text documents efficiently. It is a bi-objective genetic algorithm that approximates the Pareto-optimal front while exploring the search space for optimal solutions. The external clustering index and the number of words describing the documents are taken as the two objective functions; chromosomes in the population are evaluated on these functions, and the best chromosome in the non-dominated Pareto front of the final population gives the optimal set of words sufficient for categorization of the text dataset.

Ajit Kumar Das, Asit Kumar Das, Apurba Sarkar
Short-Term Load Forecasting Using Genetic Algorithm

Electrical power load forecasting has always been a central topic in the energy industry. Load forecasting requires relevant knowledge, such as local weather and past load demand data, and its accuracy has a large impact on a power company's planning and operating costs. Accurate load forecasting is therefore essential, especially with the changes occurring in the utility industry due to deregulation and competition. Several traditional approaches, such as regression models, time series models, and expert systems, have been proposed for short-term load forecasting with varying degrees of success. In this paper, an ANN trained through backpropagation is used in combination with a genetic algorithm. In backpropagation, neuron weights are updated by gradient descent, which can become trapped in local minima, so a genetic algorithm is combined with backpropagation.

Papia Ray, Saroj Kumar Panda, Debani Prasad Mishra
A Dynamic Bottle Inspection Structure

In the market, most products are supplied in jars or bottles, so to maintain the proper specification of a particular bottle, it should be carefully inspected. The proposed bottle inspection is carried out with an artificial intelligence (AI) model, and its performance is evaluated. For this analysis, about 5000 bottle models are taken and their different properties are considered, providing a large data set from which the bottles are categorized into two classes: defect-free and defective. The analysis follows an artificial intelligence scheme together with the Vision Builder simulation tool, carried out on a Core i3 processor.

Santosh Kumar Sahoo, M. Mahesh Sharma, B. B. Choudhury
Feature Selection-Based Clustering on Micro-blogging Data

The growing popularity of micro-blogging opens up a flexible communication platform for the public. Thousands of posts on trending and non-trending topics are published in micro-blogs daily, and during important events, such as natural calamities and elections, and sports events, such as the IPL and the World Cup, a huge number of messages (micro-blogs) are posted. This fast and huge exchange of messages causes information overload, and clustering or grouping similar messages is an effective way to reduce it. The short and noisy nature of the messages makes micro-blog data clustering challenging, and the incremental growth of the data poses a further challenge. In this work, a novel clustering approach for micro-blogs is proposed that incorporates a feature selection technique. The proposed approach has been applied to several experimental datasets and compared with several existing clustering techniques, yielding better outcomes than the other methods.

Soumi Dutta, Sujata Ghatak, Asit Kumar Das, Manan Gupta, Sayantika Dasgupta
Backmatter
Metadata
Title
Computational Intelligence in Data Mining
edited by
Prof. Dr. Himansu Sekhar Behera
Dr. Janmenjoy Nayak
Dr. Bighnaraj Naik
Prof. Dr. Ajith Abraham
Copyright year
2019
Publisher
Springer Singapore
Electronic ISBN
978-981-10-8055-5
Print ISBN
978-981-10-8054-8
DOI
https://doi.org/10.1007/978-981-10-8055-5