2017 | Book | 1st edition

Recent Advances on Soft Computing and Data Mining

The Second International Conference on Soft Computing and Data Mining (SCDM-2016), Bandung, Indonesia, August 18-20, 2016, Proceedings


About this book

This book provides a comprehensive introduction and practical look at the concepts and techniques readers need to get the most out of their data in real-world, large-scale data mining projects. It also guides readers through the data-analytic thinking necessary for extracting useful knowledge and business value from the data.
The book is based on the Second International Conference on Soft Computing and Data Mining (SCDM-2016), which was held in Bandung, Indonesia on August 18th–20th, 2016 to discuss the state of the art in soft computing techniques and to offer participants sufficient knowledge to tackle a wide range of complex systems. The scope of the conference is reflected in the book, which presents a balance of soft computing techniques and data mining approaches. The two constituents are introduced to the reader systematically and brought together using different combinations of applications and practices. It offers engineers, data analysts, practitioners, scientists and managers insights into the concepts, tools and techniques employed, and as such enables them to better understand the design choices and options of soft computing techniques and data mining approaches that are necessary to thrive in this data-driven ecosystem.

Table of Contents

Frontmatter
Erratum to: Text Detection in Low Resolution Scene Images Using Convolutional Neural Network
Anhar Risnumawan, Indra Adji Sulistijono, Jemal Abawajy

Soft Computing

Frontmatter
Cluster Validation Analysis on Attribute Relative of Soft-Set Theory

Clustering categorical data poses a difficult challenge since there are no inherent distance measures between the data values. One possible approach is to introduce a series of clustering attributes in the categorical data. Following this approach, the Maximum Total Attribute Relative (MTAR) technique, based on the attribute relative of soft-set theory, has been proposed and shown to have better execution time than other equivalent techniques that use the same approach. In this paper, a cluster validity analysis of the technique is explained and discussed. The validity of the clusters produced by the MTAR technique is evaluated by the entropy measure using two standard datasets, Soybean (Small) and Zoo, from the University of California, Irvine (UCI) repository. Results show that the clusters produced by the MTAR technique have better entropy and improve cluster validity by up to 33%.

Rabiei Mamat, Ahmad Shukri Mohd Noor, Tutut Herawan, Mustafa Mat Deris
Optimizing Weights in Elman Recurrent Neural Networks with Wolf Search Algorithm

This paper presents a metahybrid algorithm that combines Wolf Search (WS) with the Elman Recurrent Neural Network (ERNN). ERNN is one of the most efficient neural network learning algorithms; however, since it uses the gradient descent technique during training, it is prone to local minima and slow convergence. This paper uses a new metaheuristic search algorithm, Wolf Search (WS), based on the wolf's predatory behavior, to train the weights in ERNN in order to achieve faster convergence and avoid local minima. The performance of the proposed metahybrid Wolf Search Elman Recurrent Neural Network (WRNN) is compared with the Bat with back propagation (Bat-BP) algorithm and other hybrid variants on benchmark classification datasets. The simulation results show that the proposed metahybrid WRNN algorithm performs better in terms of CPU time, accuracy and MSE than the other algorithms.
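As a rough illustration of the weight-optimization idea described above, the sketch below trains a tiny Elman-style recurrent network with a simplified population-based search in the spirit of Wolf Search; the network size, step size, escape probability and toy sine data are all assumptions of mine, not the authors' setup.

```python
# A minimal sketch (not the authors' implementation) of training a small
# Elman recurrent network by a population-based search instead of gradient
# descent, in the spirit of the Wolf Search + ERNN hybrid described above.
import numpy as np

rng = np.random.default_rng(0)

def elman_forward(weights, X, n_hidden):
    """Unpack a flat weight vector and run a one-step Elman RNN over X."""
    n_in = X.shape[1]
    i = 0
    W_in = weights[i:i + n_in * n_hidden].reshape(n_in, n_hidden); i += n_in * n_hidden
    W_ctx = weights[i:i + n_hidden * n_hidden].reshape(n_hidden, n_hidden); i += n_hidden * n_hidden
    W_out = weights[i:i + n_hidden]
    h = np.zeros(n_hidden)
    outputs = []
    for x in X:
        h = np.tanh(x @ W_in + h @ W_ctx)   # context units feed back the previous h
        outputs.append(h @ W_out)
    return np.array(outputs)

def mse(weights, X, y, n_hidden):
    return float(np.mean((elman_forward(weights, X, n_hidden) - y) ** 2))

def wolf_like_search(X, y, n_hidden=4, n_wolves=20, iters=200, step=0.3, p_escape=0.05):
    """Each 'wolf' moves toward the current best solution; occasionally it jumps
    away (escape) to avoid local minima -- a simplified stand-in for Wolf Search."""
    dim = X.shape[1] * n_hidden + n_hidden * n_hidden + n_hidden
    wolves = rng.normal(0, 1, (n_wolves, dim))
    fitness = np.array([mse(w, X, y, n_hidden) for w in wolves])
    for _ in range(iters):
        best = wolves[fitness.argmin()]
        for k in range(n_wolves):
            if rng.random() < p_escape:                      # random escape jump
                cand = wolves[k] + rng.normal(0, 1, dim)
            else:                                            # move toward the best prey
                cand = wolves[k] + step * (best - wolves[k]) + rng.normal(0, 0.05, dim)
            f = mse(cand, X, y, n_hidden)
            if f < fitness[k]:                               # greedy replacement
                wolves[k], fitness[k] = cand, f
    return wolves[fitness.argmin()], fitness.min()

# toy usage: learn a shifted sine sequence from its current value
t = np.linspace(0, 4 * np.pi, 200)
X, y = np.column_stack([np.sin(t)]), np.sin(t + 0.3)
best_w, best_mse = wolf_like_search(X, y)
print("training MSE:", round(best_mse, 4))
```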

Nazri Mohd. Nawi, M. Z. Rehman, Norhamreeza Abdul Hamid, Abdullah Khan, Rashid Naseem, Jamal Uddin
Optimization of ANFIS Using Artificial Bee Colony Algorithm for Classification of Malaysian SMEs

The Adaptive Neuro-Fuzzy Inference System (ANFIS) has been widely applied to industrial as well as scientific problems, owing to its ability to approximate any plant with a proper number of rules. However, the surge in auto-generated rules as the inputs increase adds to the complexity and computational cost of the network. Therefore, optimization is required to prune the weak rules while, at the same time, achieving maximum accuracy; it is also important to note that over-reducing the rules may result in a loss of accuracy. Artificial Bee Colony (ABC) is a widely applied swarm-based technique for searching for optimum solutions, as it uses few setting parameters. This research explores the applicability of the ABC algorithm to ANFIS optimization. For the practical implementation, classification of Malaysian SMEs is performed. For validation, the performance of ABC is compared with the popular Particle Swarm Optimization (PSO) technique and the recently developed Mine Blast Algorithm (MBA). The evaluation metrics include the number of rules in the optimized rule-base, accuracy, and the number of iterations to converge. Results indicate that ABC needs an improved exploration strategy in order to avoid being trapped in local minima. However, the application of any efficient metaheuristic with the modified two-pass ANFIS learning algorithm will provide researchers with an approach to effectively optimize ANFIS when the number of inputs increases significantly.

Mohd. Najib Mohd. Salleh, Kashif Hussain, Rashid Naseem, Jamal Uddin
Forecasting of Malaysian Oil Production and Oil Consumption Using Fuzzy Time Series

Many statistical models have been implemented in the energy sector, especially for oil production and oil consumption. However, these models require assumptions regarding the data size and the normality of the data set, and these assumptions affect the forecasting accuracy. In this paper, the fuzzy time series (FTS) model is suggested to solve both problems, with no assumptions required. The forecasting accuracy is improved by modifying the number of intervals of the data set. The yearly oil production and oil consumption of Malaysia from 1965 to 2012 are examined to evaluate the performance of the FTS and regression time series (RTS) models, respectively. The results indicate that the FTS model is better than the RTS model in terms of forecasting accuracy.
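To make the interval-based mechanism concrete, here is a compact sketch of a basic Chen-style fuzzy time series forecast; the synthetic series, the number of intervals and the mean-of-midpoints defuzzification are illustrative assumptions of mine, not the paper's Malaysian oil data or its tuned interval scheme.

```python
# A compact sketch of a basic (Chen-style) fuzzy time series forecast, to
# illustrate the interval mechanism that the paper tunes for accuracy.
import numpy as np

def fts_forecast(series, n_intervals=7):
    lo, hi = min(series) - 1, max(series) + 1
    edges = np.linspace(lo, hi, n_intervals + 1)          # universe partition
    mids = (edges[:-1] + edges[1:]) / 2
    fuzzify = lambda v: int(np.clip(np.searchsorted(edges, v, side="right") - 1,
                                    0, n_intervals - 1))
    states = [fuzzify(v) for v in series]

    # fuzzy logical relationship groups: state -> set of observed next states
    flrg = {}
    for a, b in zip(states[:-1], states[1:]):
        flrg.setdefault(a, set()).add(b)

    last = states[-1]
    nxt = flrg.get(last, {last})
    return float(np.mean([mids[s] for s in nxt]))         # defuzzified forecast

# toy usage on a short synthetic production-like series
series = [600, 620, 650, 630, 660, 700, 690, 710]
print("next-step forecast:", round(fts_forecast(series), 1))
```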

Riswan Efendi, Mustafa Mat Deris
A Fuzzy TOPSIS with Z-Numbers Approach for Evaluation on Accident at the Construction Site

The construction industry has been identified as one of the riskiest industries, involving fatal accidents. Identifying the causes that lead to these accidents involves many uncertain and imprecise factors. Z-numbers capture more uncertainty than Fuzzy Sets (FSs); they provide an additional degree of freedom to represent the uncertainty and fuzziness of real situations. In this paper, we introduce a Fuzzy TOPSIS (FTOPSIS) with Z-numbers to handle uncertainty in construction problems. Five criteria and six alternatives are used to evaluate the causes of workers' accidents at construction sites. Data in the form of linguistic variables were collected from three authorised personnel of three agencies. The analysis shows that FTOPSIS with Z-numbers provides another useful way to handle Fuzzy Multi-Criteria Decision Making (FMCDM) problems in a more intelligent and flexible manner, owing to its use of Z-numbers within FTOPSIS.

Nurnadiah Zamri, Fadhilah Ahmad, Ahmad Nazari Mohd Rose, Mokhairi Makhtar
Formation Control Optimization for Odor Localization

This paper presents swarm robot formation control using a new hybrid algorithm of Fuzzy-Kohonen Networks and Particle Swarm Optimization (FKN-PSO). The FKN-PSO approach is proposed to overcome the formation control problem caused by losing the odor source, which results from sensor detection failure, robot motion control failure and environmental uncertainty in odor localization. The experiments are conducted using simple swarm robots in a real environment with on-board sensors and processors. The results of FKN-PSO and Fuzzy-PSO are compared to assess the performance of the swarm robots in the odor localization process. The results show that the proposed algorithm responds faster and processes more efficiently than Fuzzy-PSO; the robots are able to locate the odor source in a short time and are capable of keeping formation while finding the target.

Bambang Tutuko, Siti Nurmaini, Rendyansyah, P. P. Aditya, Saparudin
A New Search Direction for Broyden’s Family Method in Solving Unconstrained Optimization Problems

The conjugate gradient method plays an important role in solving large scale problems, and the quasi-Newton method is known as the most efficient method for solving unconstrained optimization problems. Hence, in this paper, we propose a new hybrid of the conjugate gradient and quasi-Newton methods, known as the CG-Broyden method. The new hybrid method is compared with the quasi-Newton methods in terms of the number of iterations and CPU time, using Matlab on a Windows 10 machine with 4 GB RAM and an Intel® Core™ i5 processor. Furthermore, a performance profile graphic is used to show the effectiveness of the new hybrid method. Our numerical analysis provides strong evidence that the CG-Broyden method is more efficient than the ordinary Broyden method. Besides, we also prove that the new algorithm is globally convergent.

Mohd Asrul Hery Ibrahim, Zailani Abdullah, Mohd Ashlyzan Razik, Tutut Herawan
Improved Functional Link Neural Network Learning Using Modified Bee-Firefly Algorithm for Classification Task

The Functional Link Neural Network (FLNN) has become an important tool in many application tasks, particularly in solving non-linearly separable problems. This is due to its modest architecture, which requires fewer tunable weights for training compared to the standard multilayer feed forward network. The most common learning scheme for training the FLNN is the Backpropagation (BP-learning) algorithm. However, the BP-learning algorithm tends to get trapped easily in local minima, especially when dealing with non-linearly separable classification problems, which affects the performance of the FLNN. This paper discusses the implementation of a modified Artificial Bee Colony with Firefly algorithm for training the FLNN network to overcome the drawback of the BP-learning scheme. The aim is to introduce an alternative learning scheme that can provide a better solution for training the FLNN network for classification tasks.

Yana Mazwin Mohmad Hassim, Rozaida Ghazali, Noorhaniza Wahid
Artificial Neural Network with Hyperbolic Tangent Activation Function to Improve the Accuracy of COCOMO II Model

In software engineering, the Constructive Cost Model II (COCOMO II) is one of the most cited, famous and widely used models to estimate and predict important features of a software project, such as effort, cost, time and manpower. Lately, researchers have incorporated it with soft computing techniques to reduce the ambiguity and uncertainty of its software attributes. In this paper, an Artificial Neural Network (ANN) with a hyperbolic tangent activation function is used to improve the accuracy of the COCOMO II model, with the backpropagation learning algorithm used in the training process. In the experiment, the COCOMO II SDR dataset is used for training and testing the model. The result shows that eight out of twelve projects have an estimated effort closer to the actual effort, indicating that the proposed model produces better performance compared to the sigmoidal function.

Sarah Abdulkarem Alshalif, Noraini Ibrahim, Tutut Herawan
A Study of Data Imputation Using Fuzzy C-Means with Particle Swarm Optimization

An imputation method involving Fuzzy C-Means (FCM) with Particle Swarm Optimization (PSO) is implemented. The FCM is applied to identify similar records in the complete dataset. Then, the records are optimized using PSO based on information from the incomplete dataset. To evaluate the proposed method, experimental tests are conducted using three datasets, namely the Cleveland Heart Disease, Iris and Breast Cancer datasets. The Root Mean Square Error (RMSE) results on the three datasets are compared across seven different ratios of missing data. The results show that the proposed approach is comparable to existing imputation methods.

Nurul Ashikin Samat, Mohd Najib Mohd Salleh
Utilizing Clonal Selection Theory Inspired Algorithms and K-Means Clustering for Predicting OPEC Carbon Dioxide Emissions from Petroleum Consumption

The prediction of carbon dioxide (CO2) emissions from petroleum consumption motivated this research. Over the years, the rate of CO2 emissions has continued to multiply, resulting in global warming. This paper thus proposes the use of clonal selection theory inspired algorithms, CLONALG and AIRS, to forecast global CO2 emissions. The K-means algorithm divides the data into groups of similar and meaningful patterns. Comparative simulations with the multi-layer perceptron, IBk, fuzzy-rough nearest neighbor, and vaguely quantified nearest neighbor reveal that CLONALG and AIRS produce outstanding results and are able to generate the highest detection rates and lowest false alarm rates. Gathering useful information through the accurate prediction of CO2 emissions can thus help to reduce the contribution of CO2 emissions to global warming and assist in policies on climate change.

Ayodele Lasisi, Rozaida Ghazali, Haruna Chiroma
One-Way ANOVA Model with Fuzzy Data for Consumer Demand

This paper presents a statistical method that can distinguish customer demand into different types when fuzzy data are considered. A one-way analysis of variance (ANOVA) model for fuzzy data is introduced with a hypothesis test, the F-test, which is the pivot statistic in the ANOVA model. In the experiment, several different factors are tested with the one-way ANOVA model. The results of this study indicate that the solution method introduced in this paper can give the decision maker a result with a favorable degree for each factor. Such a result helps the decision maker and retailer to distinguish which factor is the most critical for the customer and how many products should be allocated to customers.
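For reference, the sketch below runs the classical crisp one-way ANOVA F-test that the fuzzy model above pivots on; the triangular fuzzy ratings and their reduction to centroids are purely illustrative assumptions of mine, since the paper itself keeps the data fuzzy.

```python
# A minimal sketch, not the paper's method: the classical one-way ANOVA
# F-test. Triangular fuzzy observations are reduced to crisp centroids
# purely for illustration.
import numpy as np
from scipy.stats import f_oneway

def centroid(tri):
    """Centroid of a triangular fuzzy number (a, b, c)."""
    a, b, c = tri
    return (a + b + c) / 3.0

# hypothetical demand ratings (triangular fuzzy numbers) for three factors
factor_price   = [(2, 3, 4), (3, 4, 5), (2, 3, 5), (3, 5, 6)]
factor_quality = [(5, 6, 7), (6, 7, 8), (5, 7, 8), (6, 7, 9)]
factor_brand   = [(3, 4, 6), (4, 5, 6), (3, 5, 7), (4, 6, 7)]

groups = [[centroid(x) for x in g] for g in (factor_price, factor_quality, factor_brand)]
F, p = f_oneway(*groups)            # F = MS_between / MS_within
print(f"F = {F:.3f}, p = {p:.4f}")  # a small p-value means the factor means differ
```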

Pei Chun Lin, Nureize Arbaiy, Isredza Rahmi Abd. Hamid
Chicken S-BP: An Efficient Chicken Swarm Based Back-Propagation Algorithm

Chicken Swarm Optimization (CSO) is an innovative metaheuristic algorithm inspired by the characteristics of a chicken flock. CSO is particularly suitable for searching candidate solutions in large spaces. This paper hybridizes the CSO algorithm with the Back Propagation (BP) algorithm to solve the local minimum problem and to enhance convergence to the global minimum in the BP algorithm. The proposed Chicken Swarm Back Propagation (Chicken S-BP) is compared with the Artificial Bee Colony Back-Propagation (ABCBP), Genetic Algorithm Neural Network (GANN) and traditional BPNN algorithms. In particular, the Iris, Australian Credit Card, and 7-Bit Parity classification datasets are used in training and testing the performance of the Chicken S-BP hybrid network. Simulation results illustrate that the Chicken S-BP algorithm efficiently avoids local minima and provides optimal solutions.

Abdullah Khan, Nazri Mohd Nawi, Rahmat Shah, Nasreen Akhter, Atta Ullah, M. Z. Rehman, Norhamreeza AbdulHamid, Haruna Chiroma
A Review on Violence Video Classification Using Convolutional Neural Networks

The explosive growth of social media content on the Internet is revolutionizing content distribution and social interaction. Social media has exploded as a category of online discourse where people create content, share it, bookmark it and network at a prodigious rate; examples include Facebook, MySpace, YouTube, Instagram, Digg, Twitter and Snapchat. These platforms are easy to reach and use, and information spreads among users at high velocity. At the same time, the internet as it is at present is made up of a vast array of protocols and networks where traffickers can anonymously share large volumes of illegal material with each other from locations with relaxed or non-existent laws prohibiting the possession or trafficking of such material. In this paper, a review of applications of deep network techniques is presented. The existing literature suggests that we should not lose sight of the current and future potential of deep network techniques. Thus, there is high potential for the use of Convolutional Neural Networks (CNN) for violence video classification, which has not been fully investigated and would be one of the interesting directions for future research in video classification.

Ashikin Ali, Norhalina Senan
Modified Backpropagation Algorithm for Polycystic Ovary Syndrome Detection Based on Ultrasound Images

Polycystic Ovary Syndrome (PCOS) is an endocrine abnormality that occurs in the female reproductive cycle. In general, the approaches to detecting PCO follicles are (1) stereology and (2) feature extraction and classification. In stereology, two-dimensional images are viewed as projections of three-dimensional objects. In this paper, we use the second approach, with Gabor Wavelets as the feature extractor and a modified backpropagation as the classifier. The proposed modifications of the backpropagation algorithm, namely Levenberg-Marquardt optimization and Conjugate Gradient Fletcher-Reeves, are intended to improve the convergence rate. Levenberg-Marquardt optimization produces higher accuracy than Conjugate Gradient Fletcher-Reeves, but at the cost of longer running time. The best accuracy of Levenberg-Marquardt is 93.925%, obtained with 33 neurons and 16 features, while Conjugate Gradient Fletcher-Reeves achieves 87.85% with 13 neurons and 16 features.

Untari N. Wisesty, Jondri Nasri, Adiwijaya
An Implementation of Local Regression Smoothing on Evolving Fuzzy Algorithm for Planting Calendar Forecasting Based on Rainfall

The agricultural sector has an important role in the Indonesian economy. Agriculture provides national food stocks, especially rice as the staple food of the Indonesian people. Weather conditions, especially rainfall, strongly affect the right time to start planting, which in turn affects the productivity of farmers. Therefore, a rainfall forecasting system is required to create a planting calendar, especially for rice. In this paper, we propose a rainfall forecasting system based on an Evolving Fuzzy approach optimized using Genetic Algorithms. Data preprocessing is handled using Local Regression Smoothing to deal with fluctuating data. This paper implements Local Regression Smoothing on the Evolving Fuzzy algorithm with monthly rainfall data. With an accuracy of more than 80%, the forecast of the coming months' rainfall can be used to construct a rice planting calendar for the Bandung regency with three planting periods (November to February, December to March, and January to April), provided that water needs are controlled during rainfall surplus and supplemented during rainfall deficiency.

Arizal Akbar Rahma Saputro, Fhira Nhita, Adiwijaya
Chebyshev Multilayer Perceptron Neural Network with Levenberg Marquardt-Back Propagation Learning for Classification Tasks

Artificial neural networks have proven to be among the best tools in data mining for classification tasks. The multilayer perceptron (MLP) neural network is commonly used due to its fast convergence and easy implementation; however, it fails to tackle higher-dimensional problems. In this paper, a Chebyshev multilayer perceptron neural network with Levenberg-Marquardt back propagation learning is presented for classification tasks. Here, the Chebyshev orthogonal polynomial is used as a functional expansion for the solution of higher-dimensional problems. Four benchmark classification datasets are collected from the UCI repository. The computational results are compared with MLPs trained by different training algorithms, namely Gradient Descent back propagation (MLP-GD), Levenberg-Marquardt back propagation (MLP-LM), Gradient Descent back propagation with momentum (MLP-GDM), and Gradient Descent with momentum and adaptive learning rate (MLP-GDX). The findings show that the proposed model outperforms all compared methods in terms of accuracy, precision and sensitivity.

Umer Iqbal, Rozaida Ghazali
Computing the Metric Dimension of Hypercube Graphs by Particle Swarm Optimization Algorithms

In this paper, we present a Particle Swarm Optimization (PSO) algorithm for determining the metric dimension of graphs. We choose PSO because of its simplicity, robustness, and adaptability to various optimization problems [5]. Our PSO uses binary-valued vectors as particles; the binary-valued vector represents which vertices of the graph belong to the resolving set. Feasibility is enforced by repairing particles. We test our PSO by computing the metric dimension of hypercube graphs. The results show that our PSO can reach the metric dimensions known in the literature [8] in a reasonable amount of time.
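The core feasibility test behind such a PSO can be sketched as follows: a binary particle selects landmark vertices, and the particle is feasible when every vertex of the hypercube receives a distinct vector of distances to those landmarks. This is my own illustration of the encoding, not the authors' code.

```python
# A minimal sketch of the binary-particle encoding and feasibility check:
# a candidate resolving set of the hypercube Q_n is valid when all vertices
# get distinct distance vectors to the chosen landmarks.
from itertools import product

def hypercube_vertices(n):
    return list(product([0, 1], repeat=n))

def hamming(u, v):
    # the shortest-path distance in Q_n is the Hamming distance
    return sum(a != b for a, b in zip(u, v))

def resolves(particle, vertices):
    """particle: binary tuple selecting which vertices are landmarks."""
    landmarks = [v for v, bit in zip(vertices, particle) if bit]
    signatures = {tuple(hamming(v, w) for w in landmarks) for v in vertices}
    return len(signatures) == len(vertices)

# usage: {001, 010, 100} is a resolving set of Q_3 (its metric dimension is 3)
verts = hypercube_vertices(3)
particle = tuple(1 if v in [(0, 0, 1), (0, 1, 0), (1, 0, 0)] else 0 for v in verts)
print(resolves(particle, verts))   # True
print(sum(particle))               # size of the candidate resolving set: 3
```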

Danang Triantoro Murdiansyah, Adiwijaya
Non-linear Based Fuzzy Random Regression for Independent Variable Selection

This paper demonstrates a fuzzy random regression approach using a genetic algorithm (FRR-GA) to select independent variables for a regression model. The FRR-GA approach enables us to identify the best coefficient values among the regressors, which indicate the best independent variables and are important for building the regression model. Additionally, the fuzzy random regression approach is employed to treat the dual uncertainties arising from the realization of such data in real applications. This paper presents an algorithm reflecting the non-linear strategy in the fuzzy random regression model. A numerical example illustrates the proposed solution procedure, whose result suggests several feasible solutions to the user.

Mohd Zaki Mohd Salikon, Nureize Arbaiy
Time Series Forecasting Using Ridge Polynomial Neural Network with Error Feedback

Time series forecasting receives much attention due to its impact on many practical applications. Higher-order neural networks with recurrent feedback are a powerful technique that has been used successfully for forecasting; they maintain fast learning and the ability to learn the dynamics of the series over time. In general, the most commonly used recurrent feedback is the network output; however, not much attention has been paid to using the network error instead. Therefore, in this paper we propose a novel model, called the Ridge Polynomial Neural Network with Error Feedback (RPNN-EF), that combines the properties of higher-order and error-feedback recurrent neural networks. Three signals are used in this paper, namely heat wave temperature, IBM common stock closing price and the Mackey-Glass equation. Simulation results show that RPNN-EF is significantly faster than other RPNN-based models for one-step-ahead forecasting and forecasts significantly better than these models for multi-step-ahead forecasting.

Waddah Waheeb, Rozaida Ghazali, Tutut Herawan
Training ANFIS Using Catfish-Particle Swarm Optimization for Classification

ANFIS performance depends on the parameters it is trained with; therefore, the training mechanism needs to be fast and reliable. Many have trained ANFIS parameters using GD, LSE and metaheuristic techniques, but an efficient one is still to be developed. The Catfish-PSO algorithm is one of the latest successful swarm intelligence based techniques and is used in this research for training ANFIS. As opposed to standard PSO, Catfish-PSO has strong exploitation and exploration capability. The experimental results of training the ANFIS network for classification problems show that the Catfish-PSO algorithm achieves much better accuracy and satisfactory results.

Norlida Hassan, Rozaida Ghazali, Kashif Hussain

Data Mining

Frontmatter
FCA-ARMM: A Model for Mining Association Rules from Formal Concept Analysis

The evolution of technology in this era has contributed to the growth of abundant data. Data mining is a well-known computational process for discovering meaningful and useful information from large data repositories. There are various data mining techniques that can deal with this situation, one of which is association rule mining. Formal Concept Analysis (FCA) is a method of conceptual knowledge representation and data analysis that has been applied in various disciplines, including data mining. Extracting association rules from a constructed FCA is a very promising study, but it is quite challenging, not straightforward and has received little focus. Therefore, in this paper we propose an Integrated Formal Concept Analysis-Association Rule Mining Model (FCA-ARMM) and an open source tool called FCA-Miner. The results show that FCA-ARMM with FCA-Miner is successful in generating association rules from the real dataset.

Zailani Abdullah, Md Yazid Mohd Saman, Basyirah Karim, Tutut Herawan, Mustafa Mat Deris, Abdul Razak Hamdan
ELP-M2: An Efficient Model for Mining Least Patterns from Data Repository

Most algorithms and data structures face computational problems when required to deal with highly sparse or dense datasets. Therefore, in this paper we propose a complete model for mining least patterns, known as the Efficient Least Pattern Mining Model (ELP-M2), with the LP-Tree data structure and LP-Growth algorithm. A comparative study is made with the well-known FP-Tree data structure and FP-Growth algorithm. Two benchmark datasets from the FIMI repository, Kosarak and T40I10D100K, were employed. The experimental results on the first and second datasets show that the LP-Growth algorithm is more efficient and outperformed the FP-Growth algorithm by 14% and 57%, respectively.

Zailani Abdullah, Amir Ngah, Tutut Herawan, Noraziah Ahmad, Siti Zaharah Mohamad, Abdul Razak Hamdan
Integration of Self-adaptation Approach on Requirements Modeling

Self-adaptation approaches have emerged in response to the environmental complexity and uncertainty of today's software systems. However, preparing a system with self-adaptation capability requires a specific strategy, including during the requirements modeling stage. The requirements modeling activity becomes very decisive when selecting and introducing new elements to be added. Here we adopt a feedback loop as a self-adaptation strategy, integrated into a goal-based approach to requirements. This paper discusses the integration of the two approaches, with the aim of obtaining a new model that has the advantages of both.

Aradea, Iping Supriana, Kridanto Surendro, Irfan Darmawan
Detection of Redundancy in CFG-Based Test Cases Using Entropy

Testing is an activity conducted by the software tester to validate the behavior of the system, i.e., whether it is working correctly or not. Generating effective test cases becomes a crucial task as source code grows and requirements change rapidly. Selecting effective test cases becomes a problem when the test cases are redundant, creating a new challenge of how to reduce unnecessary test cases that increase the cost and maintenance of the software testing process. Thus, this paper proposes the use of entropy for detecting and removing redundancy in test cases generated from a Control Flow Graph (CFG). The result shows that the proposed approach reduced the test cases by 61% compared to the original test suite. In conclusion, entropy can be an alternative approach for detecting and reducing redundant test cases.
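A rough sketch of the idea, under my own assumptions rather than the paper's exact formulation: treat each CFG-based test case as the sequence of nodes it covers, and drop a test case that neither adds coverage nor increases the entropy of the suite's node-coverage distribution.

```python
# An illustrative heuristic (my assumptions, not the paper's exact formula):
# a test case is kept only if it adds node coverage or raises the Shannon
# entropy of the suite's node-coverage distribution.
import math
from collections import Counter

def entropy(node_counts):
    total = sum(node_counts.values())
    return -sum((c / total) * math.log2(c / total) for c in node_counts.values())

def prune_redundant(test_cases):
    kept, counts, covered = [], Counter(), set()
    for name, path in test_cases:
        new_counts = counts + Counter(path)
        gains_coverage = not set(path) <= covered
        gains_entropy = entropy(new_counts) > (entropy(counts) if counts else -1)
        if gains_coverage or gains_entropy:
            kept.append(name)
            counts, covered = new_counts, covered | set(path)
        # otherwise: redundant with respect to the suite built so far
    return kept

# hypothetical test cases given as CFG node paths
suite = [("t1", ["n1", "n2", "n4"]),
         ("t2", ["n1", "n3", "n4"]),
         ("t3", ["n1", "n2", "n4"]),   # repeats t1's path -> flagged as redundant
         ("t4", ["n1", "n3", "n5"])]
print(prune_redundant(suite))          # ['t1', 't2', 't4']
```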

Noor Fardzilawati Md Nasir, Noraini Ibrahim, Tutut Herawan
Variety of Approaches in Self-adaptation Requirements: A Case Study

Self-adaptation requirements are a requirements engineering topic for developing self-adaptive systems. This approach provides a way for design-time requirements activities to meet the needs of stakeholders and the system-to-be. Currently, a variety of approaches have been proposed by researchers through the development of goal-oriented requirements engineering. The ideas expressed through the expansion of this model are quite promising; however, the various proposed approaches are not without shortcomings. This paper describes in detail the variety of approaches available today through the implementation of a case study. From the analysis of the results, we found five main features that can be used as considerations in formulating self-adaptation requirements, namely the goal concept, environment model, behavior analysis, run-time dependencies, and adaptation strategy. Besides that, we see future research opportunities in a deeper study of goal-based modeling and feedback loops utilizing data mining techniques.

Aradea, Iping Supriana, Kridanto Surendro, Irfan Darmawan
Dynamic Trackback Strategy for Email-Born Phishing Using Maximum Dependency Algorithm (MDA)

Generally, most strategies prefer to use fake tokens to detect phishing activity. However, using fake tokens is limited to a static feature selection that needs to be pre-determined. In this paper, a tokenless trackback strategy for email-born phishing is presented, which makes the strategy dynamic. Initially, the selected features were tested on the trackback system to generate a phishing profile using the Maximum Dependency Algorithm (MDA). Phishing emails are split into groups of phishers constructed by the MDA algorithm. Then, forensic analysis is applied to identify the type of phisher against the assumed groups of attackers, either single or collaborative. The performance of the proposed strategy is tested on email-born phishing. The result shows that the dynamic strategy can be used for tracking and classifying the attacker.

Isredza Rahmi A. Hamid, Noor Azah Samsudin, Aida Mustapha, Nureize Arbaiy
A Case Based Methodology for Problem Solving Aiming at Knee Osteoarthritis Detection

Knee osteoarthritis is the most common type of arthritis and a major cause of impaired mobility and disability in ageing populations. Due to the increasing prevalence of the malady, clinical and scientific practices have to be put in place to detect the problem in its early stages. Thus, this work focuses on the improvement of methodologies for problem solving aimed at the development of an Artificial Intelligence based decision support system to detect knee osteoarthritis. The framework is built on top of a Logic Programming approach to Knowledge Representation and Reasoning, complemented with a Case Based approach to computing that caters for the handling of incomplete, unknown, or even self-contradictory information.

Marisa Esteves, Henrique Vicente, José Machado, Victor Alves, José Neves
Preliminary Study for the Implementation of Electrical Capacitance Volume Tomography (ECVT) to Display Fruit Content

Indonesian fruit exports face problems that call for a non-destructive tool to distinguish the conditions of raw, ripe and rotten fruits. In this preliminary research, the first step is to measure an electrical characteristic of the fruit, namely its capacitance. The results show that, in general, the capacitance decreases as the frequency increases, and that the differences in capacitance are seen more clearly at high frequencies. With the multi-channel ECVT scanner, the resulting image shows only the outside of the fruit, so it is difficult to distinguish the condition of each fruit. Further studies will develop a sensor that can wrap around the fruit so that its interior can be seen more clearly. The algorithm used for image reconstruction in this research is Linear Back Projection (LBP).

Riza Agustiansyah, Rohmat Saedudin, Mahfudz Al Huda
Dependency Scheme for Revise and Reasoning Solution

Revising the solution is one step in the cycle of Case-Based Reasoning (CBR). The revise process is the task of improving a solution that cannot be reused directly by a question answering system. This paper presents a new scheme to improve the solution in a question-answering system using the dependency approach between words or phrases. The repaired cases are first tested to ensure that the obtained solution meets the criteria of the problem; the testing process is conducted by searching the dependency structure of the solution. Using data of 135 Indonesian sentence premises with solutions in the form of sentences, the scheme achieves an accuracy of 80.74%.

Wiwin Suwarningsih, Ayu Purwarianti, Iping Supriana
A New Binary Similarity Measure Based on Integration of the Strengths of Existing Measures: Application to Software Clustering

Different binary similarity measures have been explored with different agglomerative hierarchical clustering approaches for software clustering, to make software systems understandable and manageable. Similarity measures have strengths and weaknesses that improve or deteriorate clustering quality; the question is whether the strengths of the similarity measures can be used to avoid their weaknesses for software clustering. This paper presents the strengths of some well known existing binary similarity measures and, using these strengths, introduces an improved new binary similarity measure. A series of experiments on five different test software systems is presented to evaluate the effectiveness of our new binary similarity measure. The results indicate that our new measure shows the combined strengths of the existing similarity measures by reducing arbitrary decisions and increasing the number of clusters, and thus improves the authoritativeness of the clustering.

Rashid Naseem, Mustafa Mat Deris
A Comparative Study of Linear and Nonlinear Regression Models for Outlier Detection

Artificial Neural Networks provide models for a large class of natural and artificial phenomena that are difficult to handle using classical parametric techniques. They offer a potential solution that fits all the data, including any outliers, instead of removing them. This paper compares the predictive performance of linear and nonlinear models in outlier detection. The best-subsets regression algorithm is used to select the minimum set of variables in the linear regression model by removing predictors that are irrelevant to the task to be learned. Then, an ANN is trained as a Multi-Layer Perceptron to improve the classification and prediction of the linear model based on the standard nonlinear functions inherent in ANNs. The linear and nonlinear models were compared by analyzing their Receiver Operating Characteristic curves in terms of accuracy and misclassification rates. The linear and nonlinear models achieved 68% and 93%, respectively, with a better fit for the nonlinear model.

Paul Inuwa Dalatu, Anwar Fitrianto, Aida Mustapha
Clustering Based on Classification Quality (CCQ)

Clustering a set of objects into homogeneous classes is a fundamental operation in data mining. Categorical data clustering based on rough set theory has been an active research area in the field of machine learning; however, pure rough set theory is not well suited to analyzing noisy information systems. In this paper, an alternative technique for categorical data clustering using the Variable Precision Rough Set model is proposed, based on the classification quality of Variable Precision Rough Set theory. The technique is implemented in MATLAB. Experimental results on three benchmark UCI datasets indicate that the technique can successfully be used to analyze grouped categorical data, as it produces better clustering results.

Iwan Tri Riyadi Yanto, Rd Rohmat Saedudin, Dedy Hartama, Tutut Herawan
A Framework of Clustering Based on Chicken Swarm Optimization

The Chicken Swarm Optimization (CSO) algorithm, one of the most recently introduced optimization algorithms, simulates the intelligent foraging behaviour of a chicken swarm. Data clustering is used in many disciplines and applications; it is an important tool and a descriptive task seeking to identify homogeneous groups of objects based on the values of their attributes. In this work, CSO is used for data clustering. The performance of the proposed CSO was assessed on several datasets and compared with well-known and recent metaheuristic clustering algorithms: the Particle Swarm Optimization (PSO) algorithm, Cuckoo Search (CS) and the Bee Colony (BC) algorithm. The simulation results indicate that the CSO algorithm has much potential and can be used efficiently for data clustering.

Nursyiva Irsalinda, Iwan Tri Riyadi Yanto, Haruna Chiroma, Tutut Herawan
Histogram Thresholding for Automatic Color Segmentation Based on k-means Clustering

Color segmentation methods have been proposed and developed by many researchers; however, automatically segmenting a color image based on color information remains a challenging topic. This research proposes a method to estimate the number of colors and perform color segmentation. The method initiates cluster centers using histogram thresholding and peak selection on the CIE L*a*b* chromatic channels. k-means is then performed to find the optimal cluster centers and to assign each color datum a color label using the previously estimated cluster centers. Finally, the initial color labels can be split or merged in order to segment black, dark, bright, or white colors using the luminosity histogram. The final clustering is evaluated using the silhouette measure to assess cluster quality and by calculating the accuracy of the color label prediction. The results show that the proposed method achieves up to 85% accuracy on 20 test images and an average silhouette value of 0.694 on 25 test images.
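A simplified sketch of this pipeline is given below; the synthetic three-patch image, the 64-bin histograms, the peak-height threshold and the pairing of a*/b* peaks are all assumptions of mine, not the authors' exact parameters.

```python
# A simplified sketch: estimate the number of colors from histogram peaks on
# the CIE L*a*b* chromatic channels, then refine with k-means seeded by those
# peaks and assign every pixel a color label.
import numpy as np
from scipy.signal import find_peaks
from skimage.color import rgb2lab
from sklearn.cluster import KMeans

# synthetic test image: three flat color patches
img = np.zeros((60, 180, 3))
img[:, :60] = (0.9, 0.1, 0.1)
img[:, 60:120] = (0.1, 0.8, 0.2)
img[:, 120:] = (0.2, 0.3, 0.9)

lab = rgb2lab(img)
ab = lab[..., 1:].reshape(-1, 2)                  # chromatic a*, b* channels only

def peak_centers(channel, bins=64):
    hist, edges = np.histogram(channel, bins=bins)
    padded = np.concatenate([[0], hist, [0]])     # so peaks at the ends are found too
    peaks, _ = find_peaks(padded, height=hist.max() * 0.1, distance=3)
    peaks -= 1
    return (edges[peaks] + edges[peaks + 1]) / 2  # bin centers of the peaks

a_peaks, b_peaks = peak_centers(ab[:, 0]), peak_centers(ab[:, 1])
k = max(len(a_peaks), len(b_peaks))               # estimated number of colors
init = np.column_stack([np.resize(a_peaks, k), np.resize(b_peaks, k)])

labels = KMeans(n_clusters=k, init=init, n_init=1).fit_predict(ab)
print("estimated colors:", k, "| pixels per label:", np.bincount(labels))
```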

Adhi Prahara, Iwan Tri Riyadi Yanto, Tutut Herawan
Does Number of Clusters Effect the Purity and Entropy of Clustering?

Cluster analysis automatically partitions data into a number of meaningful groups or clusters using clustering algorithms, and every clustering algorithm produces its own type of clusters. Therefore, the evaluation of clustering is very important for finding the better clustering algorithm. There exist a number of evaluation measures, which can be broadly divided into internal, external and relative measures. Internal measures are used to assess the quality of the obtained clusters, such as cluster cohesion and the number of clusters (NoC). External measures, such as purity and entropy, find the extent to which the clustering structure discovered by a clustering algorithm matches some external structure, while relative measures are used to compare two different clustering results using internal or external measures. To explore the effect of the NoC on external evaluation measures like purity and entropy, an empirical study is conducted. The idea is taken from the fact that the NoC obtained in the clustering process is an indicator of the successfulness of a clustering algorithm. In this paper, some necessary propositions are formulated and four previously utilized test cases are then considered to validate the effect of the NoC on purity and entropy. The proofs and experimental results indicate that purity increases and entropy decreases with increasing NoC.
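The external measures in question follow the standard definitions, which the small worked example below evaluates for two clusterings of the same labelled objects that differ only in the number of clusters; the toy labels are my own.

```python
# A small worked illustration of the standard purity and entropy measures,
# computed against external class labels for a coarse and a fine clustering.
import math
from collections import Counter

def purity(clusters):
    """clusters: list of lists of true class labels, one list per cluster."""
    n = sum(len(c) for c in clusters)
    return sum(max(Counter(c).values()) for c in clusters) / n

def entropy(clusters):
    n = sum(len(c) for c in clusters)
    total = 0.0
    for c in clusters:
        probs = [v / len(c) for v in Counter(c).values()]
        total += (len(c) / n) * -sum(p * math.log2(p) for p in probs)
    return total

coarse = [["a", "a", "b", "b"], ["c", "c", "a", "b"]]        # 2 clusters
fine   = [["a", "a"], ["b", "b"], ["c", "c"], ["a", "b"]]    # 4 clusters
for name, cl in [("2 clusters", coarse), ("4 clusters", fine)]:
    print(name, "purity=%.3f" % purity(cl), "entropy=%.3f" % entropy(cl))
# with more clusters, purity rises and entropy falls, as the paper argues
```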

Jamal Uddin, Rozaida Ghazali, Mustafa Mat Deris
Text Detection in Low Resolution Scene Images Using Convolutional Neural Network

Text detection in scene images has gained increasing interest, especially due to the rise of wearable devices. However, these devices often acquire low resolution images, making it difficult to detect text due to noise. Notable methods for detection in low resolution images generally utilize many cleverly integrated features and cascaded classifiers to form a more discriminative system. Those methods, however, require a lot of hand-crafted and manually tuned features, which are difficult to achieve in practice. In this paper, we show that the notable cascaded method is equivalent to a Convolutional Neural Network (CNN) framework for dealing with text detection in low resolution scene images. The CNN framework, however, has interesting mutual interactions between layers, from which the parameters are jointly learned without requiring manual design; thus its parameters can be better optimized from training data. Experimental results show the efficiency of the method for detecting text in low resolution scene images.

Anhar Risnumawan, Indra Adji Sulistijono, Jemal Abawajy
Handling Imbalanced Data in Churn Prediction Using RUSBoost and Feature Selection (Case Study: PT.Telekomunikasi Indonesia Regional 7)

Solving imbalance problems is a challenging task in data mining and machine learning. Most classifiers are biased towards the majority class examples when learning from highly imbalanced data. In practice, churn prediction is considered one of the data mining applications that reflects imbalance problems. This study investigates how to handle class imbalance in churn prediction using RUSBoost, a combination of random under-sampling and a boosting algorithm, combined with feature selection for better performance. The datasets used are broadband internet data collected from a telecommunication company in Indonesia. The study first selects the important features using Information Gain, and then builds the churn prediction model using RUSBoost with C4.5 as the weak learner. The results show that feature selection and RUSBoost improve the prediction performance by 16% and reduce the processing time by 48%.
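A minimal sketch of this pipeline, under stated assumptions (synthetic imbalanced data standing in for the broadband churn records, scikit-learn and imbalanced-learn APIs, and RUSBoost's default decision-tree weak learner standing in for C4.5):

```python
# Sketch: rank features by information gain (mutual information with the
# class), keep the top ones, then train RUSBoost on the imbalanced data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score
from imblearn.ensemble import RUSBoostClassifier

# synthetic stand-in for the churn data: ~5% positive (churn) class
X, y = make_classification(n_samples=3000, n_features=30, n_informative=8,
                           weights=[0.95, 0.05], random_state=7)

gain = mutual_info_classif(X, y, random_state=7)   # information-gain-style ranking
top = np.argsort(gain)[::-1][:10]                  # keep the 10 best features
X_sel = X[:, top]

X_tr, X_te, y_tr, y_te = train_test_split(X_sel, y, stratify=y, random_state=7)
model = RUSBoostClassifier(n_estimators=50, random_state=7)  # random under-sampling + boosting
model.fit(X_tr, y_tr)
print("F1 on churn class: %.3f" % f1_score(y_te, model.predict(X_te)))
```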

Erna Dwiyanti, Adiwijaya, Arie Ardiyanti
Extended Local Mean-Based Nonparametric Classifier for Cervical Cancer Screening

The malignancy associated changes (MAC) approach is one possible strategy to classify a Pap smear slide as positive (abnormal) or negative (normal) in the cervical cancer screening procedure. The MAC approach requires analysis of the cells as a group, as the abnormal phenomenon cannot be detected at the individual cell level. However, existing classification algorithms are limited to automating the individual cell analysis task, as in the rare event approach. Therefore, in this paper we apply an extended local mean-based nonparametric classifier to automate the analysis of a group of cells, which is applicable in the MAC approach. The proposed classifiers extend the existing local mean-based nonparametric techniques in two ways, using voting and pooling schemes to label each patient's Pap smear slide. The performance of the proposed classifiers is evaluated against the existing local mean-based nonparametric classifier in terms of accuracy and area under the receiver operating characteristic curve (AUC). The extended classifiers show favourable accuracy compared to the existing local mean-based nonparametric classifier in performing the Pap smear slide classification task.

Noor Azah Samsudin, Aida Mustapha, Nureize Arbaiy, Isredza Rahmi A. Hamid
Adaptive Weight in Combining Color and Texture Feature in Content Based Image Retrieval

Low-level image feature extraction is the basis of content based image retrieval (CBIR) systems. In that process, the use of more than one descriptor has a tremendous impact on increasing system accuracy. Based on that fact, in this paper we combine color and texture features in the feature extraction process, namely the Color Layout Descriptor (CLD) for color feature extraction and the Edge Histogram Descriptor (EHD) for texture feature extraction. We measure the system performance on retrieving the top-5, top-10, top-15, and top-20 relevant images. We demonstrate in the experiments that the combination of color and texture descriptors can significantly improve the performance of the retrieval system. In our proposed system, the combination of CLD and EHD reaches 72.82% accuracy, using adaptive weights in the late fusion method.

Ema Rachmawati, Mursil Shadruddin Afkar, Bedy Purnama
WordNet Gloss for Semantic Concept Relatedness

Semantic lexical similarity and relatedness are important issues in natural language processing (NLP). Similarity and relatedness are not the same, although they are very closely related; to date, many works have mixed up these two issues, which harms system effectiveness. A popular approach to measuring semantic similarity and relatedness is utilizing WordNet, a lexical database. This paper shows that WordNet's gloss is a potential source for measuring semantic relatedness. Experimental results using the WordSim353 relatedness database confirm the effectiveness of the approach.

Moch Arif Bijaksana, Rakhmad Indra Permadi
Difference Expansion-Based Data Hiding Method by Changing Expansion Media

In this era, protecting secret data plays an important role since such data may be transmitted over public networks or stored in public storage. One possible method to protect the data is by implementing steganography/data hiding algorithms, such as Difference Expansion (DE). DE works by embedding a secret message in the difference value of two pixels, in the case where the cover is an image. Because the data are changed directly, Difference Expansion has a problem with limit values, called overflow and underflow, which affects the amount of secret message that can be carried and the quality of the resulting stego data. In this paper, we propose to change the embedding medium to a matrix generated from an LSB image, so that there is no restriction on the media values where the data are embedded. The experimental results show that the proposed method is able to improve the performance of the stego data.
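For concreteness, here is the classic Difference Expansion step on a single pixel pair, which also makes the overflow/underflow limitation explicit; this is the baseline DE scheme, not the paper's modified matrix-based embedding.

```python
# A worked sketch of classic Difference Expansion on one 8-bit pixel pair.
def de_embed(x, y, bit):
    """Embed one bit into the difference of a pixel pair."""
    l = (x + y) // 2          # integer average, kept (approximately) unchanged
    h = x - y                 # difference
    h2 = 2 * h + bit          # expand the difference and append the bit
    x2 = l + (h2 + 1) // 2
    y2 = l - h2 // 2
    if not (0 <= x2 <= 255 and 0 <= y2 <= 255):
        raise ValueError("overflow/underflow: this pair cannot carry a bit")
    return x2, y2

def de_extract(x2, y2):
    """Recover the bit and the original pair from the stego pair."""
    l = (x2 + y2) // 2
    h2 = x2 - y2
    bit = h2 % 2              # the embedded bit
    h = h2 // 2               # original difference (floor division)
    return bit, l + (h + 1) // 2, l - h // 2

# usage: embed bit 1 into the pair (100, 97), then recover it
stego = de_embed(100, 97, 1)
print(stego)                  # (102, 95)
print(de_extract(*stego))     # (1, 100, 97)
# de_embed(250, 10, 1) would raise: the expanded difference overflows 255.
```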

Tohari Ahmad, Diksy M. Firmansyah, Dwi S. Angreni

Ensemble Methods and their Applications

Frontmatter
A New Customer Churn Prediction Approach Based on Soft Set Ensemble Pruning

Accurate customer churn prediction is vital in any business organization due to the high cost involved in acquiring new customers. In telecommunication businesses, companies have used various types of single classifiers to classify customer churn, but the classification accuracy is still relatively low. The classification accuracy can, however, be improved by integrating decisions from multiple classifiers through an ensemble method. Despite their ability to produce the highest classification accuracy, ensemble methods suffer significantly from their large volume of base classifiers. Thus, in previous work, we proposed a novel soft set based method to prune the classifiers from a heterogeneous ensemble committee and select the best subsets of the component classifiers prior to the combination process. The results of that study demonstrated the ability of the proposed soft set ensemble pruning to remove a substantial number of classifiers while producing the highest prediction accuracy. In this paper, we extend our soft set ensemble pruning to the customer churn dataset. The results of this work show that the proposed soft set ensemble pruning method is able to overcome one of the drawbacks of the ensemble method: ensemble pruning based on soft set theory not only reduces the number of ensemble members, but also increases the prediction accuracy of customer churn.

Mohd Khalid Awang, Mokhairi Makhtar, Mohd Nordin Abd Rahman, Mustafa Mat Deris
An Association Rule Mining Approach in Predicting Flood Areas

This study focuses on the application of association rule mining to flood data in Terengganu. Floods are one of the natural disasters that happen every year during the monsoon season, causing damage to people, infrastructure and the environment. This paper aims to find the correlation between water level and flood area in developing a model to predict floods. The Malaysian Drainage and Irrigation Department supplied the dataset, which comprised flood area, water level and rainfall data. The association rule mining technique generates the best rules from the dataset using the Apriori algorithm, which is applied to find the frequent itemsets. Using the Apriori algorithm, the 10 best rules were generated with a 100% confidence level and 40% minimum support after candidate generation and pruning. The results of this research show the usability of data mining in this field; it can help give early warning to potential victims and spare some time for saving lives and property.
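An illustrative sketch of this kind of rule mining is shown below, on a handful of hypothetical discretized water-level/rainfall/flood transactions (my own toy stand-in for the Terengganu data); with such a tiny set, frequent itemsets are simply enumerated rather than grown level-wise as full Apriori would.

```python
# Toy illustration: frequent itemsets at 40% minimum support, then rules
# predicting the flood item at 100% confidence.
from itertools import combinations

records = [
    {"water_level=danger", "rainfall=heavy", "flood=yes"},
    {"water_level=danger", "rainfall=heavy", "flood=yes"},
    {"water_level=alert", "rainfall=moderate", "flood=no"},
    {"water_level=danger", "rainfall=moderate", "flood=yes"},
    {"water_level=normal", "rainfall=light", "flood=no"},
]

def support(itemset):
    return sum(itemset <= r for r in records) / len(records)

items = sorted(set().union(*records))
min_support, min_confidence = 0.4, 1.0

# frequent itemsets of size 1..3 (small data, so plain enumeration suffices)
frequent = [frozenset(c) for k in (1, 2, 3)
            for c in combinations(items, k) if support(frozenset(c)) >= min_support]

# rules of the form antecedent -> {flood=...}
for itemset in frequent:
    flood = {i for i in itemset if i.startswith("flood=")}
    antecedent = itemset - flood
    if flood and antecedent:
        conf = support(itemset) / support(antecedent)
        if conf >= min_confidence:
            print(set(antecedent), "->", set(flood),
                  "support=%.2f confidence=%.2f" % (support(itemset), conf))
```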

Mokhairi Makhtar, Nur Ashikin Harun, Azwa Abd Aziz, Zahrahtul Amani Zakaria, Fadzli Syed Abdullah, Julaily Aida Jusoh
The Reconstructed Heterogeneity to Enhance Ensemble Neural Network for Large Data

This paper presents an enhanced approach for an ensemble multi-classifier of Artificial Neural Networks (ANN). The motivation of this study is to enhance ANN capability and performance by using reconstructed heterogeneity when homogeneous classifiers are deployed. The cluster set is partitioned into two sets of clusters, clusters of the same class and clusters of multiple classes, each obtained using a different partitioning technique. Each partition is represented by an independent classifier of highly correlated patterns from different classes. The sets of clusters are compared and the final decision is made by majority voting. The approach is tested on large and small benchmark datasets. The results show that the proposed approach achieves nearly 99% accuracy, a better classification than the existing approach.

Mumtazimah Mohamad, Mokhairi Makhtar, Mohd Nordin Abd Rahman
A New Mobile Malware Classification for SMS Exploitation

Mobile malware is involved in many malicious activities, such as money stealing, where consumers are charged without their consent. This paper explores how mobile malware exploits system calls via SMS. As a solution, we propose a system call classification based on surveillance of SMS-exploiting system calls. The proposed system call classification is evaluated and tested using applications from the Google Play Store. This research focuses on the Android operating system. The experiment was conducted using the Drebin dataset, which contains 5560 malware applications. Dynamic analysis was used to extract the system calls from each application in a controlled lab environment. This research has developed a new mobile malware classification for Android smartphones using a covering algorithm. The classification has been evaluated on 500 applications, and 126 applications have been identified as containing malware.

Nurzi Juana Mohd Zaizi, Madihah Mohd Saudi, Adiebah Khailani
Data Mining Techniques for Classification of Childhood Obesity Among Year 6 School Children

Today, data mining is broadly applied in many fields, including healthcare and medicine. The obesity problem among children is one of the issues commonly explored using data mining techniques. In this paper, the classification of childhood obesity among year six school children from two districts in Terengganu, Malaysia is discussed. The data were collected from two main sources: the Standard Kecergasan Fizikal Kebangsaan untuk Murid Sekolah Malaysia/National Physical Fitness Standard for Malaysian School Children (SEGAK) assessment program and a distributed questionnaire. From the collected data, 4,245 complete data sets were analyzed. Data preprocessing and feature selection were applied to the data sets. The classification techniques, namely Bayesian Network, Decision Tree, Neural Networks and Support Vector Machine (SVM), were implemented and compared on the data sets. This paper presents the evaluation of several feature selection methods based on different classifiers.

Fadzli Syed Abdullah, Nor Saidah Abd Manan, Aryati Ahmad, Sharifah Wajihah Wafa, Mohd Razif Shahril, Nurzaime Zulaily, Rahmah Mohd Amin, Amran Ahmed
Multiple Criteria Preference Relation by Dominance Relations in Soft Set Theory

This paper presents the applicability of soft set theory for discovering preference relations in multi-valued information systems. The proposed approach is based on the notion of multi-soft sets. An inclusion of objects into the value set of the decision class in soft set theory is used to discover the relation between objects based on the preference relation. Results from the experiment show that the dominance relation based on soft set theory for preference relations is able to produce a finer object classification by eliminating inconsistencies during the classification process, as opposed to expert judgement classification.

Mohd Isa Awang, Ahmad Nazari Mohd Rose, Mohd Khalid Awang, Fadhilah Ahmad, Mustafa Mat Deris
Reduce Scanning Time Incremental Algorithm (RSTIA) of Association Rules

In the real world, where large amounts of data grow steadily, some old association rules can become stale, and new databases may give rise to some implicitly valid patterns or rules; hence, updating rules or patterns is also important. A simple method for solving the updating problem is to reapply the mining algorithm to the entire database, but this approach is time-consuming. This paper reuses information from old frequent itemsets to improve performance and addresses the problem of high-cost access to incremental databases, in which data change frequently, by reducing the number of scans of the original database. A log file is used to keep track of database changes: whenever a transaction is added, deleted or modified, a new record is added to the log file, which helps identify the new changes or updates in the incremental database. A new vertical mining technique is used to minimize the number of scans of the original database. The algorithm has been implemented and developed using C#.NET and applied to real data, giving good results compared with pure Apriori.

Iyad Aqra, Muhammad Azani Hasibuan, Tutut Herawan
A New Multi Objective Optimization to Improve Growth Domestic Produce of Economic Using Metaheuristic Approaches: Case Study of Iraq Economic

Currently, optimization problems are among the immediate concerns in economics: people's needs are diversifying fast, while resources remain limited. This phenomenon leads to the Multi-Objective Optimization (MOO) problem. Current techniques mostly suffer from redundancy, large path sizes and long processing times. Economic problems can be solved by utilizing mathematical principles, and one of the most common and effective approaches is metaheuristics as soft computing techniques, in the context of developing a significance based plan for the gross domestic product (GDP); the indicators in this model can be utilized to assess the state of a nation's economy. This paper discusses metaheuristics as soft computing techniques, namely Ant Colony Optimization (ACO) and Artificial Bee Colony (ABC), in order to propose an effective solution for reducing the complexity of MOO in the economy via the determination of an efficient strategy (plan). Experimental results prove that the use of metaheuristic soft computing approaches is effective and more promising than current techniques, and that ABC is superior to ACO in terms of search time and the exploration of an efficient global strategy (plan).

Ahmed Khalaf Zager Al Saedi, Rozaida Ghazali, Mustafa Mat Deris
The Algorithm Expansion for Starting Point Determination Using Clustering Algorithm Method with Fuzzy C-Means

In the Fuzzy C-Means (FCM) algorithm, the starting point is determined at random. Thus, an algorithm for starting point determination is developed with a Hierarchical Agglomerative Clustering approach as a substitute for the membership degree randomization process in the early iterations, with the expectation that the clustering process will require fewer iterations. The algorithm merges a number of clusters based on the complete linkage approach. It then calculates the difference in the objective function at each iteration after the FCM clustering process has been conducted. The iteration process is stopped once the difference in the objective function is smaller than the prescribed limit. In this research, the analysis of variance of the obtained clusters produces good homogeneity and heterogeneity values, and the number of iterations is reduced.

Edrian Hadinata, Rahmat W. Sembiring, Tien Fabrianti Kusumasari, Tutut Herawan
On Mining Association Rules of Real-Valued Items Using Fuzzy Soft Set

Association rule mining is one of the data mining methods that has been applied in many disciplines; it finds interesting relations between items in a large data set. Traditional association rule mining handles crisp sets of items, but it fails on real-valued items. This paper introduces an alternative method for mining association rules over real-valued items, based on a hybridization of fuzzy and soft sets called fuzzy soft association rules. The results show that the proposed concept is able to mine interesting association rules among real-valued items represented as a fuzzy soft set. Furthermore, it can deal with uncertain or vague data.

Dede Rohidin, Noor A. Samsudin, Tutut Herawan
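To make the "real-valued items" point concrete, here is a small Python sketch of how fuzzy support can be computed once attributes are fuzzified; the triangular membership functions, attribute names and thresholds are hypothetical and the paper's exact fuzzy soft set operations are not reproduced.

# Sketch of fuzzy support over fuzzified real-valued attributes.
def tri(x, a, b, c):
    """Triangular membership function on [a, c] peaking at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# Real-valued records; columns are attributes (hypothetical example data).
records = [
    {"temp": 31.0, "humidity": 80.0},
    {"temp": 24.0, "humidity": 60.0},
    {"temp": 33.0, "humidity": 85.0},
]

# Fuzzy-soft-set style representation: parameter -> membership per record.
fuzzified = {
    "temp_high": [tri(r["temp"], 25, 35, 45) for r in records],
    "humid_high": [tri(r["humidity"], 60, 90, 100) for r in records],
}

def fuzzy_support(itemset, fuzzified, n):
    """Average over records of the minimum membership across the itemset."""
    return sum(min(fuzzified[p][i] for p in itemset) for i in range(n)) / n

sup_ab = fuzzy_support(["temp_high", "humid_high"], fuzzified, len(records))
sup_a = fuzzy_support(["temp_high"], fuzzified, len(records))
print(f"support={sup_ab:.2f}, confidence={sup_ab / sup_a:.2f}")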
Application of Wavelet De-noising Filters in Mammogram Images Classification Using Fuzzy Soft Set

Recent advances in image processing have revealed that the level of noise in mammogram images strongly affects image quality and the classification performance of classifiers. Numerous data mining techniques have been developed to achieve high efficiency and effectiveness in computer aided diagnosis systems, yet fuzzy soft set theory has rarely been applied to medical images. This study therefore proposes a classifier based on fuzzy soft sets with embedded wavelet de-noising filters. The proposed methodology involves five steps: the MIAS dataset, wavelet de-noising filters with hard and soft thresholds, region of interest identification, feature extraction and classification. The feasibility of fuzzy soft sets for the classification of mammogram images is thus scrutinized. Experimental results show that the proposed classifier, FussCyier, achieves its best performance with Daub3 (Level 1) and a hard threshold: accuracy 75.64%, precision 46.11%, recall 84.67% and F-Micro 60%. The results provide an alternative technique for categorizing mammogram images.

Saima Anwar Lashari, Rosziati Ibrahim, Norhalina Senan, Iwan Tri Riyadi Yanto, Tutut Herawan
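The de-noising step described above can be sketched as follows, assuming NumPy and the PyWavelets package (Daubechies-3, one decomposition level, hard vs. soft thresholding). The universal-threshold estimate and the random test image are illustrative choices; the fuzzy-soft-set classifier (FussCyier) itself is not shown.

# Sketch of wavelet de-noising with hard/soft thresholding (Daub3, level 1).
import numpy as np
import pywt

def wavelet_denoise(img, wavelet="db3", level=1, mode="soft"):
    coeffs = pywt.wavedec2(img, wavelet, level=level)
    approx, details = coeffs[0], coeffs[1:]
    # Universal threshold estimated from the finest diagonal detail band.
    sigma = np.median(np.abs(details[-1][-1])) / 0.6745
    thr = sigma * np.sqrt(2 * np.log(img.size))
    details = [tuple(pywt.threshold(band, thr, mode=mode) for band in lvl)
               for lvl in details]
    return pywt.waverec2([approx] + details, wavelet)

noisy = np.random.rand(128, 128)          # stand-in for a mammogram ROI
hard = wavelet_denoise(noisy, mode="hard")
soft = wavelet_denoise(noisy, mode="soft")
print(hard.shape, soft.shape)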
Design Selection of In-UVAT Using MATLAB Fuzzy Logic Toolbox

The design of the tool holder is a crucial step in making sure that it is strong enough to withstand all forces during the turning process. Because the direct experimental approach is expensive, several designs of an innovative ultrasonic vibration assisted turning (In-UVAT) tool holder were proposed and analyzed with finite element simulation to predict tool holder displacement and effective stress. SS201 and AISI 1045 materials were considered, with sharp and ramp corner flexure hinges. The MATLAB Fuzzy Logic Toolbox was used to decide which design to select. The result shows that the AISI 1045 material with a ramp corner flexure hinge was the best choice for production, with a static effective stress of 3, a static displacement of 17.5, a dynamic effective stress of 3, a dynamic displacement of 17.5 and a durability value of 86.4.

Haris Rachmat, Tatang Mulyana, Sulaiman bin H. Hasan, Mohd. Rasidi bin Ibrahim
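As a rough Python analog of the fuzzy scoring that the abstract performs with the MATLAB Fuzzy Logic Toolbox, the sketch below aggregates simple membership values into a single score. The membership shapes, weights and the SS201 comparison values are hypothetical; this is a plain weighted membership aggregation, not the paper's rule base, so its scores will not match the reported durability value of 86.4.

# Rough weighted-membership analog of fuzzy design selection.
def low(x, hi):      # membership in "low" over [0, hi]
    return max(0.0, min(1.0, 1.0 - x / hi))

def high(x, hi):     # membership in "high" over [0, hi]
    return max(0.0, min(1.0, x / hi))

designs = {
    # AISI 1045, ramp hinge: simulation values quoted in the abstract.
    "AISI1045_ramp": {"stress_static": 3.0, "disp_static": 17.5,
                      "stress_dynamic": 3.0, "disp_dynamic": 17.5},
    # SS201, sharp hinge: hypothetical values for comparison only.
    "SS201_sharp":   {"stress_static": 5.0, "disp_static": 9.0,
                      "stress_dynamic": 6.0, "disp_dynamic": 8.0},
}

def score(d, max_stress=10.0, max_disp=20.0):
    # Prefer low effective stress and high displacement of the flexure hinge.
    terms = [low(d["stress_static"], max_stress), low(d["stress_dynamic"], max_stress),
             high(d["disp_static"], max_disp), high(d["disp_dynamic"], max_disp)]
    return 100.0 * sum(terms) / len(terms)

for name, d in designs.items():
    print(name, round(score(d), 1))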

Web Mining, Services and Security

Frontmatter
A Web Based Peer-to-Peer RFID Architecture

To realize the maximum benefits of RFID technology in large-scale distributed environments, an architectural framework that fulfils the specific requirements of those systems is paramount. Unfortunately, existing frameworks are designed at a high level to allow the development and deployment of a number of fundamentally different systems, so specialist systems based on such frameworks run into issues arising from the nature of those applications and their unique needs. In this paper, we present a web based P2P architecture specifically targeted at distributed RFID systems. A comparative analysis shows that the proposed architecture has a number of significant advantages over existing systems.

Harinda Fernando, Hairulnizam Mahdin
Performance-Aware Trust-Based Access Control for Protecting Sensitive Attributes

The prevailing trend of seamless digital collection has prompted privacy concerns not only in academia but also among the general public. In enforcing the automation of privacy policies and law, access control has been one of the most studied subjects. Despite recent advances in access control frameworks and models, there are still issues that impede the development of effective access control, among them the lack of granularity in user authorization assessment and the reliance on identity, role or purpose-based access control schemes. In this paper, we address the problem of protecting sensitive attributes from inappropriate access. We propose an access control mechanism that employs two trust metrics, namely experience and behavior, together with a scheme for quantifying those metrics in an enterprise computing environment. Finally, we show that these metrics are useful in improving the granularity of the assessment that permits or prohibits users from gaining access to sensitive attributes.

Mohd Rafiz Salji, Nur Izura Udzir, Mohd Izuan Hafez Ninggal, Nor Fazlida Mohd. Sani, Hamidah Ibrahim
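A minimal sketch of trust-gated access to sensitive attributes in Python follows. The experience and behavior formulas, the metric weights and the 0.7 threshold are hypothetical stand-ins, not the paper's quantification scheme.

# Minimal sketch of performance-aware, trust-based access to sensitive attributes.
from dataclasses import dataclass

@dataclass
class UserTrust:
    successful_accesses: int   # experience evidence
    total_accesses: int
    policy_violations: int     # behaviour evidence

def trust_score(u: UserTrust, w_exp=0.5, w_beh=0.5):
    experience = u.successful_accesses / max(u.total_accesses, 1)
    behaviour = 1.0 / (1.0 + u.policy_violations)   # decays with violations
    return w_exp * experience + w_beh * behaviour

def can_read(attribute, u: UserTrust, sensitive=("salary", "diagnosis"), threshold=0.7):
    if attribute not in sensitive:
        return True                                  # non-sensitive: no trust gate here
    return trust_score(u) >= threshold

alice = UserTrust(successful_accesses=95, total_accesses=100, policy_violations=0)
bob = UserTrust(successful_accesses=40, total_accesses=100, policy_violations=3)
print(can_read("salary", alice), can_read("salary", bob))  # True False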
E-Code Checker Application

Nowadays, many areas of computer science use ontologies, such as knowledge engineering, software reuse, digital libraries, heterogeneous information processing on the web, the semantic web and information retrieval. The halal industry is one of the fastest growing global businesses, and the halal food industry is crucial for Muslims all over the world as it serves to ensure that the food items they consume daily are syariah compliant. However, ontologies have still not been used widely in the halal industry, and the Muslim community still has difficulty verifying the halal status of products in the market, especially foods containing E numbers. In this paper, an ontology is applied to E numbers as a method for resolving the halal status of their various sources. Various chemical ontologies and databases were found to support the ontology construction; the E numbers in this chemical ontology are codes for chemicals that can be used as food additives. With this E numbers ontology, the Muslim community can identify and verify the halal status of products in the market effectively.

Shahreen Kasim, Ummi Aznazirah Azahar, Noor Azah Samsudin, Mohd Farhan Md Fudzee, Hairulnizam Mahdin, Azizul Azhar Ramli, Suriawati Suparjoh
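At its simplest, the checker described above resolves E numbers against a knowledge source. The toy Python sketch below uses a plain dictionary and placeholder statuses; in the paper the answer comes from the chemical ontology and authoritative certification sources, not from a hard-coded table.

# Toy sketch of an E-number lookup; statuses are placeholders only.
E_CODE_DB = {
    "E100": {"name": "Curcumin", "status": "check source"},
    "E300": {"name": "Ascorbic acid", "status": "check source"},
    "E441": {"name": "Gelatine", "status": "check source"},
}

def check_codes(ingredient_codes):
    results = {}
    for code in ingredient_codes:
        entry = E_CODE_DB.get(code.upper())
        results[code] = entry["status"] if entry else "unknown code"
    return results

print(check_codes(["e300", "E441", "E999"]))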
Factors Influencing the Use of Social Media in Adult Learning Experience

Social media is very popular nowadays as a platform for information exchange among users. It has been used to support e-learning, with lecturers uploading lecture notes and tutorial videos to a group page joined by their students. For younger generations this is something they expect from their lecturers; however, it is different for adult learners, who are not all used to today's technologies. This paper studies the factors that influence the use of social media in adult learning. By understanding these factors, practitioners can use them as a basis for building more specific social media tools intended for adult learners. To identify the factors, a survey was distributed to a group of postgraduate students and the data were analyzed using IBM SPSS 2.0. The research suggests that technology acceptance is the most dominant factor influencing the use of social media in adult learning, and that all the specified factors have significant relationships with such use.

Masitah Ahmad, Norhayati Hussin, Syafiq Zulkarnain, Hairulnizam Mahdin, Mohd Farhan Md. Fudzee
A Framework to Analyze Quality of Service (QoS) for Text-To-Speech (TTS) Services

Quality of service (QoS) evaluation is vital for text-to-speech (TTS) web service applications. Most current solutions evaluate either the functional or the nonfunctional attributes of a TTS. In this paper, we propose a QoS framework for evaluating and analyzing perceived QoS that combines general and specific mechanisms for measuring both functional and nonfunctional requirements of speech quality. The general mechanism measures the response time of TTS services, while the specific mechanism measures intelligibility and naturalness through subjective quality measurements mapped onto a mean opinion score (MOS). The results show the workability of the framework, tested by predetermined users on three services: service1 (Fromtexttospeech) scored 47.84%, while service2 (NaturalReader) and service3 (Yakitome) scored 31.62% and 21.53% respectively. The TTS services evaluation can be used to enhance the user experience.

Mohd Farhan Md Fudzee, Mohamud Hassan, Hairulnizam Mahdin, Shahreen Kasim, Jemal Abawajy
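The sketch below illustrates the two measurement sides that the framework combines: response time of a TTS request (general) and MOS-style subjective ratings (specific). The URL is a placeholder, the ratings are made-up examples, and the MOS-to-percent mapping is a simple linear rescale rather than the paper's exact aggregation.

# Sketch of response-time measurement plus MOS aggregation for a TTS service.
import time
import urllib.request

def response_time(url, payload=b"hello world"):
    start = time.perf_counter()
    try:
        urllib.request.urlopen(url, data=payload, timeout=10).read()
    except Exception:
        return None                      # service unreachable or rejected the request
    return time.perf_counter() - start

def mos_to_percent(ratings):
    """Map 1-5 mean opinion scores onto a 0-100% scale."""
    mos = sum(ratings) / len(ratings)
    return (mos - 1.0) / 4.0 * 100.0

rt = response_time("https://example.com/tts")        # placeholder endpoint
intelligibility = mos_to_percent([4, 5, 4, 3, 4])     # example subjective ratings
naturalness = mos_to_percent([3, 4, 3, 4, 3])
print(rt, round(intelligibility, 1), round(naturalness, 1))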
Indoor Navigation Using A* Algorithm

This paper introduces an indoor navigation application that helps junior students in the Faculty of Computer Science and Information Technology (FSKTM) find their classroom locations. The project implements the A* (pronounced "A Star") path finding algorithm to calculate the shortest path for users. Users can choose to view the floor plan of the building or start navigation by selecting a starting point from a list and setting a destination. The application then calculates the shortest path using the A* algorithm and shows the route on the floor plan once the calculation is done. Users should therefore find the application easy to use and time saving.

Shahreen Kasim, Loh Yin Xia, Norfaradilla Wahid, Mohd Farhan Md Fudzee, Hairulnizam Mahdin, Azizul Azhar Ramli, Suriawati Suparjoh, Mohamad Aizi Salamat
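For readers unfamiliar with A*, here is a minimal grid-based sketch with a Manhattan heuristic. The toy grid stands in for a floor plan (0 = walkable, 1 = wall) and is not the FSKTM floor plan used in the paper.

# Minimal A* path finding on a grid (Manhattan heuristic).
import heapq

def astar(grid, start, goal):
    rows, cols = len(grid), len(grid[0])
    open_heap = [(0, start)]
    g = {start: 0}
    came_from = {}
    while open_heap:
        _, cur = heapq.heappop(open_heap)
        if cur == goal:
            path = [cur]
            while cur in came_from:
                cur = came_from[cur]
                path.append(cur)
            return path[::-1]
        r, c = cur
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                ng = g[cur] + 1
                if ng < g.get((nr, nc), float("inf")):
                    g[(nr, nc)] = ng
                    came_from[(nr, nc)] = cur
                    h = abs(nr - goal[0]) + abs(nc - goal[1])   # Manhattan heuristic
                    heapq.heappush(open_heap, (ng + h, (nr, nc)))
    return None   # no route between start and destination

floor = [[0, 0, 0, 1],
         [1, 1, 0, 1],
         [0, 0, 0, 0],
         [0, 1, 1, 0]]
print(astar(floor, (0, 0), (3, 3)))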
Mining Significant Association Rules from on Information and System Quality of Indonesian E-Government Dataset

Electronic government (e-government) refers to applying information and communication technologies (ICT) to improve the efficiency, effectiveness, transparency and responsibility of public government. E-government systems are usually adopted in complex settings influenced not only by infrastructure factors but also by others such as end-user satisfaction, and information and system quality are often seen as key antecedents of user satisfaction. This paper presents an application of association rule mining to capture interesting rules on the information and system quality of an Indonesian e-government dataset. It is based on the Least Frequent Items method with an embedded FP-Growth algorithm. Rules are formed by relating one or more items to a single item (cardinality: many-to-one), and a rule is categorized as interesting if it has the highest critical relative support, a positive correlation and high confidence. The results show that the total number of significant rules is 256, which is 14% of the 1811 rules captured on the information quality data, while for system quality the total number of significant rules is 1790, which is 21% of the 18414 rules captured.

Deden Witarsyah Jacob, Mohd Farhan Md Fudzee, Mohamad Aizi Salamat, Rohmat Saedudin, Zailani Abdullah, Tutut Herawan
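To illustrate the many-to-one rule form the abstract mentions, the Python sketch below generates rules with a single consequent and filters them by standard support and confidence over a tiny made-up dataset. The paper's FP-Growth engine and its critical relative support measure are not reproduced here.

# Sketch of many-to-one rule formation with support/confidence filtering.
from itertools import combinations

transactions = [{"info_quality", "sys_quality", "satisfied"},
                {"info_quality", "satisfied"},
                {"sys_quality", "satisfied"},
                {"info_quality", "sys_quality"}]

def support(itemset):
    return sum(itemset <= t for t in transactions) / len(transactions)

items = sorted(set().union(*transactions))
rules = []
for size in range(2, len(items) + 1):
    for itemset in combinations(items, size):
        s_full = support(set(itemset))
        if s_full < 0.25:                      # minimum support
            continue
        for consequent in itemset:             # many-to-one: single consequent
            antecedent = set(itemset) - {consequent}
            conf = s_full / support(antecedent)
            if conf >= 0.6:                    # minimum confidence
                rules.append((tuple(sorted(antecedent)), consequent, s_full, conf))

for a, c, s, conf in rules:
    print(f"{a} -> {c}  support={s:.2f} confidence={conf:.2f}")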
A Feature Selection Algorithm for Anomaly Detection in Grid Environment Using k-fold Cross Validation Technique

An Intrusion Detection System (IDS) seeks to identify unauthorized access to computer systems' resources and data. The growth of data set sizes, in the number of records as well as of attributes, has triggered the development of a number of big data platforms and parallel data analysis algorithms. This paper proposes a technique for reducing the number of input features in a dataset by using Sequential Forward Selection (SFS) with a k-fold cross validation model. Before the feature reduction stage, a pre-processing analysis detects unusual observations that do not seem to belong to the pattern of variability produced by the other observations; it consists of outlier detection and transformation. Outliers are best detected visually whenever possible. This paper explains the steps for detecting outliers and describes the transformation method that brings them closer to normality; the transformation obtained by maximizing the lambda function usually improves the approximation to normality.

Dahliyusmanto, Tutut Herawan, Syefrida Yulina, Abdul Hanan Abdullah
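A compact sketch of sequential forward selection scored with k-fold cross validation, assuming scikit-learn and a synthetic dataset in place of the intrusion-detection data; the outlier detection and transformation pre-processing steps are omitted.

# Sequential forward selection with 5-fold cross validation (illustrative).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, n_features=10, n_informative=4,
                           random_state=0)
model = LogisticRegression(max_iter=1000)

selected, remaining = [], list(range(X.shape[1]))
best_score = 0.0
while remaining:
    # Try adding each remaining feature and keep the one with the best CV score.
    scores = {f: cross_val_score(model, X[:, selected + [f]], y, cv=5).mean()
              for f in remaining}
    f, score = max(scores.items(), key=lambda kv: kv[1])
    if score <= best_score:        # stop when no candidate improves the score
        break
    selected.append(f)
    remaining.remove(f)
    best_score = score

print("selected features:", selected, "CV accuracy:", round(best_score, 3))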
Soft Set Approach for Clustering Graduated Dataset

Every university aims to ensure that its students graduate on time. This objective can be supported by an early warning system (EWS): through an EWS, students who are likely to graduate late can be recognized in advance, and appropriate interventions can be given so that they graduate on time. The predictive model at the core of an EWS is built from graduated student data, and a problem that often arises in such a model is its degree of accuracy. In order to increase the accuracy of the prediction, clustering for attribute selection needs to be conducted first. One approach that can be used to cluster attributes is the Maximum Degree of Domination in Soft Set Theory (MDDS) algorithm. This article implements the MDDS algorithm to cluster the attributes of student datasets. The result of this research is the set of dominant attributes that can be used as a foundation for developing a predictive model of student graduation time.

Rd Rohmat Saedudin, Shahreen Binti Kasim, Hairulnizam Mahdin, Muhammad Azani Hasibuan
An Application of Rough Set Theory for Clustering Performance Expectancy of Indonesian e-Government Dataset

Performance expectancy has been studied as an important factor influencing e-government, yet grouping e-government users by performance expectancy remains challenging. Computational models can be explored as efficient clustering techniques for grouping e-government users. This paper presents an application of rough set theory for clustering the performance expectancy of e-government users. The proposed technique is based on selecting the best clustering attribute using the maximum dependency of attributes in the e-government data. The datasets are taken from a survey aimed at understanding adoption issues in e-government service usage in the city of Bandung, Indonesia. At this stage of the research, we show how this approach to data clustering can be used to select the best clustering attribute. The results present useful information for decision makers in forming policy concerning their citizens and may potentially provide recommendations on how to design and develop e-government systems to improve public services.

Deden Witarsyah Jacob, Mohd Farhan Md. Fudzee, Mohamad Aizi Salamat, Rd Rohmat Saedudin, Iwan Tri Riyadi Yanto, Tutut Herawan
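The selection criterion above rests on the rough-set dependency degree between attributes. The Python sketch below computes that degree from single-attribute equivalence classes and picks the attribute on which the others depend most; it follows the generic maximum-dependency idea, not necessarily the paper's exact procedure, and the survey records are made-up examples.

# Sketch: pick a clustering attribute by maximum rough-set dependency degree.
from collections import defaultdict

records = [  # hypothetical categorical survey answers
    {"expectancy": "high", "age": "young", "usage": "daily"},
    {"expectancy": "high", "age": "young", "usage": "daily"},
    {"expectancy": "low",  "age": "old",   "usage": "rare"},
    {"expectancy": "low",  "age": "old",   "usage": "weekly"},
]

def partition(attr):
    """Equivalence classes of the records induced by a single attribute."""
    classes = defaultdict(set)
    for i, r in enumerate(records):
        classes[r[attr]].add(i)
    return list(classes.values())

def dependency(b_attr, d_attr):
    """Degree to which d_attr depends on b_attr: |POS_B(D)| / |U|."""
    pos = sum(len(c) for c in partition(b_attr)
              if any(c <= d for d in partition(d_attr)))
    return pos / len(records)

attrs = list(records[0].keys())
scores = {a: max(dependency(a, b) for b in attrs if b != a) for a in attrs}
print(max(scores, key=scores.get), scores)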
Backmatter
Metadata
Title
Recent Advances on Soft Computing and Data Mining
Editors
Tutut Herawan
Rozaida Ghazali
Nazri Mohd Nawi
Mustafa Mat Deris
Copyright Year
2017
Electronic ISBN
978-3-319-51281-5
Print ISBN
978-3-319-51279-2
DOI
https://doi.org/10.1007/978-3-319-51281-5