2014 | Book

Recent Advances on Soft Computing and Data Mining

Proceedings of The First International Conference on Soft Computing and Data Mining (SCDM-2014), Universiti Tun Hussein Onn Malaysia, Johor, Malaysia, June 16th-18th, 2014

About this book

This book constitutes the refereed proceedings of the First International Conference on Soft Computing and Data Mining, SCDM 2014, held at Universiti Tun Hussein Onn Malaysia on June 16th-18th, 2014. The 65 revised full papers presented in this book were carefully reviewed and selected from 145 submissions and are organized into two main topical sections: Data Mining and Soft Computing. The goal of this book is to provide both theoretical concepts and, especially, practical techniques in the fields of soft computing and data mining, ready to be applied in real-world applications. The exchange of views on future research directions in this field and the resulting dissemination of the latest research findings make this work of immense value to all those with an interest in the topics covered.

Table of contents

Frontmatter
A Fuzzy Time Series Model in Road Accidents Forecast

Many researchers have explored fuzzy time series forecasting models with the aim of improving accuracy. Recently, Liu et al. proposed a new method that improves on the method of Hwang et al. The method introduces several properties to improve forecast accuracy, such as levels of window base, length of interval, degrees of membership values, and existence of outliers. Despite these improvements, far too little attention has been paid to real data applications. Given these advantages, this paper investigates the feasibility and performance of the Liu et al. model on Malaysian road accident data. Twenty-eight years of road accident data are employed as the experimental dataset. The computational results show that the model's mean absolute forecasting error is less than 10 percent. This suggests that the Liu et al. model fits the Malaysian road accident data well in practice.

Lazim Abdullah, Chye Ling Gan
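
The abstract above reports a mean absolute forecasting error below 10 percent. As a minimal sketch, assuming that measure is the usual mean absolute percentage error, it could be computed as follows. The function name and the sample values are illustrative only and are not taken from the paper.

```python
def mean_absolute_percentage_error(actual, forecast):
    """Mean of |actual - forecast| / |actual|, expressed in percent."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("series must be non-empty and of equal length")
    total = sum(abs((a - f) / a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)


# Hypothetical yearly accident counts, NOT data from the paper.
actual = [200.0, 250.0, 400.0]
forecast = [190.0, 260.0, 380.0]
error = mean_absolute_percentage_error(actual, forecast)
print(f"error = {error:.2f} percent, below 10 percent: {error < 10.0}")
```
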
A Jordan Pi-Sigma Neural Network for Temperature Forecasting in Batu Pahat Region

This paper proposes a new network model called the Jordan Pi-Sigma Neural Network (JPSN) to overcome the drawbacks of the ordinary Multilayer Perceptron (MLP) whilst retaining the advantages of the Pi-Sigma Neural Network (PSNN). JPSN, a network model with a single layer of tuneable weights and a recurrent term added to the network, is trained using the standard backpropagation algorithm. The network was used to learn a set of historical temperature data for the Batu Pahat region covering five years (2005-2009), obtained from the Malaysian Meteorological Department (MMD). JPSN’s ability to predict future temperature trends was tested and compared to that of MLP and the standard PSNN. Simulation results showed that JPSN’s forecasts were superior to those of the MLP and PSNN models, with lower prediction error, using a learning rate of 0.1, momentum of 0.2 and a 4-2-1 network architecture. This reveals great potential for JPSN as an alternative to both PSNN and MLP for one-step-ahead temperature prediction.

Noor Aida Husaini, Rozaida Ghazali, Lokman Hakim Ismail, Tutut Herawan
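
The abstract above describes a network with a single layer of tuneable weights plus a recurrent term. The forward pass of a generic pi-sigma unit with Jordan-style output feedback can be sketched as below. Feeding the previous output back as an extra input is an assumption about how the recurrence is wired, and the function name and weights are hypothetical. Training by backpropagation is not shown.

```python
import math


def jpsn_forward(inputs, weights, biases, prev_output):
    """One forward step of a pi-sigma unit with output feedback (sketch)."""
    # Jordan-style recurrence (assumed): append the previous output to the inputs.
    x = list(inputs) + [prev_output]
    # Summing layer: the only trainable weights in a pi-sigma network.
    sums = [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]
    # Product unit with fixed unit weights, followed by a sigmoid.
    return 1.0 / (1.0 + math.exp(-math.prod(sums)))


# 4 inputs, 2 summing units, 1 output; weights here are arbitrary placeholders.
weights = [[0.1, -0.2, 0.3, 0.05, 0.5], [0.2, 0.1, -0.1, 0.4, -0.3]]
biases = [0.1, -0.1]
out = 0.0
for window in ([0.2, 0.3, 0.4, 0.5], [0.3, 0.4, 0.5, 0.6]):
    out = jpsn_forward(window, weights, biases, out)
    print(out)
```
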
A Legendre Approximation for Solving a Fuzzy Fractional Drug Transduction Model into the Bloodstream

While an increasing number of applications of fractional order integrals and differential equations have been reported in the physics, signal processing, engineering and bioengineering literature, little attention has been paid to this class of models in the pharmacokinetic-pharmacodynamic (PKPD) literature. In this research, we focus on applying the Legendre operational matrix to solve a fuzzy fractional differential equation arising in a model of drug delivery into the bloodstream. The results illustrate the effectiveness of the method, which agrees closely with the exact solution.

Ali Ahmadian, Norazak Senu, Farhad Larki, Soheil Salahshour, Mohamed Suleiman, Md. Shabiul Islam
A Multi-reference Ontology for Profiling Scholars’ Background Knowledge

In most ontology-based recommender systems for scholars, profiling approaches employ a reference ontology as a backbone hierarchy to learn the topics of a scholar’s interests. This often rests on the assumption that the reference ontology contains the possible topics of scholars’ preferences. However, such single reference ontologies have too few ontological concepts, and concepts of too poor quality, to capture the full range of scholars’ interests in terms of academic knowledge. In this paper, we extract, select, and merge heterogeneous subjects from different taxonomies on the Web and enrich them with Wikipedia to construct an OWL reference ontology for the Computer Science domain. Compared to similar reference ontologies, our ontology purely supports the structure of scholars’ knowledge, contains richer topics of the domain, and is the best fit for profiling scholars’ knowledge.

Bahram Amini, Roliana Ibrahim, Mohd Shahizan Othman, Mohd Nazir Ahmad
A New Binary Particle Swarm Optimization for Feature Subset Selection with Support Vector Machine

Social Engineering (SE) has emerged as one of the most familiar problems concerning organizational security and computer users. At present, the performance deterioration of phishing and spam detection systems is attributed to high feature dimensionality as well as the computational cost of feature selection. This reduces the classification accuracy or detection rate and increases the False Positive Rate (FPR). This research introduces a novel feature selection method called the New Binary Particle Swarm Optimization (NBPSO) to choose a set of optimal features in spam and phishing emails. The proposed feature selection method was tested in classification experiments using the Support Vector Machine (SVM) to classify emails with the selected features as input. The results obtained by experimenting on two phishing and spam email datasets showed reasonable performance for the phishing detection system.

Amir Rajabi Behjat, Aida Mustapha, Hossein Nezamabadi-Pour, Md. Nasir Sulaiman, Norwati Mustapha
A New Hybrid Algorithm for Document Clustering Based on Cuckoo Search and K-means

In this paper we propose a new approach for document clustering based on Cuckoo Search and K-means. Due to the random initialization of centroids, cuckoo search clustering can reach a better solution, but the number of iterations may increase drastically. To overcome this drawback, we propose replacing this randomness with k-means at the initial step. The effectiveness of the proposed approach was tested on benchmarks extracted from the Reuters 21578 Text Categorization Dataset and the UCI Machine Learning Repository. The results show the efficiency of the new approach in terms of reducing the number of iterations and fitness values. Furthermore, it can improve the quality of clustering as measured by the well-known F-measure.

Ishak Boushaki Saida, Nadjet Kamel, Bendjeghaba Omar
A New Positive and Negative Linguistic Variable of Interval Triangular Type-2 Fuzzy Sets for MCDM

Fuzzy linguistic variables in the decision making field have received significant attention from researchers in many areas. However, existing research considers only one side rather than both. Therefore, the aim of this paper is to introduce a new linguistic variable that considers both the positive and negative sides for symmetrical interval triangular type-2 fuzzy sets (T2 FS). This new linguistic variable is developed in line with the interval type-2 fuzzy TOPSIS (IT2 FTOPSIS) method. In addition, a ranking value for the aggregation process is modified to capture both positive and negative aspects for the triangular case. The new method is then tested using two illustrative examples. The results show that the new method is highly beneficial in terms of applicability and offers a new dimension to problem solving techniques for the type-2 fuzzy group decision-making environment.

Nurnadiah Zamri, Lazim Abdullah
A New Qualitative Evaluation for an Integrated Interval Type-2 Fuzzy TOPSIS and MCGP

Sometimes information needs to be evaluated objectively. It is hard to determine the value of some parameters because of their uncertain or ambiguous nature. However, most studies have neglected qualitative evaluation. This paper proposes a new qualitative evaluation that considers three different aspects: linguistic to crisp, the unconvinced decision, and in between. This new qualitative evaluation is developed to produce an optimal preference ranking for an integrated fuzzy TOPSIS and multi-choice goal programming (MCGP) method in terms of interval type-2 fuzzy sets (IT2 FSs). An example is used to illustrate the proposed method. The results show that the qualitative evaluation in the new method is suitable for the integrated interval type-2 fuzzy TOPSIS and MCGP. Results are consistent with the numerical example. This new method offers a new dimension to the type-2 fuzzy group decision-making environment.

Nurnadiah Zamri, Lazim Abdullah
A Performance Comparison of Genetic Algorithm’s Mutation Operators in n-Cities Open Loop Travelling Salesman Problem

The Travelling Salesman Problem (TSP) is one of the most commonly studied optimization problems. In the Open Loop Travelling Salesman Problem (OTSP), the salesman travels to all of the given m cities but does not return to the city he started from, and each city is visited exactly once. However, a new variant of the OTSP arises when the salesman does not visit all of the given m cities but only n of them. This problem is called the n-Cities Open Loop Travelling Salesman Problem (nOTSP), which seems closer to real-life transportation problems. In this paper, a Genetic Algorithm (GA) with different mutation operators is applied to the nOTSP in order to investigate which mutation operators give the optimal solution in minimizing the distance and computational time for the n visited cities. The mutation operators are inversion, displacement, pairwise swap and the combination of these three operators. The results of these comparisons show that the GA with the inversion mutation operator achieves a better solution in minimizing the total distance of the tour. In addition, the GA with the combination of three mutation operators has great potential for reducing the computation time.

Hock Hung Chieng, Noorhaniza Wahid
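
The three mutation operators named in the abstract above (inversion, displacement, pairwise swap) have common textbook forms for permutation-encoded tours, sketched here. This is not the authors' implementation. The function names, the open-tour length helper and the distance matrix are illustrative assumptions.

```python
import random


def inversion(tour, rng):
    """Reverse a randomly chosen segment of the tour."""
    i, j = sorted(rng.sample(range(len(tour)), 2))
    return tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]


def displacement(tour, rng):
    """Cut out a random segment and reinsert it at a random position."""
    i, j = sorted(rng.sample(range(len(tour)), 2))
    segment = tour[i:j + 1]
    rest = tour[:i] + tour[j + 1:]
    k = rng.randint(0, len(rest))  # both endpoints are inclusive
    return rest[:k] + segment + rest[k:]


def pairwise_swap(tour, rng):
    """Exchange the cities at two random positions."""
    i, j = rng.sample(range(len(tour)), 2)
    out = list(tour)
    out[i], out[j] = out[j], out[i]
    return out


def open_tour_length(tour, dist):
    """Length of an open-loop tour: no edge back to the starting city."""
    return sum(dist[a][b] for a, b in zip(tour, tour[1:]))


rng = random.Random(42)
tour = [0, 1, 2, 3, 4]  # n visited cities, assumed already chosen from the m given
for mutate in (inversion, displacement, pairwise_swap):
    print(mutate.__name__, mutate(tour, rng))
```

Each operator returns a new list that is still a permutation of the same cities, so the open-loop constraint (every selected city visited exactly once) is preserved by construction.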
A Practical Weather Forecasting for Air Traffic Control System Using Fuzzy Hierarchical Technique

Due to rapid changes in the global climate, weather forecasting has become one of the significant research fields. Modern airports maintain high-security flight operations through precise knowledge of weather forecasts. The objectives of this research focus on two major parts: the weather forecasting model of an airport system and the fuzzy hierarchical technique used. In general, this research emphasizes the building blocks of a weather forecasting application that could support the Terminal Aerodrome Forecast by utilizing the Mamdani model. The developed application considers variables, groups of weather elements, combinations of weather elements in a group, web data sources and structured knowledge to provide a profound forecast.

Azizul Azhar Ramli, Mohammad Rabiul Islam, Mohd Farhan Md. Fudzee, Mohamad Aizi Salamat, Shahreen Kasim
Adapted Bio-inspired Artificial Bee Colony and Differential Evolution for Feature Selection in Biomarker Discovery Analysis

The ability of proteomics to detect particular diseases in the early stages intrigues researchers, especially analytical researchers, computer scientists and mathematicians. Further, the high throughput of proteomic patterns derived from mass spectrometry analysis has opened a new paradigm for biomarker analysis through accessible body fluids such as serum, saliva, and urine. Recently, sophisticated computational techniques that mimic the natural survival and behaviour of organisms have been widely adopted in problem-solving algorithms. With the emphasis on feature selection algorithms, the most challenging phase in biomarker analysis is selecting the most parsimonious features from voluminous mass spectrometry data. This study shows that a hybrid of artificial bee colony and differential evolution as a feature selection technique exhibits comparable results. These results were compared with other types of bio-inspired algorithms such as ant colony and particle swarm optimisation. The proposed method produced (1) 100 percent and 98.44 percent accuracy on the ovarian cancer dataset and (2) 100 percent and 94.44 percent on the TOX dataset, for training and testing respectively.

Syarifah Adilah Mohamed Yusoff, Rosni Abdullah, Ibrahim Venkat
An Artificial Intelligence Technique for Prevent Black Hole Attacks in MANET

Mobile ad hoc networks (MANETs) can be operated in difficult environments or emergency situations. In this type of network, the nodes cooperate in forwarding packets. Routing protocols use multi-hop routing to discover a path from the source to the destination node when no direct path between them exists. One of the standard MANET protocols is the Ad hoc On-demand Distance Vector protocol (AODV). Because of its routing mechanism, AODV is vulnerable to many types of attack, such as the black hole attack. A black hole node advertises the highest destination sequence number and the lowest hop count to attract the source node, and then drops the packets. Most previous work used trusted neighbor nodes to prevent black hole attacks and make AODV more secure. However, these solutions suffer from high routing overhead and lack a specific mechanism for providing a shortest secure path. In this paper, we propose an intelligent technique for AODV to prevent black hole attacks, called Shortest Secure Path for AODV (SSP-AODV). This technique integrates the A* and Floyd-Warshall algorithms. The simulation is conducted in Network Simulator 2. The results indicate that the proposed technique outperforms standard AODV on two measures: packet loss delivery and average end-to-end delay. The proposed technique can significantly reduce the effect of black hole attacks.

Khalil I. Ghathwan, Abdul Razak B. Yaakub
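
The abstract above names the Floyd-Warshall algorithm as one ingredient of SSP-AODV but does not say how it is combined with A* or with AODV's route discovery. The sketch below shows only the standard all-pairs shortest path computation on a small hypothetical topology with hop-count weights. It is not the SSP-AODV protocol itself.

```python
INF = float("inf")


def floyd_warshall(weights):
    """Return the matrix of shortest path costs between every pair of nodes."""
    n = len(weights)
    dist = [row[:] for row in weights]  # copy so the input is left unchanged
    for k in range(n):          # allow node k as an intermediate hop
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist


# Hypothetical 4-node network; INF means no direct link between the two nodes.
links = [
    [0, 1, 5, INF],
    [1, 0, 1, INF],
    [5, 1, 0, 1],
    [INF, INF, 1, 0],
]
for row in floyd_warshall(links):
    print(row)
```
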
ANFIS Based Model for Bispectral Index Prediction

Prediction of the depth of hypnosis is important in administering optimal anaesthesia during surgical procedures. However, the effect of anaesthetic drugs on the human body is a nonlinear, time-variant system with large inter-patient variability. Such behaviour often limits the performance of conventional models. This paper explores the possibility of using the Adaptive Neuro-Fuzzy Inference System (ANFIS) to create a model for predicting the Bispectral Index (BIS). BIS is a well-studied indicator of hypnotic level. Propofol infusion rate and past values of BIS were used as the input variables for modelling. The results show that the ANFIS model is capable of predicting BIS very well.

Jing Jing Chang, S. Syafiie, Raja Kamil Raja Ahmad, Thiam Aun Lim
Classify a Protein Domain Using SVM Sigmoid Kernel

Protein domains are discrete portions of a protein sequence that can fold independently and have their own function. Protein domain classification is important for multiple reasons, including determining protein function in order to manufacture new proteins with new functions. However, several issues need to be addressed in protein domain classification, including increasing the domain signal and classifying domains accurately into their categories. To overcome these issues, this paper proposes a new approach that classifies protein domains from protein subsequences and protein structure information using an SVM with a sigmoid kernel. The proposed method consists of three phases: data generation, creation of sequence information, and classification. The data generation phase selects potential proteins and generates clear domain information. The sequence information phase uses several calculations to generate protein structure information in order to optimize the domain signal. The classification phase involves the SVM sigmoid kernel and performance evaluation. The performance of the approach is evaluated in terms of sensitivity and specificity on single-domain and multiple-domain proteins using the SCOP 1.75 dataset. The SVM sigmoid kernel showed higher accuracy than a single neural network and a double neural network for single- and multiple-domain prediction. The proposed approach was developed to solve the problem of proteins being coincidentally grouped into both categories, single and multiple domain. The method showed an improvement in classification in terms of sensitivity, specificity and accuracy.

Ummi Kalsum Hassan, Nazri Mohd. Nawi, Shahreen Kasim, Azizul Azhar Ramli, Mohd Farhan Md Fudzee, Mohamad Aizi Salamat
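
The abstract above evaluates the classifier by sensitivity, specificity and accuracy. These follow the standard confusion-matrix definitions, sketched here. The function name and the counts in the example are illustrative and are not results from the paper.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity and accuracy from binary confusion counts."""
    return {
        "sensitivity": tp / (tp + fn),  # share of true positives that were found
        "specificity": tn / (tn + fp),  # share of true negatives that were found
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical counts for a single-domain versus multiple-domain classifier.
print(confusion_metrics(tp=8, fp=1, tn=9, fn=2))
```
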
Color Histogram and First Order Statistics for Content Based Image Retrieval

Content Based Image Retrieval (CBIR) is one of the fastest growing research areas in the domain of multimedia. Due to the increase in digital content these days, users experience difficulties in searching for specific images in their databases. This paper proposes a new effective and efficient image retrieval technique based on a color histogram using Hue-Saturation-Value (HSV) and First Order Statistics (FOS), namely HSV-fos. FOS is used for the extraction of texture features, while the color histogram deals with the color information of the image. The performance of the proposed technique is compared with the Variance Segment and Histogram-based techniques, and the results show that the HSV-fos technique achieves 15% higher accuracy than both. The proposed technique can help forensic departments identify suspects.

Muhammad Imran, Rathiah Hashim, Noor Eliza Abd Khalid
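
The abstract above combines an HSV color histogram with first-order statistics as image features. A minimal sketch of both feature types follows, using only the Python standard library. The bin count, the restriction to the hue channel, and the choice of mean and variance as the statistics are assumptions, since the abstract does not specify them.

```python
import colorsys


def hue_histogram(pixels, bins=8):
    """Normalized histogram of hue values for a list of (r, g, b) pixels in 0..255."""
    hist = [0] * bins
    for r, g, b in pixels:
        h, _s, _v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        hist[min(int(h * bins), bins - 1)] += 1  # hue lies in [0, 1]; clamp the top edge
    return [count / len(pixels) for count in hist]


def first_order_stats(values):
    """Mean and population variance of intensity values (two common FOS features)."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    return {"mean": mean, "variance": variance}


# Tiny hypothetical image: two red pixels, one green, one blue.
pixels = [(255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 0, 0)]
grey = [(r + g + b) / 3.0 for r, g, b in pixels]
feature_vector = hue_histogram(pixels) + list(first_order_stats(grey).values())
print(feature_vector)
```
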
Comparing Performances of Cuckoo Search Based Neural Networks

Nature inspired meta-heuristic algorithms provide derivative-free solutions to solve complex problems. Cuckoo Search (CS) algorithm is one of the latest additions to the group of nature inspired optimization heuristics. In this paper, Cuckoo Search (CS) is implemented in conjunction with Back propagation Neural Network (BPNN), Recurrent Neural Network (RNN), and Levenberg Marquardt back propagation (LMBP) algorithms to achieve faster convergence rate and to avoid local minima problem. The performances of the proposed Cuckoo Search Back propagation (CSBP), Cuckoo Search Levenberg Marquardt (CSLM) and Cuckoo Search Recurrent Neural Network (CSRNN) algorithms are compared by means of simulations on OR and XOR datasets. The simulation results show that the CSRNN performs better than other algorithms in terms of convergence speed and Mean Squared Error (MSE).

Nazri Mohd Nawi, Abdullah Khan, M. Z. Rehman, Tutut Herawan, Mustafa Mat Deris
CSLMEN: A New Cuckoo Search Levenberg Marquardt Elman Network for Data Classification

Recurrent Neural Networks (RNN) have local feedback loops inside the network which allow them to store earlier accessible patterns. Such a network can be trained with gradient descent back propagation and with optimization techniques such as second-order methods. Levenberg-Marquardt has been used for network training, but this algorithm is still not guaranteed to find the global minimum of the error function. Nature inspired meta-heuristic algorithms provide derivative-free solutions for optimizing complex problems. This paper proposes a new meta-heuristic search algorithm, called Cuckoo Search (CS), to train the Levenberg Marquardt Elman Network (LMEN) to achieve a fast convergence rate and avoid the local minima problem. The results of the proposed Cuckoo Search Levenberg Marquardt Elman Network (CSLMEN) are compared with Artificial Bee Colony using the BP algorithm and with other hybrid variants. Specifically, the 7-bit parity and Iris classification datasets are used. The simulation results show that the computational efficiency of the proposed CSLMEN training process is highly enhanced when coupled with the Cuckoo Search method.

Nazri Mohd Nawi, Abdullah Khan, M. Z. Rehman, Tutut Herawan, Mustafa Mat Deris
Enhanced MWO Training Algorithm to Improve Classification Accuracy of Artificial Neural Networks

The Mussels Wandering Optimization (MWO) algorithm is a novel meta-heuristic optimization algorithm inspired ecologically by mussels’ movement behavior. The MWO algorithm has been used to solve linear and nonlinear functions and it has been adapted in supervised training of Artificial Neural Networks (ANN). Based on the latter application, the classification accuracy of ANN based on MWO training was on par with other algorithms. This paper proposes an enhanced version of MWO algorithm; namely Enhanced-MWO (E-MWO) in order to achieve an improved classification accuracy of ANN. In addition, this paper discusses and analyses the MWO and the effect of MWO parameters selection (especially, the shape parameter) on ANN classification accuracy. The E-MWO algorithm is adapted in training ANN and tested using well-known benchmarking problems and compared against other algorithms. The obtained results indicate that the E-MWO algorithm is a competitive alternative to other evolutionary and gradient-descent based training algorithms in terms of classification accuracy and training time.

Ahmed A. Abusnaina, Rosni Abdullah, Ali Kattan
Fuzzy Modified Great Deluge Algorithm for Attribute Reduction

This paper proposes a local search meta-heuristic free of parameter tuning to solve the attribute reduction problem. Attribute reduction can be defined as the process of finding minimal subset of attributes from an original set with minimum loss of information. Rough set theory has been used for attribute reduction with much success. However, the reduction method inside rough set theory is applicable only to small datasets, since finding all possible reducts is a time consuming process. This motivates many researchers to find alternative approaches to solve the attribute reduction problem. The proposed method, Fuzzy Modified Great Deluge algorithm (Fuzzy-mGD), has one generic parameter which is controlled throughout the search process by using a fuzzy logic controller. Computational experiments confirmed that the Fuzzy-mGD algorithm produces good results, with greater efficiency for attribute reduction, when compared with other meta-heuristic approaches from the literature.

Majdi Mafarja, Salwani Abdullah
Fuzzy Random Regression to Improve Coefficient Determination in Fuzzy Random Environment

Determining coefficient values is important for measuring relationships in algebraic expressions and for building mathematical models, though it is complex and troublesome. Additionally, providing a precise value for a coefficient is difficult when dealing with fuzzy information, and the presence of random information increases the complexity of deciding the coefficient. Hence, this paper proposes a fuzzy random regression method to estimate the coefficient values when the statistical data contain simultaneous fuzzy random information. A numerical example illustrates the proposed solution approach, whereby coefficient values are successfully deduced from the statistical data and the fuzziness and randomness are treated based on the properties of fuzzy random regression. The implementation of the fuzzy random regression method shows significant capability to estimate the coefficient values and thus to improve the model setting of production planning problems that retain simultaneous uncertainties.

Nureize Arbaiy, Hamijah Mohd Rahman
Honey Bees Inspired Learning Algorithm: Nature Intelligence Can Predict Natural Disaster

The Artificial Bee Colony (ABC) algorithm, which uses the intelligent behaviors of honey bees, is a comparatively attractive new learning technique for solving optimization problems. An Artificial Neural Network (ANN) trained with the ABC algorithm normally has poor exploration and exploitation processes due to the random and similar strategies for finding the best food positions. The Global artificial bee colony (Global ABC) and Guided artificial bee colony (Guided ABC) algorithms are used to provide sufficient exploitation and exploration strategies respectively. Here, a hybrid of Global ABC and Guided ABC, called the Global Guided ABC (GG-ABC) algorithm, is proposed to obtain balanced and robust exploitation and exploration processes. The experimental results show that GG-ABC performed better than other algorithms for the prediction of earthquake hazards.

Habib Shah, Rozaida Ghazali, Yana Mazwin Mohmad Hassim
Hybrid Radial Basis Function with Particle Swarm Optimisation Algorithm for Time Series Prediction Problems

Time Series Prediction (TSP) estimates future values based on current and past data samples. Research indicates that most models applied to TSP suffer from a number of shortcomings, such as being easily trapped in a local optimum, premature convergence, and high computational complexity. To tackle these shortcomings, this research proposes a method that hybridizes a Radial Basis Function with the Particle Swarm Optimization algorithm (RBF-PSO). The method is applied to two well-known benchmark datasets, Mackey-Glass Time Series (MGTS) and Competition on Artificial Time Series (CATS), and one real-world dataset called the Rainfall dataset. The results reveal that RBF-PSO yields competitive results in comparison with other methods tested on the same datasets, if not the best in the MGTS case. The results also demonstrate that the proposed method produces good prediction accuracy when tested on the real-world rainfall dataset.

Ali Hassan, Salwani Abdullah
Implementation of Modified Cuckoo Search Algorithm on Functional Link Neural Network for Climate Change Prediction via Temperature and Ozone Data

Climate change has a huge impact on the development of a country. Furthermore, it is one of the factors that determine planning activities for the advancement of a country. This change will also have adverse effects on the environment, such as flooding, drought, acid rain and extreme temperature changes. To avert these dangerous and hazardous developments, early prediction of changes in temperature and ozone is of utmost importance. Thus, the Multilayer Perceptron (MLP), a neural network that uses the Back Propagation algorithm (BP) as its supervised learning method, has been adopted based on its success in various meteorological prediction tasks. Nevertheless, its convergence speed still suffers from the multilayered network architecture. As a consequence, this paper proposes a Functional Link Neural Network (FLNN) model, which has only a single layer of tunable weights, trained with the Modified Cuckoo Search algorithm (MCS) and called FLNN-MCS. FLNN-MCS is used to predict daily temperature and ozone. Comprehensive simulation results have been compared with those of the standard MLP and FLNN trained with BP. Based on the extensive output, FLNN-MCS proved to be effective compared to other network models, reducing prediction error and converging faster.

Siti Zulaikha Abu Bakar, Rozaida Ghazali, Lokman Hakim Ismail, Tutut Herawan, Ayodele Lasisi
Improving Weighted Fuzzy Decision Tree for Uncertain Data Classification

Analytical data about the rainfall pattern and soil structure of the planting crop can be partitioned by taking full advantage of the incomplete information to achieve better performance. Ignoring the uncertain and vague nature of the real world will undoubtedly eliminate substantial information. This paper reports empirical results that provide high returns for planting material breeders in the agriculture industry through effective decision-making policies. To handle attributes with incomplete information, several fuzzy modeling approaches have been proposed that support fuzziness at the attribute level. We describe a novel algorithmic framework for this challenge. We first transform the small throughput data into similarity values. Then, we propagate alternate good data and allow decision tree induction to select the best weight for our entropy-based decision tree induction. As a result, we generalize decision algorithms that provide a simpler and more understandable classifier to optimally retrieve the information based on user interaction. The proposed method leads to a smaller decision tree and, as a consequence, better test performance in planting material classification.

Mohd Najib Mohd Salleh
Investigating Rendering Speed and Download Rate of Three-Dimension (3D) Mobile Map Intended for Navigation Aid Using Genetic Algorithm

Prior studies have shown that rendering a 3D map dataset on a mobile device over a wireless network depends on the download speed. Also crucial are the computing resources of the mobile device. It has now become possible over a wireless network to render large and detailed 3D maps of cities on mobile devices at interactive rates of over 30 frames per second (fps). The information in a 3D map is generally limited and lacks interaction when it is not rendered at an interactive rate; on the other hand, with a high download rate a 3D map can produce a realistic scene for navigation aid. Unfortunately, most mobile navigation aids that use a 3D map over a wireless network cannot serve the needs of interaction, because they suffer from low rendering speed. This paper investigates the trade-off between rendering speed and download rate of the 3D mobile map using a genetic algorithm (GA). GA is used because it covers a larger problem space than other optimization algorithms, which makes it well suited for establishing fast on-the-fly 3D map rendering on a mobile device that requires useful optimization solutions. Regardless of the mobile device’s computing resources, our findings from the GA suggest that download rate and rendering speed are mutually exclusive. Thus, manipulated static aerial photo-realistic images are better suited for navigation aid than 3D maps.

Adamu I. Abubakar, Akram Zeki, Haruna Chiroma, Tutut Herawan
Kernel Functions for the Support Vector Machine: Comparing Performances on Crude Oil Price Data

The purpose of this research is to broaden the theoretical understanding of the effects of kernel functions for the support vector machine on crude oil price data. The performances of five (5) kernel functions of the support vector machine were compared. Analysis of variance was used to validate the results, followed by a post hoc analysis. The findings indicate that the performance of the wave kernel function was statistically significantly better than that of the radial basis function, polynomial, exponential, and sigmoid kernel functions. The computational efficiency of the wave kernel function was poor compared with the other kernel functions in the study. This research could provide a better understanding of the behavior of kernel functions for the support vector machine on the crude oil price dataset. The study has the potential to prompt interested researchers to propose novel methodologies that can advance crude oil prediction accuracy.

Haruna Chiroma, Sameem Abdulkareem, Adamu I. Abubakar, Tutut Herawan
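
The abstract above compares the wave, radial basis function, polynomial, exponential and sigmoid kernels. Commonly cited textbook forms of these five kernels are sketched below. The exact parameterizations used in the paper are not given in the abstract, so the formulas, parameter names and default values here are assumptions.

```python
import math


def _dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def _dist(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def rbf_kernel(x, y, gamma=1.0):
    return math.exp(-gamma * _dist(x, y) ** 2)


def polynomial_kernel(x, y, degree=2, coef0=1.0):
    return (_dot(x, y) + coef0) ** degree


def exponential_kernel(x, y, sigma=1.0):
    return math.exp(-_dist(x, y) / (2.0 * sigma ** 2))


def sigmoid_kernel(x, y, alpha=1.0, coef0=0.0):
    return math.tanh(alpha * _dot(x, y) + coef0)


def wave_kernel(x, y, theta=1.0):
    d = _dist(x, y)
    if d == 0.0:
        return 1.0  # limit of (theta / d) * sin(d / theta) as d tends to 0
    return (theta / d) * math.sin(d / theta)


# Two hypothetical feature vectors, e.g. lagged and scaled price values.
a, b = [0.2, 0.4], [0.3, 0.1]
for kernel in (rbf_kernel, polynomial_kernel, exponential_kernel,
               sigmoid_kernel, wave_kernel):
    print(kernel.__name__, kernel(a, b))
```
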
Modified Tournament Harmony Search for Unconstrained Optimisation Problems

Lately, the Harmony Search algorithm (HSA) has attracted the attention of researchers in the operations research and artificial intelligence domains due to its ability to solve complex optimization problems in various fields. Different variants of HSA have been proposed to overcome its weaknesses, such as stagnation at local optima and slow convergence. The limitations of HSA have mainly been addressed in three ways: studying the effect of HSA parameter settings, hybridizing it with other metaheuristic algorithms, and changing the selection schemes used to select decision variables from the harmony memory vectors. This paper focuses on improving the performance of HSA by introducing a new variant named the Modified Tournament Harmony Search (MTHS) algorithm. MTHS modifies the tournament selection scheme in order to improve the performance and efficiency of the classical HSA. Empirical results demonstrate the effectiveness of the proposed MTHS method and show its significance when compared with three benchmark variants of HSA.

Moh’d Khaled Shambour, Ahamad Tajudin Khader, Ahmed A. Abusnaina, Qusai Shambour
Multi-Objective Particle Swarm Optimization for Optimal Planning of Biodiesel Supply Chain in Malaysia

In this paper we develop a mathematical model for optimal planning of the biofuel supply chain. The model considers the optimal selection of feedstock while minimizing the total cost and social impact over the planning horizon. A multi-objective linear programming (MOLP) model is proposed to find the optimal solution. A multi-objective particle swarm optimization (MOPSO) method is applied to solve the mathematical model, and it is compared with the non-dominated sorting genetic algorithm (NSGA-II). The model is used to evaluate biodiesel production from palm oil and jatropha in Malaysia.

Maryam Valizadeh, S. Syafiie, I. S. Ahamad
Nonlinear Dynamics as a Part of Soft Computing Systems: Novel Approach to Design of Data Mining Systems

In this article we present the main steps of a new approach to the design of Data Mining systems, as well as its strengths and limitations. We discuss how the structure of Soft Computing systems is formed by incoming data in nonlinear dynamic systems. We also give an example of the use of a chaotic dynamic system to solve a clustering problem under uncertainty (no a priori information about the topology and number of clusters).

Elena N. Benderskaya
Soft Solution of Soft Set Theory for Recommendation in Decision Making

Soft set theory, a general mathematical method for dealing with uncertain data proposed by Molodtsov in 1999, has been applied by researchers to decision-making problems. However, most existing studies generate an exact solution where a soft solution would be appropriate, because the initial problem is determined using only values or a linguistic approach. This paper shows the use of soft set theory as a generic mathematical tool to describe objects in the form of information systems, and evaluates them using multidimensional scaling techniques to find the soft solution and a recommendation for making a decision.

R. B. Fajriya Hakim, Eka Novita Sari, Tutut Herawan
Two-Echelon Logistic Model Based on Game Theory with Fuzzy Variable

This paper applies the game-theory-based two-echelon logistic models for competitive behaviors in logistics developed by Watada et al., which proposed an optimal decision method for logistics service providers in a two-echelon setting. This study uses three types of game (Cournot, Collusion, and Stackelberg) to obtain the optimizing strategies of exporters in each scenario. The aim of this paper is to realize optimal decision-making under competition among these logistics service providers, where they exhibit different game behaviors to achieve optimum solutions. Because demand in the real world is uncertain, fuzzy demands are applied to game theory in the two-echelon logistic model, and the results of the fuzzy and non-fuzzy cases are compared. A numerical example illustrates the results for the fuzzy case and for crisp numbers. Comparing the results of the non-fuzzy and fuzzy approaches, we obtain higher profits for both the shipper and the forwarders.

Pei Chun Lin, Arbaiy Nureize
A Hybrid Approach to Modelling the Climate Change Effects on Malaysia’s Oil Palm Yield at the Regional Scale

Understanding the effects of climate change on local crops is vital for adapting new cultivation practices and assuring world food security. Given the volume of palm oil produced in Malaysia, climate change effects on oil palm phenology and fruit production have major implications at both the local and international levels. In this context, the paper analyses recent climate change effects on oil palm yield over a five-year period (2007-2011) at the regional scale. The hybrid approach of data mining techniques (association rules) and statistical analyses (regression) used in this research reveals new insights into the effects of climate change on oil palm yield within this small data set, which is insufficient for conventional analyses on their own.

Subana Shanmuganathan, Ajit Narayanan, Maryati Mohamed, Rosziati Ibrahim, Haron Khalid
A New Algorithm for Incremental Web Page Clustering Based on k-Means and Ant Colony Optimization

The Internet serves as a source of information. Clustering web pages is needed to identify the topics in a page. However, dynamism is one of the challenges of web clustering, because web pages change very frequently and pages are constantly added and removed. Processing a new page should not require repeating the whole clustering. For these reasons, incremental algorithms are an appropriate alternative for web page clustering.

In this paper we propose a new hybrid technique that we call Incremental K Ant Colony Clustering (IKACC). It is based on the Ant Colony Optimization and k-means algorithms. We adapt this approach to classify new pages in an online manner, and we compare it to the incremental k-means algorithm. The results show that this approach is more efficient and produces better results.

Yasmina Boughachiche, Nadjet Kamel
A Qualitative Evaluation of Random Forest Feature Learning

Feature learning is a hot trend in the machine learning community now. Using a random forest in feature learning is a relatively unexplored area compared to its application in classification and regression. This paper aims to show the characteristics of the features learned by a random forest and its connections with other methods.

Adelina Tang, Joan Tack Foong
A Semantic Content-Based Forum Recommender System Architecture Based on Content-Based Filtering and Latent Semantic Analysis

The rapidly increasing popularity of social computing has encouraged Internet users to interact with online discussion forums to discuss various topics. Online discussion forums have been used as a medium for collaborative learning that supports knowledge sharing and information exchange between users. One serious problem of such environments is the high volume of shared data, which makes it difficult for users to locate content relevant to their preferences. In this paper, we propose an architecture for a forum recommender system that recommends relevant post messages to users based on content-based filtering and latent semantic analysis. This in turn will increase the dynamism of online forums, help users discover relevant post messages, and shield them from redundant post messages as well as post messages with bad content.

Naji Ahmad Albatayneh, Khairil Imran Ghauth, Fang-Fang Chua
A Simplified Malaysian Vehicle Plate Number Recognition

This paper proposes an automatic inspection system for letters and numbers that recognizes Malaysian vehicle plate numbers based on digital image processing and Optical Character Recognition (OCR). An intelligent OCR Training Interface is used as a library, and the system is developed using LabVIEW software. The system is then tested under different conditions to ensure that it can be applied in real implementations. Based on the results, the proposed system shows good inspection performance and can recognize the letters and numbers of a vehicle plate number. To sum up, the proposed system can recognize the letters and numbers of Malaysian vehicle plate numbers for inspection.

Abd Kadir Mahamad, Sharifah Saon, Sarah Nurul Oyun Abdul Aziz
Agglomerative Hierarchical Co-clustering Based on Bregman Divergence

Recently, co-clustering algorithms have been widely used in mining heterogeneous information networks, but the distance metric remains a challenging problem. Bregman divergence is used to measure distance in traditional co-clustering algorithms, but the hierarchical structure and the features of the entities themselves are not considered. In this paper, an agglomerative hierarchical co-clustering algorithm based on Bregman divergence is proposed to learn the hierarchical structure of multiple entities simultaneously. In the aggregation process, the cost of merging two co-clusters is measured by a monotonic Bregman function, integrating heterogeneous relations and features of entities. The robustness of algorithms based on different divergences is tested on synthetic data sets. Experiments on the DBLP data sets show that our algorithm improves accuracy over existing co-clustering algorithms.

Guowei Shen, Wu Yang, Wei Wang, Miao Yu, Guozhong Dong
Agreement between Crowdsourced Workers and Expert Assessors in Making Relevance Judgment for System Based IR Evaluation

Creating a gold standard dataset for relevance judgments in IR evaluation is a costly and time-consuming task. Recently, crowdsourcing, a low-cost and fast approach, has drawn a lot of attention as a way of creating relevance judgments. This study investigates the agreement of relevance judgments between crowdsourced workers and human assessors (e.g., TREC assessors), validating the use of crowdsourcing for creating relevance judgments. The agreement is calculated for both individual and group agreement through percentage agreement and kappa statistics. The results show high agreement between crowdsourced workers and human assessors in group assessment, while the individual agreement is not acceptable. In addition, we investigate how the rank ordering of systems changes under different evaluation metrics when human assessors' judgments are replaced with crowdsourced ones. The conclusion, supported by the results, is that relevance judgments generated through crowdsourcing produce a more reliable system ranking when low-performing systems are measured.
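
The two agreement measures mentioned in this abstract can be sketched as follows. The abstract says only "kappa statistics"; Cohen's kappa for two raters is assumed here, and the function names and the binary relevance labels in the usage example are illustrative.

```python
from collections import Counter


def percentage_agreement(a, b):
    """Fraction of items on which two assessors give the same label."""
    if len(a) != len(b) or not a:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a, b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(a)
    p_observed = percentage_agreement(a, b)
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement: probability both assessors pick the same label independently.
    p_expected = sum(count_a[k] * count_b[k] for k in set(count_a) | set(count_b)) / (n * n)
    if p_expected == 1.0:
        return float("nan")  # undefined when both assessors use one identical label throughout
    return (p_observed - p_expected) / (1.0 - p_expected)


# Usage: 1 = relevant, 0 = not relevant (hypothetical judgments for eight documents).
expert = [1, 1, 0, 0, 1, 0, 1, 1]
worker = [1, 0, 0, 0, 1, 0, 1, 1]
print(percentage_agreement(expert, worker))  # 0.875
print(cohen_kappa(expert, worker))           # 0.75
```

Kappa is lower than raw agreement because half of the matches in this example would be expected by chance alone.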

Parnia Samimi, Sri Devi Ravana
An Effective Location-Based Information Filtering System on Mobile Devices

As mobile devices evolve, research on providing location-based services attracts researchers' interest. A location-based service recommends information based on the user's geographical location as provided by a mobile device. Mobile devices are involved in users' daily activities, and users request a great deal of information and many services, so suggesting the proper information on mobile devices that reflects user preferences becomes more and more difficult. Many recent studies have tried to tackle this issue, but most of them have not been successful, for reasons such as using large datasets or making suggestions based on dynamically collected ratings within different groups instead of focusing on individuals. In this paper, we propose a location-based information filtering system that exposes users' preferences using Bayesian inference. A Bayesian network is constructed with a conditional probability table, while users' characteristics and location data are gathered using the mobile device. After preprocessing those data, the system integrates that information and uses time to produce the most accurate suggestions. We collected a dataset from 20 restaurants in Malaysia and gathered behavioral data from two registered users for 7 days. We conducted experiments on the dataset to demonstrate the effectiveness of the proposed system and to explain user preferences.

Marzanah A. Jabar, Niloofar Yousefi, Ramin Ahmadi, Mohammad Yaser Shafazand, Fatimah Sidi
An Enhanced Parameter-Free Subsequence Time Series Clustering for High-Variability-Width Data

In time series mining, subsequence time series (STS) clustering has been widely used as a subroutine in various mining tasks, e.g., anomaly detection, classification, or rule discovery. The main objective of STS clustering is to cluster similar underlying subsequences together. Besides the known problem of meaninglessness in STS clustering results, another challenge is clustering when the subsequence patterns have variable lengths. General approaches provide a solution only to problems where the range of width variability is small and under some predefined parameters, which turns out to be impractical for real-world data. Thus, we propose a new algorithm that can handle much larger variability in the pattern widths while remaining parameter-free, so that users no longer face the difficult task of parameter selection. The Minimum Description Length (MDL) principle and a motif discovery technique are adopted to determine the proper widths of the subsequences. The experimental results confirm that our proposed algorithm can effectively handle very large width variability of the time series subsequence patterns, outperforming all other recent STS clustering algorithms.

Navin Madicar, Haemwaan Sivaraks, Sura Rodpongpun, Chotirat Ann Ratanamahatana
An Optimized Classification Approach Based on Genetic Algorithms Principle

In this paper, we address the problem of generating relevant classification rules. Within this framework we are interested in rules of the form a1 ∧ a2 ∧ … ∧ an ⇒ b, which allow us to propose a new approach based on the cover set and genetic algorithms principle. This approach obtains both frequent and rare rules while avoiding a breadth-wise search. It is an improvement of the afortiori approach. Moreover, our proposed algorithm can extract the classifier using a clustering of the attributes, which minimizes the processing needed to build the classifier.

Ines Bouzouita
Comparative Performance Analysis of Negative Selection Algorithm with Immune and Classification Algorithms

The Negative Selection Algorithm (NSA) has proved effective in solving a number of anomaly detection problems. This paper presents an experimental study of the negative selection algorithm alongside some classification algorithms. The purpose is to ascertain their efficiency in accurately detecting abnormalities in a system when tested with well-known datasets. The negative selection algorithm and some selected immune and classifier algorithms are used for experimentation and analysis. Three different datasets were acquired for this task and a performance comparison was carried out. The empirical results illustrate that the negative selection algorithm, an artificial immune system, can achieve the highest detection rate and the lowest false alarm rate. This indicates the suitability and potential of NSA for discovering unusual changes in normal behavioral flow.

Ayodele Lasisi, Rozaida Ghazali, Tutut Herawan
Content Based Image Retrieval Using MPEG-7 and Histogram

The rapid development of multimedia technologies has made Content Based Image Retrieval (CBIR) an active research area in the multimedia domain. Texture and color features have been the primary descriptors for images in the field of CBIR. This paper proposes a new CBIR system that combines both color and texture features. The Color Layout Descriptor (CLD) from MPEG-7 is used for color feature extraction, while mean, variance, skewness, kurtosis, energy and entropy are used as texture descriptors. Experiments are performed on the Coral Database. The results of the proposed method, named CLD-fos, are compared with four well-regarded systems (SIMPLIcity, histogram-based, FIRM, and Variance Segment). In the simulations, CLD-fos demonstrated a higher accuracy rate than the previous systems. The proposed CLD-fos achieved significant performance in terms of accuracy.

Muhammad Imran, Rathiah Hashim, Noor Elaiza Abd Khalid
Cost-Sensitive Bayesian Network Learning Using Sampling

A significant advance in recent years has been the development of cost-sensitive decision tree learners, recognising that real-world classification problems need to take account of the costs of misclassification and not just focus on accuracy. The literature contains well over 50 cost-sensitive decision tree induction algorithms, each with varying performance profiles. Obtaining good Bayesian networks can be challenging, and hence several algorithms have been proposed for learning their structure and parameters from data. However, most of these algorithms focus on learning Bayesian networks that aim to maximise the accuracy of classifications. Hence an obvious question is whether it is possible to develop cost-sensitive Bayesian networks and whether they would perform better than cost-sensitive decision trees at minimising classification cost. This paper explores this question by developing a new Bayesian network learning algorithm based on changing the data distribution to reflect the costs of misclassification. The proposed method is explored by conducting experiments on over 20 data sets. The results show that this approach produces good results in comparison to more complex cost-sensitive decision tree algorithms.

Eman Nashnush, Sunil Vadera
Data Treatment Effects on Classification Accuracies of Bipedal Running and Walking Motions

Much real-world data can be irrelevant, redundant, inconsistent, noisy or incomplete. To extract quality data for classification analysis, efficient data preprocessing techniques such as data transformation, data compression, feature extraction and imputation are required. This study investigates three data treatment approaches on bipedal motion data: randomization, attribute elimination and missing-value imputation. The effects of data treatment on classification accuracy were examined in order to retrieve informative attributes. The analysis is performed on bipedal running and walking motions of humans and ostriches, obtained from a publicly available source and a real case study. The classification accuracies were tested on seven classifier categories using the WEKA tool. The findings show that classification accuracy for bipedal running and walking improved by 3.21% and 2.29% respectively on the treated data compared to the original. The findings support integrating data randomization and selective attribute elimination for better results in classification analysis.

Wei Ping Loh, Choo Wooi H‘ng
Experimental Analysis of Firefly Algorithms for Divisive Clustering of Web Documents

This paper studies two clustering algorithms based on the Firefly Algorithm (FA), a recent swarm intelligence approach. We perform experiments using Newton's Universal Gravitation Inspired Firefly Algorithm (GFA) and the Weight-Based Firefly Algorithm (WFA) on the 20_newsgroups dataset. The analysis is undertaken on two parameters. The first is the alpha (α) value in the Firefly algorithms and the second is the threshold value required during the clustering process. The results show that the Weight-Based Firefly Algorithm performs better than Newton's Universal Gravitation Inspired Firefly Algorithm.

Athraa Jasim Mohammed, Yuhanis Yusof, Husniza Husni
Extended Naïve Bayes for Group Based Classification

This paper focuses on extending the Naïve Bayes classifier to address the group-based classification problem. The group-based classification problem requires labeling a group of multiple instances given the prior knowledge that all the instances of the group belong to the same unknown class. We present three techniques to extend the Naïve Bayes classifier to label a group of homogeneous instances. We then evaluate the extended Naïve Bayes classifier on both synthetic and real data sets and demonstrate that the extended classifiers may be a promising approach in applications where the test data can be arranged into homogeneous subsets.
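
The abstract does not name its three techniques. As a sketch of the general idea only, two natural ways to label a group known to share one class are shown below: multiplying per-instance likelihoods (summing logs) and majority voting over per-instance predictions. Both function names and the input layout are assumptions, and neither is claimed to be one of the paper's methods.

```python
import math
from collections import Counter


def label_group_by_product(log_prior, log_likelihoods):
    """Score each class by log P(c) + sum_i log P(x_i | c) over the whole group."""
    scores = {
        c: log_prior[c] + sum(ll[c] for ll in log_likelihoods)
        for c in log_prior
    }
    return max(scores, key=scores.get)


def label_group_by_vote(log_prior, log_likelihoods):
    """Classify each instance on its own, then return the most common label."""
    votes = Counter(
        max(log_prior, key=lambda c: log_prior[c] + ll[c])
        for ll in log_likelihoods
    )
    return votes.most_common(1)[0][0]


# Usage: three instances of one group, two classes, equal priors (hypothetical numbers).
prior = {"A": math.log(0.5), "B": math.log(0.5)}
group = [
    {"A": math.log(0.9), "B": math.log(0.1)},
    {"A": math.log(0.4), "B": math.log(0.6)},
    {"A": math.log(0.4), "B": math.log(0.6)},
]
print(label_group_by_product(prior, group))  # A
print(label_group_by_vote(prior, group))     # B
```

The example shows that the two rules can disagree: one strongly confident instance outweighs two weakly confident ones under the product rule but not under voting.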

Noor Azah Samsudin, Andrew P. Bradley
Improvement of Audio Feature Extraction Techniques in Traditional Indian Musical Instrument

Traditional Indian musical instruments are among the oldest musical instruments in the world, and they have their own importance in the field of music. They can be categorized into three types: stringed instruments, percussion instruments and wind instruments. This paper focuses on stringed instruments because they show fluctuating behavior due to noise. Three techniques are selected because they are frequently used in previous research, where they show shortcomings when extracting features from noisy signals. The three techniques are Mel-Frequency Cepstral Coefficients (MFCC), Linear Predictive Coding (LPC) and Zero-Crossing Rate (ZCR). This research attempts to improve these feature extraction techniques by integrating a Zero Forcing Equalizer (ZFE) with them. Three classifiers, k-Nearest Neighbor (kNN), Bayesian Network (BN) and Support Vector Machine (SVM), are used to evaluate audio classification accuracy. The proposed technique shows better classification accuracy when dealing with noisy signals.
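
Of the three features named here, the zero-crossing rate is the simplest to state. The sketch below uses one common definition, the fraction of adjacent sample pairs whose signs differ, with zero treated as non-negative. The function name and that sign convention are assumptions; the ZFE step from the paper is not shown.

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs in `frame` whose signs differ."""
    if len(frame) < 2:
        return 0.0
    # Map each sample to +1 or -1, treating 0 as non-negative.
    signs = [1 if s >= 0 else -1 for s in frame]
    crossings = sum(1 for prev, cur in zip(signs, signs[1:]) if prev != cur)
    return crossings / (len(frame) - 1)
```

Additive noise tends to add spurious sign changes around zero, which is one reason this feature can degrade on noisy recordings.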

Kohshelan, Noorhaniza Wahid
Increasing Failure Recovery Probability of Tourism-Related Web Services

The reliability of tourism-related Web services is crucial. Users tend not to return to a service once it has failed. Research has shown that a simple one-to-one replacement of a failed service is not dependable for recovering a system from a total failure. In order to increase the probability of recovering from a failure, we use a renovation approach that replaces a set of services. Broadening the search area among the services in a graph of services enables us to increase the failure recovery probability. The time complexity is also considered and shown to be low at failure time, because the time-consuming calculations are transferred to an offline phase prior to the execution of the service. The approach is evaluated on a set of services that includes tourism-related Web services. The probability of recovery increased substantially, to more than 54% of the simulated failures.

Hadi Saboohi, Amineh Amini, Tutut Herawan
Mining Critical Least Association Rule from Oral Cancer Dataset

Data mining has attracted much research attention in the information industry. One of the important and interesting areas in data mining is mining infrequent, or least, association rules. Typically, a least association rule refers to an infrequent or uncommon relationship among a set of items (itemset) in a database. However, finding such rules is more difficult than finding frequent rules because they may be supported by far fewer data and thus require more specific measures. Therefore, in this paper we apply our novel measure, called Critical Relative Support (CRS), to mine critical least association rules from the medical dataset Oral-Cancer-HUSM-S1. The results show that CRS can be used to determine least association rules, and thus demonstrate its scalability.

Zailani Abdullah, Fatiha Mohd, Md Yazid Mohd Saman, Mustafa Mat Deris, Tutut Herawan, Abd Razak Hamdan
Music Emotion Classification (MEC): Exploiting Vocal and Instrumental Sound Features

Music conveys and evokes feeling. Many studies that correlate music with emotion have been carried out, as people nowadays often prefer to listen to a song that suits their mood or emotion. This project presents work on classifying emotion in music by exploiting the vocal and instrumental parts of a song. The final system uses musical features extracted from the vocal and instrumental parts of a song, such as spectral centroid, spectral rolloff and zero-crossing rate, to classify whether selected Malay popular music conveys a "sad" or "happy" emotion. Fuzzy k-NN (FKNN) and an artificial neural network (ANN) are used in this system as machine classifiers. The percentage of emotions correctly classified in Malay popular songs is expected to be higher when both sets of features are applied.

Mudiana Mokhsin Misron, Nurlaila Rosli, Norehan Abdul Manaf, Hamizan Abdul Halim
Resolving Uncertainty Information Using Case-Based Reasoning Approach in Weight-Loss Participatory Sensing Campaign

Participatory sensing (PS) is an approach for distributing data collection, analysis and interpretation. Identifying trusted and recommended participants, with the intention of obtaining quality data for analysis, is still a challenge because, unlike in other domains, participatory sensing participants must fulfill the requirements of the service provider, which require them to contribute quality data over longer time frames. Many factors can influence the integrity of the information. One major concern in data contribution is that the truthfulness of the data may be uncertain because the data are incomplete, imprecise, vague or fragmentary. Consequently, the information becomes too unreliable to be analyzed. Detecting uncertain information is essential for assessing the value of the information. Therefore, the objective of this paper is two-fold. First, we give an overview of uncertain information and the characteristics that suit a participatory sensing system. Second, we outline how a Case-based Reasoning approach can be implemented to tackle uncertain information in order to distinguish trusted from untrusted participants. To address both objectives, this paper proposes an uncertain-information detection approach based on information relevance using a decision tree, which integrates Case-based Reasoning, data mining, and information retrieval into our participatory sensing application, w8L0ss.

Andita Suci Pratiwi, Syarulnaziah Anawar
Towards a Model-Based Framework for Integrating Usability Evaluation Techniques in Agile Software Model

Various new agile software models were offered through the Agile Manifesto as a reaction to conventional, extensive software techniques and process design. Software engineering (SE) followed a systematic approach to development, whereas integrating usability into software development improved the ability of a software product to be used, learned and found attractive by users. Research has shown the benefit of usability; yet, to this day, agile software models continue to give little importance to this quality attribute. Moreover, poor usability and inefficient design were common reasons for software product failure. The aim of this paper was to develop a model for integrating usability evaluation methods into an agile software model. This was done by proposing a unique model and evaluating it using IEEE Std 12207-2008, ISO 9241:210.

Saad Masood Butt, Azura Onn, Moaz Masood Butt, Nadra Tabassam
Emulating Pencil Sketches from 2D Images

In this paper we present a pixel-based approach to the production of pencil-sketch-style images. Input pixels are mapped to the output sketches using their intensity via a texture-map. Conceptually, pixels are grouped into regions, and the texture obtained from the texture-map is applied to the output image for a given region. The hatching and cross-hatching textures give the resultant images the likeness of pencil sketches. By altering the texture-map applied during the transformation, good results can be obtained, often closely mimicking human sketches. We present details of our approach and give examples of sketches. In future work, we wish to enrich the texture-maps so that the texture better reflects or hints at the surface properties of objects in the scene (e.g., hardness, softness, etc.).

Azhan Ahmad, Somnuk Phon-Amnuaisuk, Peter D. Shannon
Router Redundancy with Enhanced VRRP for Intelligent Message Routing

The overlay query routing mechanism is a popular approach to query routing in distributed service and resource discovery. However, it suffers from drawbacks such as escalated inter-ISP traffic and redundant traffic forwarding in the underlying IP layer. To avoid these problems, we proposed earlier that the overlay query routing process could be moved down to the IP layer with the help of intelligent message routing (IMR). The routers in the IP layer build a second routing table, used for query forwarding, by mapping the content of the query messages to the target location of the services. For such a system to be implemented at Internet scale, high availability of the routing service is vital. Employing the Virtual Router Redundancy Protocol (VRRP) for redundancy takes care of classical route updates to the backup router. However, the service-specific routing table, which is specific to underlay query processing, needs to be updated independently of VRRP. In this paper, we address the issue of applying router redundancy for IMR with an enhanced VRRP that can also handle redundancy for the service-specific routing table. We have also analyzed the performance of routers with respect to the additional overhead incurred by service-specific routing.

Haja Mohd Saleem, Mohd Fadzil Hassan, Seyed M. Buhari
Selecting Most Suitable Members for Neural Network Ensemble Rainfall Forecasting Model

Neural network ensembles are more accurate than a single neural network because they have higher generalization ability. To increase the generalization ability, the members of the ensemble must be accurate and diverse. This study presents a method, ENN-GA, for selecting the most suitable members for an ensemble, which uses genetic algorithms to minimize the error function of the ensemble. The performance of the proposed method is compared with the performance of two widely used methods, bagging and boosting. The models developed are trained and tested using 41 years of rainfall data from Colombo and Katugastota, Sri Lanka. The results show that the ENN-GA model is more accurate than the bagging and boosting models. The best performance for Colombo was obtained by ENN-GA with 14 members, with an RMSE of 7.33, and for Katugastota by ENN-GA with 12 members, with an RMSE of 6.25.

Harshani Nagahamulla, Uditha Ratnayake, Asanga Ratnaweera
Simulating Basic Cell Processes with an Artificial Chemistry System

When we simulate life, there are always more things to simulate than have been coded. Life is complex and seems to have endless possibilities. Using artificial chemistry as a starting point to simulate life is a promising way to limit the possibilities, because chemical reactions are always the same under the same physical conditions. We have built a 3D artificial chemistry system that simulates molecules and the reactions among them. Our goal is to simulate a cell or a group of cells in the future using mainly molecules and chemical reactions. In this paper, we show that the system can simulate the fundamental aspects of reproduction, metabolism and adaptation in cells. This is accomplished by simulating reproducing molecules, reactions that provide energy to the reproducing molecules, and the adaptation ability of reproducing molecules.

Chien-Le Goh, Hong Tat Ewe, Yong Kheng Goh
The Effectiveness of Sampling Methods for the Imbalanced Network Intrusion Detection Data Set

One of the countermeasures taken by security experts against network attacks is implementing Intrusion Detection Systems (IDS) in computer networks. Researchers often use the de facto network intrusion detection data set, KDD Cup 1999, to evaluate proposed IDS in the context of data mining. However, the imbalanced class distribution of the data set leads to a rare class problem. The problem causes low detection (classification) rates for the rare classes, particularly R2L and U2R. Two commonly used sampling methods for mitigating the rare class problem were evaluated in this research: (1) under-sampling and (2) over-sampling. However, these two methods were less effective in mitigating the problem. The reasons for this performance are presented in this paper.
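
The two sampling methods can be illustrated with their simplest random variants, sketched below. The abstract does not say which variants the study used, so random under-sampling to the smallest class and random over-sampling with replacement to the largest class are assumptions, as are the function name and the toy labels in the usage note.

```python
import random
from collections import defaultdict


def resample(records, labels, mode, seed=0):
    """Balance classes by random under-sampling or over-sampling.

    mode="under": shrink every class to the size of the smallest class.
    mode="over" : grow every class to the size of the largest class by
                  duplicating randomly chosen records.
    Returns a shuffled list of (record, label) pairs.
    """
    if mode not in ("under", "over"):
        raise ValueError("mode must be 'under' or 'over'")
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for record, label in zip(records, labels):
        by_class[label].append(record)
    sizes = [len(rows) for rows in by_class.values()]
    target = min(sizes) if mode == "under" else max(sizes)
    balanced = []
    for label, rows in by_class.items():
        if mode == "under":
            chosen = rng.sample(rows, target)
        else:
            chosen = rows + [rng.choice(rows) for _ in range(target - len(rows))]
        balanced.extend((record, label) for record in chosen)
    rng.shuffle(balanced)
    return balanced
```

With six "normal" and two "u2r" records, `mode="under"` returns four pairs and `mode="over"` returns twelve. Over-sampling only repeats existing rare records and under-sampling discards majority records, so neither adds new information about the rare class.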

Kok-Chin Khor, Choo-Yee Ting, Somnuk Phon-Amnuaisuk
A Clustering Based Technique for Large Scale Prioritization during Requirements Elicitation

Using a clustering technique, we consider the prioritization problem in cases where the number of requirements to prioritize is large. Clustering is a method used to find classes of data elements with respect to their attributes. K-means, one of the most popular clustering algorithms, was adopted in this research. To use the k-means algorithm for solving requirements prioritization problems, the weights that relevant project stakeholders assign to the attributes of requirement sets are required as input parameters. This paper shows that the output of running the k-means algorithm on requirement sets varies depending on the weights provided by relevant stakeholders. The proposed approach was validated using a requirement dataset known as RALIC. The results suggest that a synthetic method with scrambled centroids is effective for prioritizing requirements using k-means clustering.

Philip Achimugu, Ali Selamat, Roliana Ibrahim
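
To make concrete the claim above that k-means output depends on stakeholder weights, here is a minimal sketch. It assumes that the weights enter as per-attribute factors in a squared Euclidean distance and that the first k points serve as initial centroids; both are illustrative assumptions, as are the toy data. It does not reproduce the authors' method, their scrambled-centroid initialization or the RALIC dataset.

```python
def weighted_kmeans(points, weights, k, iterations=20):
    """Lloyd's k-means with a per-attribute weighted squared Euclidean distance."""

    def dist(p, q):
        return sum(w * (a - b) ** 2 for w, a, b in zip(weights, p, q))

    # Naive initialization (assumption): take the first k points as centroids.
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        assignment = [min(range(k), key=lambda c: dist(p, centroids[c]))
                      for p in points]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


# Four hypothetical requirements scored on two attributes.
requirements = [(1, 1), (9, 9), (9, 1), (1, 9)]

# A stakeholder who only cares about the first attribute ...
by_first, _ = weighted_kmeans(requirements, weights=(1, 0), k=2)
# ... versus one who only cares about the second attribute.
by_second, _ = weighted_kmeans(requirements, weights=(0, 1), k=2)

print(by_first)   # cluster index per requirement
print(by_second)
```

The same four requirements end up grouped differently under the two weight vectors, which is the sensitivity the abstract describes.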
A Comparative Evaluation of State-of-the-Art Cloud Migration Optimization Approaches

Cloud computing has made it increasingly attractive for consumers to migrate their applications to the cloud environment. However, because cloud environments are huge, application customers and providers face the problem of how to assess and choose appropriate service providers for migrating their applications to the cloud. Many approaches have investigated how to address this problem. In this paper we classify these approaches into non-evolutionary and evolutionary cloud migration optimization approaches. Criteria including cost, QoS, elasticity and degree of migration optimization are used to compare the approaches. Analysis of the comparative evaluation results shows that a multi-objective optimization approach provides a better solution to support decisions on migrating an application to the cloud environment, based on the proposed criteria. The classification of the investigated approaches will help practitioners and researchers to deliver and build solid approaches.

Abdelzahir Abdelmaboud, Dayang N. A. Jawawi, Imran Ghani, Abubakar Elsafi
A Review of Intelligent Methods for Pre-fetching in Cloud Computing Environment

Technological innovation has expanded drastically. People like to use applications that ease their everyday work. They have a lot of data to store, and today they like to store it in cloud storage because they can access it anywhere at any time. Moreover, most smartphone users access their data from storage outside their mobile phones. This trend is called Mobile Cloud Computing, and it changes the way users use computers and the Internet. However, as the number of users accessing the storage increases, the performance of cloud computing services slows down. To address this issue, researchers have applied pre-fetching as one method of improving service performance. However, pre-fetching has limitations, notably the overhead caused by overaggressive pre-fetching. Therefore, this paper reviews ideas for enhancing the accessibility of cloud computing by using intelligent methods to improve current pre-fetching methods.

Nur Syahela Hussien, Sarina Sulaiman, Siti Mariyam Shamsuddin
Enhanced Rules Application Order Approach to Stem Reduplication Words in Malay Texts

Word stemming is a natural language morphological process of reducing derived words to their respective root words. Because of its importance, many Malay word stemming algorithms have been developed in past years. However, previous researchers focused only on improving affixation word stemming with various stemming approaches. No reduplication word stemming algorithm has been developed for the Malay language thus far. In Malay, affixation and reduplication produce derived words that have their own morphological rules. Therefore, using affixation word stemming to stem reduplication words is considered inappropriate. Hence, this paper presents a reduplication word stemming algorithm that stems full, rhythmic and partial reduplication words to their respective root words. The proposed stemming algorithm uses Rules Application Order with Stemming Errors Reducer to stem these reduplication words. Malay online newspaper articles were used to evaluate it. The experimental results showed that the proposed stemming algorithm is able to stem full, rhythmic, affixed and partial reduplication words with better stemming accuracy. Hence, future improvements of Malay word stemming algorithms should include both affixation and reduplication word stemming.

M. N. Kassim, Mohd Aizaini Maarof, Anazida Zainal
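
The simplest of the cases named above, full reduplication, can be sketched in a few lines. This is only an illustration under the assumption that a full reduplication is written as two identical halves joined by a hyphen (for example "buku-buku", books). It is not the authors' Rules Application Order or Stemming Errors Reducer, and it deliberately leaves rhythmic, affixed and partial reduplication untouched; the function name is hypothetical.

```python
def stem_full_reduplication(word):
    """Return the base of a hyphenated full reduplication, else the word unchanged."""
    parts = word.lower().split("-")
    # Full reduplication: exactly two non-empty, identical halves.
    if len(parts) == 2 and parts[0] and parts[0] == parts[1]:
        return parts[0]
    return word


for token in ["buku-buku", "meja-meja", "rumah", "sayur-mayur"]:
    print(token, "->", stem_full_reduplication(token))
```

The rhythmic form "sayur-mayur" is returned unchanged because its halves differ, which shows why such forms need rules of their own.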
Islamic Web Content Filtering and Categorization on Deviant Teaching

Currently, the blocking of deviant teaching websites is done manually by the Malaysian authorities. In addition, no Web filtering product is offered to filter religious content, especially for the Malay language. Web filtering can be used as protection against inappropriate content and to prevent misuse of the network; hence, it can be used to filter the content of suspicious websites and limit the dissemination of such Web pages. The purpose of the paper is to filter deviant teaching Web pages and classify them into three categories: deviant, suspicious and clean. Three term weighting schemes were used for feature selection: Term Frequency Inverse Document Frequency (TFIDF), Entropy and Modified Entropy. A Support Vector Machine (SVM) is used for classification. The results show that Modified Entropy is a more suitable term weighting scheme for filtering Islamic Web pages than TFIDF and Entropy.

Nurfazrina Mohd Zamry, Mohd Aizaini Maarof, Anazida Zainal
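
Of the three term weighting schemes named above, TFIDF is the most widely known, and a minimal sketch of it follows. It assumes the common variant weight = tf × ln(N / df) with raw term counts; the abstract does not state which TFIDF variant the authors used, and the tokens below are placeholders rather than real page content. The Entropy and Modified Entropy schemes are not shown.

```python
import math
from collections import Counter


def tf_idf(documents):
    """documents: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(documents)
    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for tokens in documents:
        df.update(set(tokens))
    weights = []
    for tokens in documents:
        tf = Counter(tokens)  # raw term frequency within this document
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights


# Placeholder tokens standing in for preprocessed Web page text.
pages = [["term_a", "term_b", "term_a"], ["term_a", "term_c"]]
page_weights = tf_idf(pages)
print(page_weights)
```

A term that occurs in every page, such as `term_a` here, receives weight zero, so only terms that discriminate between pages contribute features to a classifier.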
Multiobjective Differential Evolutionary Neural Network for Multi Class Pattern Classification

This paper presents a Differential Evolution (DE) algorithm for multiobjective optimization, applied to the problem of tuning Artificial Neural Network (ANN) parameters. The multiobjective evolutionary algorithm used in this study is Differential Evolution, and the ANN is a Three-Term Backpropagation (TBP) network. The proposed algorithm, named MODETBP, uses the advantages of multiobjective differential evolution to design the network architecture, finding an appropriate number of hidden nodes in the hidden layer together with the network error rate. For performance evaluation, indicators such as accuracy, sensitivity and specificity, together with 10-fold cross-validation, are used to evaluate the outcome of the proposed method. The results show that the proposed method is viable for multi-class pattern classification problems when compared with the TBP Network Based on Elitist Multiobjective Genetic Algorithm (MOGATBP) and some other methods found in the literature. In addition, the empirical analysis of the numerical results shows the efficiency of the proposed algorithm.

Ashraf Osman Ibrahim, Siti Mariyam Shamsuddin, Sultan Noman Qasem
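
For readers unfamiliar with Differential Evolution, the core loop can be sketched as follows. Note the substitution: this is the classic single-objective DE/rand/1/bin scheme minimizing a toy sphere function, not the multiobjective MODETBP algorithm of the abstract above, and it trains no neural network. The parameter values (`f`, `cr`, `pop_size`) and the objective are illustrative assumptions.

```python
import random


def differential_evolution(objective, bounds, pop_size=20, f=0.8, cr=0.9,
                           generations=100, seed=0):
    """Minimize `objective` over box `bounds` with DE/rand/1/bin (pop_size >= 4)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [objective(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Mutation: pick three distinct individuals other than i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            forced = rng.randrange(dim)  # at least one gene comes from the mutant
            trial = []
            for d in range(dim):
                if d == forced or rng.random() < cr:
                    lo, hi = bounds[d]
                    value = pop[a][d] + f * (pop[b][d] - pop[c][d])
                    trial.append(min(max(value, lo), hi))  # clip to the bounds
                else:
                    trial.append(pop[i][d])  # binomial crossover keeps the parent gene
            # Greedy selection: keep the trial only if it is no worse.
            trial_score = objective(trial)
            if trial_score <= scores[i]:
                pop[i], scores[i] = trial, trial_score
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


def sphere(x):
    """Toy objective with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


best_x, best_score = differential_evolution(sphere, [(-5.0, 5.0)] * 3, generations=50)
print(best_x, best_score)
```

A multiobjective variant would replace the greedy comparison with a dominance-based selection over several objectives, such as network size and error rate.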
Ontology Development to Handle Semantic Relationship between Moodle E-learning and Question Bank System

Distributed and varied systems in a learning environment produce heterogeneous data at the data level. Data heterogeneity in a learning environment refers to different data representations among learning systems, which makes integration increasingly complex. Semantic relationships are therefore an important issue in the learning environment case study. Different data representations in each data source make it difficult for numerous systems to communicate and integrate with one another. Many researchers have found that semantic technology is the best way to resolve heterogeneous data representation issues. Semantic technology can handle heterogeneous data, that is, data with different representations in different data sources. It can also map data that have the same meaning across different databases and data formats. This paper focuses on semantic data mapping to handle semantic relationships in heterogeneous data representations using an ontology approach. In the first-level process, the D2RQ engine is used to produce a Turtle (.ttl) file that can be used by a local Java application with the Jena library and a triple store. In the second-level process, we develop ontology knowledge using the Protégé tool to handle semantic relationships. This paper produces ontology knowledge to handle the semantic relationship between the Moodle e-learning system and a Question Bank system.

Arda Yunianta, Norazah Yusof, Herlina Jayadianti, Mohd Shahizan Othman, Shaffika Suhaimi
Backmatter
Metadata
Title
Recent Advances on Soft Computing and Data Mining
Edited by
Tutut Herawan
Rozaida Ghazali
Mustafa Mat Deris
Copyright Year
2014
Electronic ISBN
978-3-319-07692-8
Print ISBN
978-3-319-07691-1
DOI
https://doi.org/10.1007/978-3-319-07692-8
