2013 | Book

Intelligent Information and Database Systems

5th Asian Conference, ACIIDS 2013, Kuala Lumpur, Malaysia, March 18-20, 2013, Proceedings, Part II

Edited by: Ali Selamat, Ngoc Thanh Nguyen, Habibollah Haron

Publisher: Springer Berlin Heidelberg

Book series: Lecture Notes in Computer Science

About this book

The two-volume set LNAI 7802 and LNAI 7803 constitutes the refereed proceedings of the 5th Asian Conference on Intelligent Information and Database Systems, ACIIDS 2013, held in Kuala Lumpur, Malaysia in March 2013.

The 108 revised papers presented were carefully reviewed and selected from numerous submissions. The papers included are grouped into topical sections on: innovations in intelligent computation and applications; intelligent database systems; intelligent information systems; tools and applications; intelligent recommender systems; multiple model approach to machine learning; engineering knowledge and semantic systems; computational biology and bioinformatics; computational intelligence; modeling and optimization techniques in information systems, database systems and industrial systems; intelligent supply chains; applied data mining for semantic Web; semantic Web and ontology; integration of information systems; and conceptual modeling in advanced database systems.

Table of contents

Frontmatter

Tools and Applications

Detection of Noise in Digital Images by Using the Averaging Filter Name COV

One of the significant problems in digital signal processing is the filtering and reduction of undesired interference. Because so many signal-processing methods and algorithms exist, differing in complexity and in how effectively they remove noise of a given character and level, it is difficult to choose the most effective one. Provided there is specific knowledge, or grounds for certain assumptions, about the nature and form of the noise, an appropriate filtering method can be selected to ensure optimum quality. This chapter describes several methods for estimating the level of noise and presents a new method based on the properties of the smoothing filter.

Janusz Pawel Kowalski, Jakub Peksinski, Grzegorz Mikolajczak

k-Means Clustering on Pre-calculated Distance-Based Nearest Neighbor Search for Image Search

Content-based image retrieval (CBIR) would be an important future trend in search engines. This paper proposes a nearest neighbor search (NNS) method that uses k-means clustering and pre-calculated distances on a known set of image samples to perform image queries within the set. The proposed algorithm adds a clustering step ahead of the rest of an existing algorithm and uses only the nearest clusters for the NNS. The distance between the query image and a cluster is determined by using twice the standard deviation of the cluster to estimate its boundary. The feature used is grey-level co-occurrence matrices (GLCM). This reduces the samples explored by 25.21% and the execution time by 26.62% for 16 clusters chosen out of 23 and a search radius of 0.2. The experimental results show an improvement in time complexity, but at the cost of the hit rate, which dropped from 100% for the previous method, which explores all potential samples, to 70.77% for the proposed method.

Jing Yi Tou, Chun Yee Yong
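
The cluster-pruning idea in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the clusters are assumed to be given already (the k-means step and the pre-calculated distance table are omitted), plain tuples stand in for GLCM features, and the "twice the standard deviation" boundary is interpreted here as twice the root-mean-square distance of the members from their centroid. The names `summarize` and `pruned_nearest` are hypothetical.

```python
import math
from statistics import mean


def euclid(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def summarize(clusters):
    """Return (centroid, boundary_radius) for each cluster.

    Assumption: the boundary is 2 * sigma, where sigma is the root-mean-square
    distance of the members from their centroid.
    """
    summaries = []
    for members in clusters:
        centroid = [mean(column) for column in zip(*members)]
        sigma = math.sqrt(mean([euclid(m, centroid) ** 2 for m in members]))
        summaries.append((centroid, 2 * sigma))
    return summaries


def pruned_nearest(query, clusters, n_clusters):
    """Search only the n_clusters whose estimated boundary is closest to the query."""
    summaries = summarize(clusters)
    # Gap from the query to each estimated cluster boundary (0 if inside it).
    gaps = [max(0.0, euclid(query, c) - r) for c, r in summaries]
    nearest = sorted(range(len(clusters)), key=lambda i: gaps[i])[:n_clusters]
    candidates = [m for i in nearest for m in clusters[i]]
    best = min(candidates, key=lambda m: euclid(query, m))
    return best, len(candidates)


clusters = [
    [(0, 0), (0, 1), (1, 0), (1, 1)],
    [(10, 10), (10, 11), (11, 10), (11, 11)],
    [(20, 20), (21, 21)],
]
best, explored = pruned_nearest((0.9, 0.9), clusters, n_clusters=1)
print(best, explored)  # (1, 1) 4
```

Searching fewer clusters explores fewer samples, but a true nearest neighbor that lies in a skipped cluster is missed, which matches the trade-off between execution time and hit rate that the abstract reports.
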
A New Approach for Collaborative Filtering Based on Mining Frequent Itemsets

As one of the most successful approaches to building recommender systems, collaborative filtering (CF) uses the known preferences of a group of users to make recommendations or predictions of the unknown preferences of other users. In this paper, we first propose a new model-based CF approach built on frequent itemset mining, under the assumption that “The larger the support of an item is, the more likely it is that this item will occur in some frequent itemset”. We then present enhancements, namely bit representations, bit matching, and bit mining, that speed up the algorithm's processing with the CF method.

Phung Do, Vu Thanh Nguyen, Tran Nam Dung
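
As a rough illustration of how a bit representation can speed up support counting, the following Python sketch stores, for each item, an integer whose bit t is set when transaction t contains the item; the support of an itemset is then the number of set bits in the bitwise AND of its items' bitmaps. This is a generic technique and not necessarily the authors' scheme; the function names and the toy data are hypothetical.

```python
def build_bitmaps(transactions):
    """Map each item to an int whose bit t is set if transaction t contains it."""
    bitmaps = {}
    for tid, items in enumerate(transactions):
        for item in items:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << tid)
    return bitmaps


def support(itemset, bitmaps):
    """Count the transactions that contain every item of a non-empty itemset."""
    if not itemset:
        raise ValueError("itemset must not be empty")
    mask = -1  # all bits set, so the first AND yields that item's bitmap
    for item in itemset:
        mask &= bitmaps.get(item, 0)
    return bin(mask).count("1")


# Toy data: each transaction is the set of items one user rated positively.
transactions = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b"}]
bitmaps = build_bitmaps(transactions)
print(support({"a"}, bitmaps))       # 3
print(support({"a", "b"}, bitmaps))  # 2
```
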
Reduction of Training Noises for Text Classifiers

Automatic text classification (TC) is essential for the archiving and retrieval of texts, which are a main way of recording information and expertise. Previous studies have thus developed many text classifiers. They often employed training texts to build the classifiers, and showed that the classifiers performed well in various application domains. However, as the training texts are often inevitably unsound or incomplete in practice, they often contain many terms not related to the categories of interest. Such terms are actually training noises in classifier training, and hence can deteriorate the performance of the classifiers. Reduction of the training noises is thus essential. It is also quite challenging, as the training texts are unsound or incomplete. In this paper, we develop a technique, TNR (Training Noise Reduction), to remove the possible training noises so that the performance of the classifiers can be further improved. Given a training text d of a category c, TNR identifies a sequence of consecutive terms (in d) as noise if the terms are not strongly related to c. A case study on the classification of Chinese texts of disease information shows that TNR can improve a Support Vector Machine (SVM) classifier, which is a state-of-the-art classifier in TC. The contribution is of significance to the further enhancement of existing text classifiers.

Rey-Long Liu

Prediction of Relevance between Requests and Web Services Using ANN and LR Models

An approach to Web service matching is proposed in this paper. It adopts semantic similarity measuring techniques to calculate the matching level between a pair of service descriptions. Their similarity is then specified by a numeric value. Determining a threshold for this value is a challenge in all similar matching approaches. To address this challenge, we propose the use of classification methods to predict the relevance of requests and Web services. In recent years, outcome prediction models using Logistic Regression and Artificial Neural Networks have been developed in many research areas. We compare the performance of these methods on the OWLS-TC v3 service library. Classification accuracy is used to measure the performance of the methods. The experimental results show the efficiency of both methods in predicting new cases. However, the Artificial Neural Network with a sensitivity analysis model outperforms the Logistic Regression method.

Keyvan Mohebbi, Suhaimi Ibrahim, Norbik Bashah Idris

A Ruled-Based Part of Speech (RPOS) Tagger for Malay Text Articles

The Malay language is an Austronesian language spoken in most countries of the South East Asia region, including Malaysia, Indonesia, Singapore, Brunei and Thailand. Traditional linguistics is well developed for Malay, but very limited resources and tools are available or accessible for computational linguistic analysis of the Malay language. Assigning a part of speech (POS) to running words in a sentence is one of the pipeline processes in Natural Language Processing (NLP) tasks, and it has not been well investigated for Malay. This paper outlines an approach to POS tagging for Malay text articles. We apply a simple Rule-based Part of Speech (RPOS) tagger to perform the tagging operation on Malay text articles. POS tagging can be described as the task of automatically annotating each word in a text document with its syntactic category. A rule-based POS tagger generally involves a POS tag dictionary and a set of rules in order to identify the words that are considered parts of speech. In this paper, we propose a framework that applies Malay affixing rules to identify the Malay POS tag and the relation between words in order to select the best POS tag for words that have two or more valid POS tags. The results show that the accuracy of the rule-based POS tagger is higher than that of a statistical POS tagger. This indicates that the proposed RPOS tagger is able to predict the POS of unknown words with promising accuracy.

Rayner Alfred, Adam Mujat, Joe Henry Obit

Viable System Model in Capturing Iterative Features within Architectural Design Processes

In project management, iteration can be seen as an undesirable characteristic that increases risk and lengthens the cycle time. However, in design management, iteration is the key feature of designing. Iteration can also manifest in different types, which give particular characteristics to different stages of the design process. However, there are no existing methods that can capture and model the iterative activities of designers and support the analysis and design of designers’ process management systems. The design structure matrix is one method that has the capability to capture iterative activities. However, this method does not seem suitable for supporting the development of a design process management system, as it does not highlight the functional features within iterative activities. Therefore, the aim of this paper is to discuss the potential use of the viable system model in capturing functional features and requirements within iterative activities of the architectural design process. This paper also highlights an example of a previous study which adapted the viable system model to the diagnosis of complex processes.

Roliana Ibrahim, Khairul Anwar Mohamed Khaidzir, Fahimeh Zaeri

Identifying Same Wavelength Groups from Twitter: A Sentiment Based Approach

Social scientists have identified several network relationships and dimensions that induce homophily. Sentiments or opinions towards different issues have been observed as a key dimension that characterizes human behavior. Twitter is an online social medium where rapid communication takes place publicly. People usually express their sentiments towards various issues. Different persons from different walks of social life may share the same opinion on various issues. When these persons constitute a group, such groups can be conveniently termed same wavelength groups. We propose a novel framework based on sentiments to identify such same wavelength groups from the Twitter domain. The analysis of such groups would help in unraveling their response patterns and behavioral features.

Rafeeque Pandara, Selvaraju Sendhilkumar

An Improved Evolutionary Algorithm for Extractive Text Summarization

The main challenge of extractive text summarization is selecting the top representative sentences from the input document. Several techniques have been proposed to enhance the selection process, such as feature-based, cluster-based, and graph-based methods. This paper enhances a previous work and identifies some limitations in its similarity calculation. It proposes an enhanced approach that mixes feature-based and cluster-based methods to produce a high-quality single-document summary. We used the Jaccard similarity measure to adjust the sentence clustering process instead of the Normalized Google Distance (NGD) similarity measure. In addition, this paper proposes a new real-to-integer values modulator instead of the genetic mutation operator adopted in the previous work. The Differential Evolution (DE) algorithm is used to train and test the proposed methods. The DUC2002 dataset was preprocessed and used as a test bed. The results show that our proposed differential mutant achieved satisfactory performance, while the genetic mutant proved to be better. In addition, our analysis of NGD similarity scores showed that NGD was an inappropriate selection in the previous study, as it performs successfully on a very large database such as Google. Our choice of the Jaccard measure proved fortunate: it obtained results superior to NGD with both the newly proposed modulator and the genetic operator. In addition, both algorithms outperformed the standard baseline Microsoft Word Summarizer and Copernic methods.

Albaraa Abuobieda, Naomie Salim, Yogan Jaya Kumar, Ahmed Hamza Osman
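
For readers unfamiliar with the Jaccard measure named in this abstract, here is a minimal Python sketch of it applied to two sentences. The tokenization (lowercasing and splitting on whitespace) is an assumption made for illustration; the abstract does not say how the authors preprocess sentences or how the score feeds into clustering.

```python
def jaccard(sentence_a, sentence_b):
    """Jaccard similarity of two sentences: |A ∩ B| / |A ∪ B| over word sets."""
    words_a = set(sentence_a.lower().split())
    words_b = set(sentence_b.lower().split())
    if not words_a and not words_b:
        return 0.0  # convention chosen here for two empty sentences
    return len(words_a & words_b) / len(words_a | words_b)


print(jaccard("the cat sat", "the cat ran"))  # 0.5
```

Unlike NGD, this score depends only on the two sentences being compared, not on term counts from an external corpus.
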
Hybrid-Learning Based Data Gathering in Wireless Sensor Networks

Prediction based data gathering or estimation is a very frequent phenomenon in wireless sensor networks (WSNs). Learning and model updates are at the heart of prediction based data gathering. A majority of the existing prediction based data gathering approaches use centralized learning and model updates, while some others use localized and distributed ones. Our conjecture in this work is that no single learning approach may be optimal for all the sensors within a WSN, especially in large scale WSNs. For example, for source nodes that are very close to the sink, centralized learning could be better than distributed learning, and vice versa for the more distant nodes. In this work, we explore the scope of a possible hybrid (centralized and distributed) learning scheme for prediction based data gathering in WSNs. Numerical experiments with the proposed scheme on two sensor datasets show the potential of the hybrid approach.

Mohammad Abdur Razzaque, Ismail Fauzi, Akhtaruzzaman Adnan

Intelligent Recommender Systems

Orienteering Problem Modeling for Electric Vehicle-Based Tour

This paper presents the design and analyzes the performance of a tour planner for electric vehicles, aiming at overcoming their long charging time by computational intelligence. This service basically finds the maximal subset of the user-selected tour spots, together with a visiting sequence that does not induce waiting time for battery charging. Because the schedule search belongs to the orienteering problem category, genetic algorithms are employed. This includes encoding a visiting sequence based on omission probability, defining a fitness function to count the number of visitable destinations, and tailoring genetic operators. For constraint processing, the waiting time estimator prevents schedules with non-permissible waiting time from being included in the population. The performance measurements obtained from a prototype implementation show that the proposed service can include 95% of the selected spots in the final schedule in the typical tour scenario for the given inter-destination and stay time distribution.

Junghoon Lee, Gyung-Leen Park

Integrating Social Information into Collaborative Filtering for Celebrities Recommendation

With the exponential growth of the user population and the volume of content on micro-blog web sites, people increasingly suffer from information overload. A recommendation system is an effective way to address this issue. In this paper, we study celebrity recommendation in micro-blog services to better guide users to follow celebrities according to their interests. First, we improved the Jaccard similarity measure by significance weighting to enhance neighbor selection in collaborative filtering. Second, we integrated users’ social information into the similarity model to ease the cold start problem. Third, we increased the density of the rating matrix by predicting the missing ratings to ease the data sparsity problem. Experimental results show that our algorithm improves the recommendation quality significantly.

Qingwen Liu, Yan Xiong, Wenchao Huang

A Semantically Enhanced Tag-Based Music Recommendation Using Emotion Ontology

In this paper, we propose a semantically enhanced tag-based approach to music recommendation. While most approaches to tag-based recommendation are based on tag frequency, our approach is based on the semantics of tags. In order to extract the semantics of tags, we developed an emotion ontology for music called UniEmotion, which categorizes tags into positive emotional tags, negative emotional tags, and factual tags. Weights are calculated and assigned to the tags according to their types. After that, user profiles were generated from the weighted tags and a user-based collaborative filtering algorithm was executed. To evaluate our approach, a data set of 1,100 users, the tags they added, and the artists they listened to was collected from last.fm. The conventional track-based recommendation, the unweighted tag-based recommendation, and the weighted tag-based recommendation are compared in terms of precision. Our experimental results show that the weighted tag-based recommendation outperforms the other two approaches in terms of precision.

Hyon Hee Kim

A Method for Determination of an Opening Learning Scenario in Intelligent Tutoring Systems

Intelligent tutoring systems should guarantee effective learning. Students who use those systems should achieve better learning results in a shorter time. Our previous research pointed out that personalizing the learning scenario makes it possible to satisfy these postulates. In this paper, a method for determining an opening learning scenario is presented. Before a student begins to learn, an opening scenario is determined based on information provided during the registration process. The user is offered the optimal learning path suited to his learning styles and current knowledge level. The method developed applies the ant colony optimization technique. The effectiveness of the proposed solution was tested in a specially implemented environment. The research demonstrates that the algorithm gives quite good results, because 66% of the learning material in the determined learning scenario was adapted to the student’s learning styles.

Adrianna Kozierkiewicz-Hetmańska, Dariusz Zyśk

Recommending QA Documents for Communities of Question-Answering Websites

Question & Answering (Q&A) websites have become an essential knowledge-sharing platform. This platform provides knowledge-community services where users with common interests or expertise can form a knowledge community to collect and share QA documents. However, due to the massive amount of QAs, information overload can become a major problem. Consequently, a recommendation mechanism is needed to recommend QAs for communities of Q&A websites. Existing studies did not investigate the recommendation mechanisms for knowledge collections in communities of Q&A Websites. In this work, we propose a novel recommendation method to recommend related QAs for communities of Q&A websites. The proposed method recommends QAs by considering the community members’ reputations, the push scores and collection time of QAs, the complementary relationships between QAs and their relevance to the communities. Experimental results show that the proposed method outperforms other conventional methods, providing a more effective manner to recommend QA documents to knowledge communities.

Duen-Ren Liu, Chun-Kai Huang, Yu-Hsuan Chen

Using Subtree Agreement for Complex Tree Integration Tasks

Hierarchical structures are common in modern applications. Tree integration is one of the tools for handling them, but it has not been fully researched. We define a complex tree to model other common hierarchical structures. Complex tree integration is parametrized by specific integration criteria. Sub-tree agreement is a group of criteria that describes the relation of sub-tree number and structure between the input trees and the integrated tree. This paper provides several definitions of sub-tree agreement, the most important properties of these criteria, and examples of algorithms based on sub-tree agreement.

Marcin Maleszka, Ngoc Thanh Nguyen

Data Sets for Offline Evaluation of Scholar’s Recommender System

In the offline evaluation of recommender systems, data sets have been extensively used to measure the performance of recommender systems through statistical analysis. However, many data sets are domain and application dependent and cannot be used in other domains. This paper presents the construction of data sets for the offline evaluation of a scholar’s recommender system that suggests papers to scholars based on their background knowledge. We design a cross-validation approach to reduce the risk of false interpretations by relying on multiple independent sources of information. Our approach addresses four important issues: the privacy and diversity of knowledge resources, the quality of knowledge, and the timeliness of knowledge. The resulting data sets represent the instance of a scholar’s background knowledge in clusters of learning themes, which can be used to measure the performance of the scholar’s recommender system.

Bahram Amini, Roliana Ibrahim, Mohd Shahizan Othman

A Method for Collaborative Recommendation in Document Retrieval Systems

The most common problem in the context of recommendation systems is the “cold start” problem, which occurs when a new product is recommended or a new user joins the system. Many systems do not personalize for a user until they have gathered sufficient information. In this paper, a novel method is proposed for recommending a profile for a new user based only on a few demographic data. The method merges a content-based approach with collaborative recommendation. The main objective was to show that, based on knowledge about other similar users, the system can classify a new user using a subset of demographic data and recommend him a non-empty profile. Using the proposed profile, the user will obtain personalized documents. A methodology of experimental evaluation was presented and simulations were performed. The preliminary experiments have shown that the most important demographic attributes are gender, age, favorite browser and level of education.

Bernadetta Mianowska, Ngoc Thanh Nguyen

Multiple Model Approach to Machine Learning

Combining Multiple Clusterings of Chemical Structures Using Cumulative Voting-Based Aggregation Algorithm

The use of consensus clustering methods in chemoinformatics is motivated by the success of consensus scoring (data fusion) in virtual screening and by the ability of consensus clustering to improve the robustness, novelty, consistency and stability of individual clusterings in other areas. In this paper, the Cumulative Voting-based Aggregation Algorithm (CVAA) was examined for combining multiple clusterings of chemical structures. The effectiveness of the clusterings was evaluated based on the extent to which they clustered together compounds that belong to the same activity class. Then, the results were compared to other consensus clustering methods and Ward’s method. The MDL Drug Data Report (MDDR) database was used for the experiments, and the results were obtained by combining multiple clusterings that were produced using different distance measures. The experiments show that the voting-based consensus method can efficiently improve the effectiveness of chemical structure clusterings.

Faisal Saeed, Naomie Salim, Ammar Abdo, Hamza Hentabli

Investigation of Incremental Support Vector Regression Applied to Real Estate Appraisal

Incremental support vector regression algorithms (SVR) and sequential minimal optimization algorithms (SMO) for regression were implemented. Intensive experiments to compare the predictive accuracy of the algorithms with different kernel functions over several datasets taken from a cadastral system were conducted in an offline scenario. The statistical analysis of the experimental output employed the nonparametric methodology designed especially for multiple N×N comparisons of N algorithms over N datasets, including Friedman tests followed by Nemenyi’s, Holm’s, Shaffer’s, and Bergmann-Hommel’s post-hoc procedures. The results of the experiments showed that most of the SVR algorithms significantly outperformed a pairwise comparison method used by the experts to estimate the values of residential premises over all datasets. Moreover, no statistically significant differences between incremental SVR and non-incremental SMO algorithms were observed using our stationary cadastral datasets. The results open the opportunity for further research into the application of incremental SVR algorithms to predict from a data stream of real estate sales/purchase transactions.

Tadeusz Lasota, Petru Patrascu, Bogdan Trawiński, Zbigniew Telec
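
The Friedman test mentioned in this abstract ranks the algorithms on each dataset and checks whether their average ranks differ more than chance would allow. The Python sketch below computes the average ranks and the standard Friedman chi-square statistic, without tie correction, for n datasets and k algorithms: 12n / (k(k+1)) × (Σ R_j² − k(k+1)² / 4), where R_j is the average rank of algorithm j. The function names and the error values are illustrative only and are not taken from the paper; the post-hoc procedures are not covered.

```python
def average_ranks(scores):
    """scores[i][j] = error of algorithm j on dataset i (lower is better).

    Returns the average rank of each algorithm; tied values share the mean rank.
    """
    n, k = len(scores), len(scores[0])
    totals = [0.0] * k
    for row in scores:
        order = sorted(range(k), key=lambda j: row[j])
        start = 0
        while start < k:
            end = start
            while end + 1 < k and row[order[end + 1]] == row[order[start]]:
                end += 1
            shared = (start + end) / 2 + 1  # mean of 1-based ranks start+1..end+1
            for pos in range(start, end + 1):
                totals[order[pos]] += shared
            start = end + 1
    return [total / n for total in totals]


def friedman_statistic(scores):
    """Friedman chi-square statistic (no tie correction)."""
    n, k = len(scores), len(scores[0])
    ranks = average_ranks(scores)
    return 12 * n / (k * (k + 1)) * (sum(r * r for r in ranks) - k * (k + 1) ** 2 / 4)


# Illustrative errors of 3 algorithms on 4 datasets (made-up numbers).
errors = [[0.10, 0.20, 0.30], [0.11, 0.25, 0.40], [0.09, 0.15, 0.22], [0.20, 0.30, 0.35]]
print(average_ranks(errors))       # [1.0, 2.0, 3.0]
print(friedman_statistic(errors))  # 8.0
```

With perfectly consistent rankings, as in this toy input, the statistic reaches its maximum of n(k − 1).
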
A Descriptive Method for Generating siRNA Design Rules

Short-interfering RNAs (siRNAs) suppress gene expression through a process called RNA interference (RNAi). Current research focuses on finding design principles or rules for siRNAs and using them to artificially generate siRNAs with high gene knockdown efficiency. Design rules have been reported by analyzing biology experiments and applying learning methods. However, possible good design rules or hidden characteristics remain undetected. As a contribution to computational methods for finding design rules, which mostly employ discriminative learning techniques, in this paper we propose a novel descriptive method to discover two design rules for effective siRNA sequences of 19 nucleotides (nt) and 21 nt in length that have important characteristics of previous design rules and contain new characteristics of highly effective siRNA. The key idea of the method is first to transform siRNAs into transactions and then to apply an Apriori adaptation with automatic min_support values to detect descriptive rules for effective and ineffective siRNAs. Rational design rules are created by analyzing graphical representations of the descriptive rules. Experimental evaluation on two siRNA data sets comprising 5737 siRNA sequences shows that our design rules are promising for designing effective siRNAs.

Bui Thang Ngoc, Tu Bao Ho, Kawasaki Saori

A Space-Time Trade Off for FUFP-trees Maintenance

In the past, Hong et al. proposed an algorithm to maintain the fast updated frequent pattern tree (FUFP-tree), which was an efficient data structure for association-rule mining. However, in the maintenance process, the counts of infrequent items and the IDs of transactions with those items were determined by rescanning all the transactions in the original database. This step might be quite time-consuming, depending on the number of transactions in the original database and the number of rescanned items. This study improves that approach by storing infrequent 1-items during the maintenance process and exploiting the properties of FUFP-trees, so that the rescanned and inserted items are processed more efficiently, reducing execution time. Experimental results show that the improved algorithm needs some more memory to store infrequent 1-items but performs better than the original one.

Bac Le, Chanh-Truc Tran, Tzung-Pei Hong, Bay Vo

Adaptive Splitting and Selection Method for Noninvasive Recognition of Liver Fibrosis Stage

Therapy for patients suffering from liver diseases strongly depends on the progression of liver fibrosis. Unfortunately, to assess it, liver biopsy has usually been used, which is an invasive medical procedure that could lead to serious health complications. Additionally, even when experienced medical experts perform the liver biopsy and read the findings, up to a 20% error rate in liver fibrosis staging has been reported. Nowadays a few noninvasive commercial tests based on blood examinations are available for this problem. Unfortunately, they are quite expensive and are usually not reimbursed by health insurance in Poland. Thus, a cross-disciplinary team, which includes researchers from Polish medical and technical universities, has started work on a new noninvasive method of liver fibrosis stage classification. This paper presents a starting point of the project, in which several traditional classification methods are compared with originally developed classifier ensembles based on local specialization of the classifiers in given feature space partitions. The experiment was carried out on an originally acquired database of patients with different stages of liver fibrosis. The preliminary results are very promising, because they confirmed the possibility of outperforming the noninvasive commercial tests.

Bartosz Krawczyk, Michał Woźniak, Tomasz Orczyk, Piotr Porwik

Investigation of Mixture of Experts Applied to Residential Premises Valuation

Several experiments were conducted in order to investigate the usefulness of the mixture of experts approach for an online internet system assisting in real estate appraisal. All experiments were performed using real-world datasets taken from a cadastral system. The analysis of the results was performed using statistical methodology including nonparametric tests followed by post-hoc procedures designed especially for multiple 1×N and N×N comparisons. The mixture of experts architectures studied in the paper comprised four algorithms used as expert networks (glm, a general linear model; mlp, a multilayer perceptron; and two support vector regression variants, ε-SVR and ν-SVR) as well as three algorithms employed as gating networks (glm, mlp, and gmm, a Gaussian mixture model).

Tadeusz Lasota, Bartosz Londzin, Bogdan Trawiński, Zbigniew Telec

Competence Region Modelling in Relational Classification

Relational classification is a promising branch of machine learning techniques for classification in networked environments, which do not fulfil the iid (independent and identically distributed) assumption. During the past few years, researchers have proposed many relational classification methods. However, almost none of them was able to work efficiently with large amounts of data or sparsely labelled networks. This paper introduces a new approach to relational classification based on competence region modelling. The approach aims at solving large relational data classification problems, and also seems to be a reasonable solution for classifying sparsely labelled networks, by decomposing the initial problem into subproblems (competence regions) and solving them independently. According to preliminary results obtained from experiments performed on real world datasets, the competence region modelling approach to relational classification yields more accurate classification than the standard approach.

Tomasz Kajdanowicz, Tomasz Filipowski, Przemysław Kazienko, Piotr Bródka

Engineering Knowledge and Semantic Systems

Approach to Practical Ontology Design for Supporting COTS Component Selection Processes

The COTS (Commercial Off-The-Shelf components) selection process is difficult due to the huge number of existing COTS components. Moreover, the price of a mistake is great due to the complex nature of information systems. In this paper an analysis of different COTS component selection methodologies is presented. Based on this, the ontology for supporting COTS component selection processes is proposed. In order to achieve a high level of practicality on different levels of decision making, the ontology is implemented in Protégé software.

Agnieszka Konys, Jarosław Wątróbski, Przemysław Różewski

Planning of Relocation Staff Operations in Electric Vehicle Sharing Systems

This paper designs a computerized operation planner for relocation staffs in electric vehicle sharing systems, in which uneven vehicle distribution can lead to severe service quality degradation. After relocation pairs are created based on the target vehicle distribution and vehicle-to-station matching, our scheme finds an operation sequence for a relocation team. To overcome the time complexity of the ordering problem, a genetic algorithm is developed. It encodes a relocation schedule based on numbering of relocation pairs, defines a fitness function accounting for the inter-relocation move, and finally tailors genetic operators. The performance measurement result obtained from a prototype implementation shows that the proposed scheme finds an efficient schedule having a converged fitness value with just small-size population. The difference in relocation distance does not go beyond 24.8 % even in the case of extremely unbalanced distribution for the given parameters.

Junghoon Lee, Gyung-Leen Park
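
The encoding and fitness idea described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: stations are hypothetical 2D points, a relocation pair is a `(pickup, dropoff)` tuple, the chromosome is a permutation of pair indices, and the operators (order crossover, swap mutation, truncation selection) are common textbook choices swapped in for the tailored operators the paper mentions. All function names are hypothetical.

```python
import math
import random

def inter_move_cost(order, pairs):
    """Distance staff travel between relocations: from the drop-off point
    of one pair to the pick-up point of the next pair in the sequence."""
    return sum(
        math.dist(pairs[a][1], pairs[b][0])
        for a, b in zip(order, order[1:])
    )

def order_crossover(p1, p2, rng):
    """Keep a slice of p1 in place; fill the rest in the order used by p2."""
    i, j = sorted(rng.sample(range(len(p1) + 1), 2))
    kept = set(p1[i:j])
    rest = [g for g in p2 if g not in kept]
    return rest[:i] + p1[i:j] + rest[i:]

def swap_mutation(order, rng):
    """Swap two positions in place, so the result is still a permutation."""
    i, j = rng.sample(range(len(order)), 2)
    order[i], order[j] = order[j], order[i]

def plan_relocations(pairs, pop_size=20, generations=100,
                     mutation_rate=0.2, seed=0):
    """Assumed GA loop: lower inter-relocation distance means fitter."""
    rng = random.Random(seed)
    n = len(pairs)

    def cost(order):
        return inter_move_cost(order, pairs)

    population = [rng.sample(range(n), n) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=cost)
        parents = population[: pop_size // 2]  # truncation selection keeps the elite
        children = []
        while len(parents) + len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            child = order_crossover(p1, p2, rng)
            if rng.random() < mutation_rate:
                swap_mutation(child, rng)
            children.append(child)
        population = parents + children
    best = min(population, key=cost)
    return best, cost(best)

# Hypothetical relocation pairs: ((pickup_x, pickup_y), (dropoff_x, dropoff_y))
pairs = [((0, 0), (1, 0)), ((5, 5), (6, 5)), ((1, 1), (2, 2)), ((6, 6), (0, 1))]
order, distance = plan_relocations(pairs)
```

Encoding the schedule as a permutation means every chromosome is a valid visiting order, so the operators only need to preserve the permutation property rather than repair infeasible schedules.
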
Thematic Analysis by Discovering Diffusion Patterns in Social Media: An Exploratory Study with TweetScope

The goal of this work is to capture diffusion patterns in social media and to understand meaningful associations between the diffusion patterns and the thematic features of the corresponding information. To do so, we have developed a Twitter-based diffusion monitoring system (called TweetScope) to efficiently collect datasets from Twitter and conduct the proposed discovery process. In particular, we expect this work to be applicable to establishing the business strategies of various organizations.

Duc Nguyen Trung, Jason J. Jung, Namhee Lee, Jinhwa Kim
A Practical Method for Compatibility Evaluation of Portable Document Formats

This paper presents a method for verification of PDF documents for compatibility with publication models provided by scientific publishers. We first consider the problem of converting a document from PDF to XML format. Subsequently, we present an analysis of the document’s graphical layout which operates in two phases. The first phase develops a model using a semi-automatic process with limited user interaction. This is followed by comparing and matching of submitted documents. The experimental results demonstrate the degree of document compatibility with the model along with a report of errors and warning messages.

Dariusz Król, Michał Łopatka
Sentiment Analysis for Tracking Breaking Events: A Case Study on Twitter

Social media such as Twitter and Facebook can be considered a new kind of media, distinct from traditional media. Information on social media spreads much faster than through traditional news media because people can upload information without constraints of time or location. People also express their emotional state to let others know how they feel about that information. For this reason, many studies have examined social media data to uncover information hidden in textual sentences. Analyzing social media is not simple because of the huge volume and variety of data, and many studies have restricted themselves to a limited domain to overcome the size issue. This study focuses on how the flow of sentiments and the frequency of tweets changed from November to December 2009. We analyzed 110 million tweets collected by Stanford University, using LIWC (Linguistic Inquiry and Word Count) for sentiment analysis. We found that people were not happy in the afternoon but were happy at night, as many psychologists have suggested. After analyzing this large volume of tweets, we were also able to find the precise days on which breaking events occurred. This study offers diverse evidence that Twitter holds valuable information for tracking breaking news around the world.

Dongjin Choi, Pankoo Kim

Computational Biology and Bioinformatics

Classification of Plantar Dermatoglyphic Patterns for the Diagnosis of Down’s Syndrome

Classification of patterns in the hallucal area of the sole is one of the tasks of dermatoglyphic analysis. The paper describes pattern recognition and image processing methods applied to the classification of hallucal area patterns. Contrast enhancement, segmentation and contextual filtration techniques are used to enhance the quality of the images. An algorithm based on multi-scale pyramid decomposition of an image is proposed for ridge orientation calculation. Hallucal area pattern classifiers, which are part of an automatic system for rapid screening diagnosis of trisomy 21 (Down’s Syndrome) in infants, are created and discussed. The system is a tool that supports medical decisions by automatically processing dermatoglyphic prints and detecting features that indicate the presence of a genetic disorder. Images of dermatoglyphic prints are pre-processed before the classification stage to extract features, which are analyzed by the Support Vector Machine algorithm. An RBF kernel is used in the training of multi-class SVM systems generated with the one-vs-one scheme. Experiments conducted on the database of the Collegium Medicum of the Jagiellonian University in Cracow show the effectiveness of the proposed approach in classifying infants’ dermatoglyphs.

Hubert Wojtowicz, Wieslaw Wajs
Adaptive Cumulative Voting-Based Aggregation Algorithm for Combining Multiple Clusterings of Chemical Structures

Many consensus clustering methods have been studied and applied in areas such as pattern recognition, machine learning, information theory and bioinformatics. However, few methods have been used for clustering chemical compounds. In this paper, the Adaptive Cumulative Voting-based Aggregation Algorithm (A-CVAA) is examined for combining multiple clusterings of chemical structures. The effectiveness of the clusterings was evaluated based on their ability to separate active from inactive molecules in each cluster, and the results were compared to Ward’s method. The MDL Drug Data Report (MDDR) database was used as the chemical dataset. Experiments suggest that the adaptive cumulative voting-based consensus method can efficiently improve the effectiveness of combining multiple clusterings of chemical structures.

Faisal Saeed, Naomie Salim, Ammar Abdo, Hamza Hentabli
LINGO-DOSM: LINGO for Descriptors of Outline Shape of Molecules

Linear notations are more compact than connection tables, so they can be useful for storing and transmitting large numbers of chemical structures. Implicitly, they contain the information needed to compute all kinds of molecular structures and, thus, the molecular properties derived from these structures. DOSM is a new method of obtaining a rough description of a 2D molecular structure from its 2D connection graph in the form of a character string. Our method is based on fragmenting DOSM strings into overlapping substrings of a defined size, which we call LINGO-DOSM. The integral set of LINGO-DOSM substrings derived from a given DOSM string allows rigorous structure specification using very small and simple rules. In this paper, we study the possibility of using this textual descriptor to describe the 2D structure of a molecule. Simulated virtual screening experiments with the MDDR database clearly show the superiority of the LINGO-DOSM descriptor over many of the standard descriptors tested in this paper.

Hamza Hentabli, Naomie Salim, Ammar Abdo, Faisal Saeed
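
The fragmentation step described in this abstract, cutting a string into overlapping substrings of a fixed size, can be sketched as below. This is only an illustration: the substring length `q=4`, the placeholder input strings, and the multiset Tanimoto coefficient used to compare two profiles are assumptions, and the actual DOSM string format is not reproduced here. The function names are hypothetical.

```python
from collections import Counter

def lingos(text, q=4):
    """Count every overlapping substring of length q in the string."""
    return Counter(text[i:i + q] for i in range(len(text) - q + 1))

def tanimoto(a, b):
    """Multiset Tanimoto coefficient between two substring profiles
    (an assumed similarity measure for the screening step)."""
    shared = sum((a & b).values())   # element-wise minimum of counts
    total = sum((a | b).values())    # element-wise maximum of counts
    return shared / total if total else 0.0

# Placeholder strings standing in for two descriptor strings
profile_a = lingos("ABCDEF")
profile_b = lingos("ABCDXX")
similarity = tanimoto(profile_a, profile_b)
```

Because the substrings overlap, a single local change in the string alters at most `q` neighbouring fragments, which is what makes the profile usable for similarity ranking.
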
Prediction of Mouse Senescence from HE-Stain Liver Images Using an Ensemble SVM Classifier

The study of cellular senescence from images at the molecular level plays an important role in understanding the molecular basis of ageing. It is desirable to know the morphological variation between young and senescent cells. This study proposes an ensemble support vector machine (SVM) based classifier with a novel set of image features to predict mouse senescence from HE-stain liver images categorized into four classes. For across-subject prediction, in which all images of the same mouse are divided into training and test images, the test accuracy is as high as 97.01% when an optimal set of informative image features is selected using an intelligent genetic algorithm. For leave-one-subject-out prediction, in which the test mouse is not involved in the training images of 20 mice, we identified eight informative feature sets and established eight SVM classifiers, each with a single feature set. The best accuracy of a single SVM classifier is 71.73%, and the ensemble classifier consisting of these eight SVM classifiers improves the accuracy to 80.95%. The best two feature sets are the gray level correlation matrix for describing texture and the Haralick texture set, which are good morphological features for studying cellular senescence.

Hui-Ling Huang, Ming-Hsin Hsu, Hua-Chin Lee, Phasit Charoenkwan, Shinn-Jang Ho, Shinn-Ying Ho

Computational Intelligence

An Introduction to Yoyo Blind Man Algorithm (YOYO-BMA)

In this paper, a new algorithm inspired by human intelligence is proposed, named the YOYO Blind Man Algorithm (YOYO-BMA). The main idea of YOYO-BMA is to combine human intelligence with the features of a yoyo. The proposed algorithm supposes that there are some men in a dark room, called blind men, who look for the optimum. Each man has at least one yoyo to use as an assistant, and the men search the problem space using their yoyos. The new algorithm is compared with five other algorithms, and the results show that YOYO-BMA performs better than the others.

Mohammad Amin Soltani-Sarvestani, Shahriar Lotfi
A New Method for Job Scheduling in Two-Levels Hierarchical Systems

The use of parallel and distributed systems has become very common in the last decade. Dividing data is one of the challenges in these types of systems. Divisible load theory (DLT) is one of the methods proposed for scheduling data distribution in parallel or distributed systems. Much research has been done in this field, but scheduling a multi-installment heterogeneous system with a two-level hierarchical topology in which the communication mode is blocking has not been addressed. In this paper, we find the proper task size for each subtree. Finally, in the experiments section, we show that the proposed methods work correctly and give the best scheduling.

Amin Shokripour, Mohamed Othman, Hamidah Ibrahim, Shamala Subramaniam
Intelligent Water Drops Algorithm for Rough Set Feature Selection

In this article, the Intelligent Water Drops (IWD) algorithm is adapted for feature selection with Rough Set (RS). Specifically, IWD is used to search for a subset of features based on RS dependency as an evaluation function. The resulting system, called IWDRSFS (Intelligent Water Drops for Rough Set Feature Selection), is evaluated with six benchmark data sets. The performance of IWDRSFS is analysed and compared with that of other methods in the literature. The outcomes indicate that IWDRSFS is able to provide competitive and comparable results. In summary, this study shows that IWD is a useful method for undertaking feature selection problems with RS.

Basem O. Alijla, Lim Chee Peng, Ahamad Tajudin Khader, Mohammed Azmi Al-Betar
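
The evaluation function mentioned in this abstract, rough set dependency, can be illustrated with a short sketch. It assumes the standard definition (the fraction of objects whose equivalence class under the chosen features maps to a single decision value); the table, column names, and function name are hypothetical, and the IWD search itself is not shown.

```python
from collections import defaultdict

def dependency(rows, features, decision):
    """Rough set dependency degree of `decision` on `features`:
    size of the positive region divided by the number of objects."""
    classes = defaultdict(list)
    for row in rows:
        # Objects with identical feature values are indiscernible.
        classes[tuple(row[f] for f in features)].append(row[decision])
    # An equivalence class is in the positive region if it is consistent,
    # that is, all of its objects share one decision value.
    positive = sum(len(d) for d in classes.values() if len(set(d)) == 1)
    return positive / len(rows)

# Hypothetical decision table with condition features "a", "b" and decision "d"
table = [
    {"a": 0, "b": 0, "d": "no"},
    {"a": 0, "b": 1, "d": "yes"},
    {"a": 1, "b": 0, "d": "yes"},
    {"a": 1, "b": 0, "d": "no"},
]
score = dependency(table, ["a", "b"], "d")
```

A search procedure such as IWD would call a function like this on each candidate subset and prefer small subsets whose dependency matches that of the full feature set.
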
Information-Based Scale Saliency Methods with Wavelet Sub-band Energy Density Descriptors

Pixel-based scale saliency (PSS) is based on information estimation of data content and structure in multiscale analysis; its theoretical aspects as well as its practical implementation are discussed by Kadir et al. [11]. The Scale Saliency framework [10] works not only for pixels but also for other basis-projected descriptors. While wavelet atoms, which are localized in both the time and frequency domains, are possible alternative descriptors, no theoretical analysis or practical solutions have been proposed yet. Our contribution is a mathematical model that uses wavelet-based descriptors in a corresponding Wavelet-based Scale Saliency (WSS). It treats the wavelet sub-band energy density of two popular transforms, the discrete wavelet transform (DWT) and the dual-tree complex wavelet transform (DTCWT), as basis descriptors instead of pixel-value descriptors for saliency map estimation. Then, quantitative analyses using ROC, AUC, and NSS compare WSS against PSS as well as other state-of-the-art saliency methods, ITT [9], SUN [18], and SRS [8], on N. Bruce’s database [4] with human eye-tracking data as ground truth. Furthermore, qualitative results, in the form of different saliency maps, are analyzed case by case for their pros and cons, especially their shortcomings in specific situations or results that are not sensible to human perception.

Anh Cat Le Ngo, Li-Minn Ang, Guoping Qiu, Kah Phooi Seng
Feature Subset Selection Using Binary Gravitational Search Algorithm for Intrusion Detection System

Intrusion detection has become an essential task in cyber security for controlling the different infrastructures of networked computers. Today, an effective intrusion detection system uses computational methods such as machine learning techniques to improve the detection rate while keeping the false positive rate as low as possible; however, a large number of irrelevant features, which poses an optimization problem, decreases this rate. This study uses the Binary Gravitational Search Algorithm (BGSA) as a feature selection method to reduce the irrelevant features in the KDD 99 intrusion detection data set in order to improve multi-layer perceptron performance. Results show that using significant and relevant features increases the performance of the intrusion detection system to nearly 100% at the lowest computational cost.

Amir Rajabi Behjat, Aida Mustapha, Hossein Nezamabadi–pour, Md. Nasir Sulaiman, Norwati Mustapha

Modelling and Optimization Techniques in Information Systems, Database Systems and Industrial Systems

Sparse Signal Recovery by Difference of Convex Functions Algorithms

This paper deals with the problem of signal recovery, which is formulated as an $\ell_0$-minimization problem. Using two appropriate continuous approximations of the $\ell_0$-norm, we reformulate the problem as a DC (Difference of Convex functions) program. DCA (DC Algorithm) is then developed to solve the resulting problems. Computational experiments on several datasets show the efficiency of our methods.

Hoai An Le Thi, Bich Thuy Nguyen Thi, Hoai Minh Le
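
To make the reformulation in this abstract concrete, the derivation below shows how one continuous approximation of the zero norm leads to a DC program and a DCA iteration. The abstract does not say which two approximations the paper uses, so the capped-$\ell_1$ function, the convex feasible set $K$, and the parameter $\alpha$ are assumptions chosen purely for illustration.

```latex
% Assumption: K is a convex feasible set (for example K = \{x : Ax = b\}).
\min_{x \in K} \|x\|_0, \qquad \|x\|_0 = \sum_{i=1}^{n} s(x_i), \quad
s(t) = \begin{cases} 1 & t \neq 0 \\ 0 & t = 0 \end{cases}

% Assumed continuous approximation of the step function s (capped \ell_1,
% parameter \alpha > 0), written as a difference of two convex functions:
r_\alpha(t) = \min\{1, \alpha |t|\} = \alpha |t| - \max\{\alpha |t| - 1, 0\}

% Resulting DC program f = g - h, with \chi_K the indicator function of K:
g(x) = \alpha \|x\|_1 + \chi_K(x), \qquad
h(x) = \sum_{i=1}^{n} \max\{\alpha |x_i| - 1, 0\}

% Generic DCA iteration: linearize h at x^k, then solve a convex problem.
y^k \in \partial h(x^k), \qquad
x^{k+1} \in \arg\min_{x} \{ g(x) - \langle x, y^k \rangle \}

% For this h, a subgradient is available in closed form:
y_i^k = \begin{cases} \alpha \, \mathrm{sign}(x_i^k) & \alpha |x_i^k| > 1 \\ 0 & \text{otherwise} \end{cases}
```

With this choice each DCA step is a weighted $\ell_1$-type convex problem over $K$, which is why the scheme stays tractable even though the original problem is combinatorial.
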
DC Programming and DCA Based Cross-Layer Optimization in Multi-hop TDMA Networks

Efficient design of wireless networks is a challenging task. Recently, the concept of cross-layer design in wireless networks has been investigated extensively. In this work, we present a cross-layer optimization framework, i.e., joint rate control, routing, link scheduling and power control for multi-hop time division multiple access (TDMA) networks. In particular, we study a centralized controller that coordinates the routing process and transmissions of links such that the network lifetime is maximized. We show that the aforementioned design can be formulated as a mixed integer-linear program (MILP) which has worst-case exponential complexity to compute the optimal solution. Therefore, our main contribution is to propose a computationally efficient approach to solve the cross-layer design problem. Our design methodology is based on a so-called Difference of Convex functions algorithm (DCA) to provide either optimal or near-optimal solutions with finite convergence. The numerical results are encouraging and demonstrate the effectiveness of the proposed approach. One of the advantages of the proposed design is the capability to handle very large-scale problems which are the usual scenarios encountered in practice.

Hoai An Le Thi, Quang Thuan Nguyen, Khoa Tran Phan, Tao Pham Dinh
The Multi-flow Necessary Condition for Membership in the Pedigree Polytope Is Not Sufficient- A Counterexample

The multistage insertion formulation (MI) for the symmetric traveling salesman problem (STSP) gives rise to a combinatorial object called a pedigree. Pedigrees are in one-to-one correspondence with Hamiltonian cycles. The convex hull of all the pedigrees of a problem instance is called the pedigree polytope. The MI polytope is as tight as the subtour elimination polytope when projected into its two-subscripted variable space. It is known that the complexity of solving a linear optimization problem over a polytope is polynomial if the membership problem of the polytope can be solved in polynomial time. Hence, the study of the membership problem of the pedigree polytope is important. A polynomially checkable necessary condition is given by Arthanari in [5]. This paper provides a counterexample showing that the necessary condition is not sufficient.

Laleh Haerian Ardekani, Tiru S. Arthanari
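
The correspondence between insertion decisions and Hamiltonian cycles mentioned in this abstract can be illustrated with a small sketch. It assumes the usual reading of multistage insertion: start from the cycle on cities 1, 2, 3 and, for each city k = 4, ..., n, insert k into one edge of the current cycle. The function name and the input format (a list of chosen edges, one per inserted city) are hypothetical, and nothing here touches the polytope or the membership condition itself.

```python
def tour_from_insertions(edges):
    """Build a Hamiltonian cycle on cities 1..n from a list of edges,
    where the m-th edge is the one that city m + 4 is inserted into."""
    tour = [1, 2, 3]
    for k, (i, j) in enumerate(edges, start=4):
        n = len(tour)
        for pos in range(n):
            a, b = tour[pos], tour[(pos + 1) % n]
            if {a, b} == {i, j}:
                # Replace edge (a, b) by the two edges (a, k) and (k, b).
                tour.insert(pos + 1, k)
                break
        else:
            # The chosen edge must still exist when city k is inserted.
            raise ValueError(f"edge {(i, j)} is not in the current tour")
    return tour

# City 4 goes into edge (1, 2), then city 5 into edge (2, 3)
tour = tour_from_insertions([(1, 2), (2, 3)])
```

The `ValueError` branch reflects the feasibility requirement that makes the object combinatorially interesting: an edge can only be chosen at stage k if earlier insertions have not already destroyed it.
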
A Linear Integer Program to Reduce Air Traffic Delay in Enroute Airspace

Due to the rapid growth of the air transportation sector, air traffic management is becoming more and more complex. Systems therefore have difficulty managing air traffic.

In this paper, we address the Air Traffic Flow Management (ATFM) problem as a new Linear Integer Program (LIP). It takes into account all flight phases, i.e., take-off, cruising and landing. The model also allows rerouting decisions. All constraints and the objective function of our model are linear, unlike the model of Bertsimas et al. [2], which contains non-linear constraints.

Finally, numerical simulations show the effectiveness of our new formulation.

Ihsen Farah, Adnan Yassine, Thierry Galinho

Intelligent Supply Chains

Modeling the Structure of Recommending Interfaces with Adjustable Influence on Users

Recommending interfaces are usually integrated with marketing processes and are aimed at increasing sales by using persuasion and influence methods to motivate users to follow recommendations. This paper presents an approach based on decomposing a recommending interface into elements with adjustable influence levels. A fuzzy inference model is proposed to represent the system characteristics, with the ability to adjust the parameters of the interface to achieve results and increase customer satisfaction.

Jaroslaw Jankowski
Increasing Website Conversions Using Content Repetitions with Different Levels of Persuasion

A user’s behavior on a website is usually based on a sequence of loaded pages and repetition of content, which is a natural part of communication with a website. Changing the level of persuasion between repetitions can increase the number of interactions expected by a website operator, but it can affect the user experience as well. In this paper, we propose modeling the parameters of the objects used in repetitions with a fuzzy inference system, aiming at increased conversions with limited negative impact on the user.

Jaroslaw Jankowski
Virtual Collaboration in the Supply Chains – T-Scale Platform Case Study

The current model of supply chain organization results in inefficient use of transport resources, high transport costs, increasing congestion and CO₂ emissions. This effect has been demonstrated by research conducted by the author as well as by the European Environment Agency. This situation can be changed by developing an alternative business model for collaboration in the organisation of transport processes within supply chains. The aim of this paper is to present a practical implementation of the T-Scale platform, developed by the author, which enables collaboration between independent transport users and transport service providers. Moreover, an overview of existing communication platforms and their major functionalities is presented. The work concludes with the major benefits of collaboration achieved by a group of companies operating in the FMCG sector in Poland.

Marcin Hajdul
Cooperation between Logistics Service Providers Based on Cloud Computing

The paper describes the use of cloud computing in logistics, especially the creation of a multi-modal platform designed for cooperating logistics service providers and their customers. The research conducted within the EU project is presented. The article focuses primarily on the findings of its initial phase: the analysis of the information requirements for a cloud computing platform. Process maps and a use case are proposed.

Arkadiusz Kawa, Milena Ratajczak-Mrozek

Applied Data Mining for Semantic Web

Discovering Missing Links in Large-Scale Linked Data

The explosion of linked data is creating sparse connection networks, primarily because more and more missing links among different data sources are resulting from asynchronous and independent database development. DHR was proposed in other research to discover these links. However, DHR has limitations in a distributed environment. For example, when it is deployed on a distributed SPARQL server, the data transfer usually causes overhead on the network. Therefore, we propose a new method of detecting missing links based on DHR. The method consists of two stages: finding the frequent graph and matching the similarity. In this paper, we enhance some features in the two stages to reduce the data flow before querying. We conduct an experiment using geographic data sources with a large number of triples to discover the missing links and compare the accuracy of our proposed matching method with DHR and the primitive mix similarity method. The experimental results show that our method can substantially reduce the data flow on a network and increase the accuracy of discovering missing links.

Nam Hau, Ryutaro Ichise, Bac Le
Effective Hotspot Removal System Using Neural Network Predictor

Monitoring and prediction of resource usage are two major methods for managing distributed computing environments such as cluster computing, grid computing, and, most recently, cloud computing. In this paper, we propose a novel hotspot removal system using a neural network predictor. The proposed system detects and removes hotspots with resource-specific removal algorithms. The system also improves the neural network predictor by introducing prediction confidence. The effectiveness of our proposed system is verified with empirical examples, and the evaluation results show that our system outperforms a popular hotspot removal system in hotspot prediction and hotspot removal.

Sangyoon Oh, Mun-Young Kang, Sanggil Kang
A Case Study on Trust-Based Automated Blog Recommendation Making

We are presented with a situation in which a visitor wants to travel to Malaysia. Several questions arise at this point: Should the visitor believe the information provided on the official Malaysian tourism website? Or should the visitor refer to other “unofficial” sources, such as blogs that contain the blogger’s own experiences? In the travel domain, almost all information shared in blogs naturally derives from the blogger’s experiences. A positive correlation might exist between a blogger and the information. This correlation would point to the fact that users tend to be attracted to finding information through blogs. To investigate this issue, a survey of people’s opinions on the relationship between a person and his/her blog information was conducted in the travel blog domain. The results show that users usually prefer to refer to information given by people they trust or, more accurately, friends, rather than other sources. In addition, previous work on trust and blogs also agrees that the positive correlation between a blogger and his/her information should affect the trust value. This survey has inspired recommendation systems based on the trust placed in blog information.

Nurul Akhmal Mohd Zulkefli, Hai Trong Duong, Baharum Baharudin

Semantic Web and Ontology

Consensus for Collaborative Ontology-Based Vietnamese WordNet Building

Ontology-based Vietnamese WordNet (OVW) has an extremely important role in most areas relating to Vietnamese language processing. In this paper, we make some structural changes to enrich the structure of Ontology-based WordNet and use it to develop the OVW. A consensus-based collaboration method with reliability measurement is proposed for collaborative OVW building. The knowledge contributed by participants through collaborative processes is treated as inconsistent data, from which our consensus method produces a reconciled version. In the experiment, OVW is automatically initialized using a Vietnamese word list. Participants collaborate to improve this initial version via our system. To evaluate our method, we compare the accuracy rate of OVW with that of a Vietnamese WordNet built using Asian WordNet’s approach.

Tuong Le, Trong Hai Duong, Bay Vo, Sanggil Kang
An Ontological Context-Aware Approach for a Dynamic Business Process Formulation

Ontologies are used in Business Process Management (BPM) to reduce the gap between the business world and information systems, especially in the context of cross-enterprise collaboration. For dynamic collaboration, virtual enterprises need to establish collaborative processes with appropriate matching levels of tasks. However, the problem of resolving semantic mismatches has still not been tackled, and it becomes even harder when querying across different enterprise profiles represented as ontologies. This paper proposes an approach based on ontologies and context-awareness for the integration and matching tasks involved in forming collaborative processes in cross-enterprise collaboration.

Hanh Huu Hoang

Integration of Information Systems

SMAC - Dataflow and Storage Modeling for Remote Personnel Identification in Restricted Areas

Automated identification of persons staying and working within closed and restricted areas is required for many guarded centers, including airports, power plants, railway and sea container terminals, military training grounds, etc. This process is essential to ensure security and to support the identification of threats in the area, in order to prevent acts of terrorism and ensure an adequate level of protection. In the case of large areas with many employers involved, it is necessary to introduce automated identification methods to support or even replace traditional security forces, which are usually human guards using optical and infrared vision, also enabling operation in darkness and severe weather conditions. This paper presents the general assumptions for an integrated solution using Global Positioning System (GPS) devices, range radars, communication and software. It also contains an in-depth description of the database-related part of the system, including the dataflow model and the underlying Database Management System (DBMS) design, as part of the integrated “Friend” or “Foe” (IFF) identification solution.

Piotr Czekalski, Krzysztof Tokarz
Infrastructure vs. Access Competition in NGNs

With the introduction of NGNs, operators need to upgrade their access networks because, in several cases, existing access networks can no longer meet increasing customer expectations. Evolving consumer expectations will require changes to the existing access network: next generation access. However, existing technologies face some difficulties and are not yet ready for large-scale roll-out. For example, in the case of DSL technologies, the great majority of operators with copper networks are improving their networks, investing to deploy fiber optics closer to customers (reducing the distance between the fiber and the users) and offering the higher-speed access required for new emerging services. The entry of new competitors can be based on reselling services from the incumbent, on building their own infrastructure, on renting unbundled infrastructure from incumbents, or on a combination of these. It is therefore important to create the right incentives for operators to make an efficient build/buy choice and to define the appropriate pricing principles.

João Paulo Ribeiro Pereira

Conceptual Modeling in Advanced Database Systems

Modeling and Verifying DML Triggers Using Event-B

A database trigger is a block of code that executes automatically in response to changes to a table or view in the database system. The correctness of a trigger can usually be verified only when it is executed. It would clearly be useful to detect errors in the trigger system during the design phase. In this paper, we introduce an approach to modeling and verifying data manipulation language (DML) triggers in a database system by a formal method. In the first phase, we formalize a database trigger system as an Event-B model. After that, we use the Rodin tool to verify some properties of the system, such as termination and preservation of constraint rules. We also work through an example to illustrate the approach in detail.

Hong Anh Le, Ninh Thuan Truong
A Conceptual Multi-agent Framework Using Ant Colony Optimization and Fuzzy Algorithms for Learning Style Detection

This paper examines the progress of research that exploits multi-agent systems for detecting learning styles and adapting educational processes in e-Learning systems. In a summarized survey of the literature, we review and compile recent trends in research that has applied and implemented multi-agent systems in educational assessment. We discuss both agent and multi-agent systems and focus on the implications of the theory of detecting learning styles, which covers the behavior of learners when using online learning systems, the learner’s profile, and the structure of multi-agent learning systems. We propose a new dimension for detecting learning styles, which involves individuals from the learner’s social surroundings, such as friends, parents, and teachers, in developing a novel agent-based framework. The multi-agent system applies ant colony optimization and fuzzy logic search algorithms as tools for detecting learning styles. Ultimately, a working prototype will be developed to validate the framework using ant colony optimization and fuzzy logic.

Ghusoon Salim Basheer, Mohd Sharifuddin Ahmad, Alicia Y. C. Tang
Backmatter
Metadata
Title
Intelligent Information and Database Systems
Edited by
Ali Selamat
Ngoc Thanh Nguyen
Habibollah Haron
Copyright Year
2013
Publisher
Springer Berlin Heidelberg
Electronic ISBN
978-3-642-36543-0
Print ISBN
978-3-642-36542-3
DOI
https://doi.org/10.1007/978-3-642-36543-0
