Multi-dimensional Data Processing

Generic Operations in the Structured Space of the Music

In this paper we study the problem of performing operations in structured spaces of data. This problem is one of the few most important aspects of intelligent knowledge processing. For complex spaces of data performing operations requires deep analysis, which usually employs description of the operation, its syntactic structuring and semantic analysis. In the study we focus our attention on employing operation in processing music data. The case study carries out transposition accomplished in printed music notation and in Braille music notation. It is shown that semantic analysis is necessary to transpose in Braille music notation and makes transposition clearer and easier in printed music notation.

Tomasz Sitarek, Wladyslaw Homenda

Collective Cubing Platform towards Definition and Analysis of Warehouse Cubes

Multidimensional data analysis, as supported by OLAP (online analytical processing), requires the computation of many aggregate functions over a large volume of historically collected data. Meanwhile, a recent trend in data communities has been the presence of dynamic, interdisciplinary data communities in which users can find and select data from a wide range of data providers. Using this approach, we have designed the Cubing service platform, which allows rapid retrieval of warehouse cubes in a way that would be familiar to any online shopper. In such an open marketplace, cubing services play a role as a metadata layer that maps cube definitions to the underlying schema and defines how the published cubes will be queried. The proposed platform couples an efficient cube selection mechanism with semantic reasoning capabilities, capable of processing large data sources, which expressed in a variety of formalisms, into a collection of warehouse datasets that expose the native metadata in a uniform manner. Thus, the platform is easily extensible and robust to updates of both data and metadata in the warehouse datasets.

Duong Thi Anh Hoang, Ngoc Sy Ngo, Binh Thanh Nguyen

To Approach Cylindrical Coordinates to Represent Multivariable Spatio-temporal Data

Data representing a moving object include the data of time, position, and attributes. The data of positions and attributes of a moving object, which change over time may be recorded asynchronously because of the difference of sampling methods. Mathematically, these data may be synchronized over time by space-time conversions to constitute the data tuples at various time moments. In this article, we proposed the concept of data plane to represent data according to each tuple at each time moment. Subsequently, we integrated the data planes into the dimensions of a cylindrical coordinate system to represent the movement of objects in a space-time cylinder (STCy). In a space-time cylinder, positions of moving objects are indicated on the data planes which are constituted by the cylinder axis employed as the cylindrical axis of the cylindrical coordinate system, and the polar vectors of the cylindrical coordinate system. Each data plane indicates the data of objects at a time moment. The position of a moving object at a time moment is indicated by its coordinates on the data plane and the time moment by the angular coordinate of this plane. The attributes of moving objects are represented on data planes as the attribute bars parallel to the cylinder axis. The space-time path of a moving object surrounds the cylinder axis. Hence, the space-time cylinder is consistent with the representation of cyclic movements.

Phuoc Vinh Tran

EFP-M2: Efficient Model for Mining Frequent Patterns in Transactional Database

Discovering frequent patterns plays an essential role in many data mining applications. The aim of frequent patterns is to obtain the information about the most common patterns that appeared together. However, designing an efficient model to mine these patterns is still demanding due to the capacity of current database size. Therefore, we propose an Efficient Frequent Pattern Mining Model (EFP-M2) to mine the frequent patterns in timely manner. The result shows that the algorithm in EFP-M2l is outperformed at least at 2 orders of magnitudes against the benchmarked FP-Growth.

Tutut Herawan, A. Noraziah, Zailani Abdullah, Mustafa Mat Deris, Jemal H. Abawajy

Improved Sammon Mapping Method for Visualization of Multidimensional Data

Three improvements to the Sammon mapping method are proposed. Two of them concern calculation complexity reduction. Introducing the limit for

delta

parameter allows to eliminate error fluctuations during data projection. Calculating distances not for all data points but for the part of them results in important reduction of the calculation time without worsening the final results. The third improvement allows adding new data to the projected ones without recalculation of all data from the beginning. The paper presents details of the proposed improvements and the performed experimental study.

Halina Kwasnicka, Pawel Siemionko

Ontology Relation Alignment Based on Attribute Semantics

The problem of ontology alignment is based on finding mappings between instances, concepts and relations of two ontologies which (following Gruber’s work [8]) can be defined as explicit specification of decomposition of some part of reality. This specification spreads over three levels of detail: the concept attribute level, the concept level and the relation level. This paper concentrates on identifying matches between relations of concepts which describe how these entities interact with each other. After careful analysis we have noticed that this level can be a source of many inconsistencies when two ontologies are blindly integrated. We take our work on attribute-based concept alignment and the consensus theory as a starting point. We extend it to handle the issues that appear when aligning relations. We give formal definitions along with careful formalization of set of requirements that eventual mapping algorithm should satisfy in order to reliably designate matches between ontologies on relation level.

Marcin Mirosław Pietranik, Ngoc Thanh Nguyen

Data Deduplication Using Dynamic Chunking Algorithm

Data deduplication is widely used in storage systems to prevent duplicated data blocks. In this paper, we suggest a dynamic chunking approach using fixed-length chunking and file similarity technique. The fixed-length chunking struggles with boundary shift problem and shows poor performance when handling duplicated data files. The key idea of this work is to utilize duplicated data information in the file similarity information. We can easily find several duplicated point by comparing hash key value and file offset within file similarity information. We consider these duplicated points as a hint for starting position of chunking. With this approach, we can significantly improve the performance of data deduplication system using fixed-length chunking. In experiment result, the proposed dynamic chunking results in significant performance improvement for deduplication processing capability and shows fast processing time comparable to that of fixed length chunking.

Young Chan Moon, Ho Min Jung, Chuck Yoo, Young Woong Ko

Web Systems

Applying MapReduce Framework to Peer-to-Peer Computing Applications

MapReduce is a programming framework for processing large amount of data in distribution. MapReduce implementations, such as Hadoop MapReduce, basically operate on dedicated clusters of workstations to achieve high performance. However, the dedicated clusters can be unrealistic for users who infrequently have a demand of solving large distributed problems. This paper presents an approach of applying the MapReduce framework on peer-to-peer (P2P) networks for distributed applications. This approach aims at exploiting leisure resources including storage, bandwidth and processing power on peers to perform MapReduce operations. The paper also introduces a prototyping implementation of a MapReduce P2P system, where the main functions of peers contain contributing computing resources, forming computing groups and executing the MapReduce operations. The performance evaluation of the system has been compared with the Hadoop cluster using the prevailing word count problem.

Huynh Tu Dang, Ha Manh Tran, Phach Ngoc Vu, An Truong Nguyen

Scalable Adaptation of Web Applications to Users’ Behavior

In this paper we present a comparative study of performance of an adaptive e-banking Web application supporting personalization either on a client or on a server side. Currently, modern applications being developed support various kinds of personalization. One of its types is changing behavior and appearance in response to actions taken by a user. Not only pre-defined rules but also new patterns discovered for different levels of events should be applied. Scaling such “interactive” applications to a large number of users is challenging. First, the stream of events generated by users’ actions may be huge, and second, processing of the adaptation rules per single user requires computing resources that multiply with the number of users.

This paper reports on the efficiency of the method enabling a client-side adaptation after moving adaptation logics from a server to a client.

Krzysztof Węcel, Tomasz Kaczmarek, Agata Filipowska

OCE: An Online Colaborative Editor

In this paper we present the development of an Online Collaborative Editor (OCE) software system. It allows several people, to edit and share computer files using different devices, such as mobiles, PDAs in an easy way.

We use formal methods in order to automatize and describe OCE. Its formalism is very suitable to specify time requirements (both time consumption due to the performance of tasks and timeouts) as well as to represent data communication among different components of the system.

This

exercise

convinced us that a formal approach to develop complex systems can facilitate some of the development phases. In particular, the testing and debugging phases, more precisely, how to chose those tests more suitable to be applied, is simplified since tests are automatically extracted from the specification.

César Andrés, Rui Abreu, Alberto Núñez

Construction of Semantic User Profile for Personalized Web Search

User profile is an essential component for accessing the personalized information from the Web. Efficiency of personalized accessed information highly depends on how to model the user details to construct user profile. Previously, user profile was constructed by collecting list of keywords to inferring user interests. These kinds of approaches are not sufficient for many applications. In this paper, we have proposed a new method for constructing semantic user profile for personalized information access. User’s query is extended using ontological profile for generation of personalized search context. Experimental results show that our method of constructing semantic profile is effective for searching information with individual needs.

Mohammed Nazim Uddin, Trong Hai Duong, Visal Sean, Geun-Sik Jo

Link Prediction in Dynamic Networks of Services Emerging during Deployment and Execution of Web Services

We propose an approach, according to which the Web services interoperability and resulting composition schemes may be used to create the network structures reflecting the patterns according to which the services interact during execution of composition and execution queries. We show how to create so-called networks of Web services which allow to effectively use the network structural analysis and optimization techniques to solve the network composition problems. The service network is created on the basis of the semantic bindings between the services in the repository joined with the actual patterns of the service usage resulting from composition queries. Next we show how available techniques of dynamic network structure prediction and analysis may help to assess the future service usage and resource consumption of the service execution layer. Our approach is illustrated by the real data gathered from the

PlaTel

platform, dedicated to the complex service planning, management, provision, composition, execution and validation.

Adam Grzech, Krzysztof Juszczyszyn, Paweł Stelmach, Łukasz Falas

Towards a Model of Context Awareness Using Web Services

Recent years have witnessed the movement of many applications from the traditional closed environments into open ones. These applications, which are being accessed via web browsers, usually offer a great amount of information and services. Open environments and content explosion may affect the usability of web applications, where usability measures the degree of usage satisfaction of the application provider and the application user. If both sides of a communication (the web application and the device accessing it) collaborate to manage the various issues of context, usability could be improved. This paper focuses on modeling context awareness. We propose two models that organize knowledge in layers, and complement each other, in order to give the web applications’ developers the adequate knowledge and a visualization of what should be performed to develop a context aware application. Some of the major issues that need to be considered are: context representation and the heterogeneity that characterizes the open environment of web applications. We shall employ the object oriented approach to represent context and we shall utilize the web services to make a first step toward developing a notion of universal interoperability that aims to facilitate the communication between the server hosting the web application and the devices used to access it. We aim to enable each device to be responsible for its own context without the need of the web application to know the details of how the device is managing the context. As a case study, we present an implementation of a prototype of a university portal.

Mahran Al-Zyoud, Imad Salah, Nadim Obeid

Short-Term Spatio-temporal Forecasts of Web Performance by Means of Turning Bands Method

This work presents Turning Bands simulation method (TB) as a geostatistical approach for making spatio-temporal forecasts of Web performance. The most significant advantage of this method is requirement for the minimum amount of input data to make accurate and detailed forecasts. For this paper, necessary data were obtained with the Multiagent Internet Measuring System (MWING); however, only those measurements of European servers that were collected by the MWING’s agent in Gdansk were used. The aforementioned agent performed measurements (i.e. download times of the same given resource from the evaluated servers) three times every day, between 07.02.2009 and 28.02.2009, at 06:00 am, 12:00 pm and 06.00 pm. First, the preliminary and structural analyses of the measurement data were performed. Then short-term spatio-temporal forecasts of total downloading times for a four days ahead were made. And finally, thorough analysis of the obtained results was carried out and further research directions were proposed.

Leszek Borzemski, Michal Danielak, Anna Kaminska-Chuchmala

Extreme Propagation in an Ad-Hoc Radio Network - Revisited

We consider the algorithm by Baquero, Almeida and Menezes that computes extreme values observed by nodes of an ad hoc network. We adapt it to meet specific technical features of communication in wireless networks with a single channel based on time multiplexing. Our approach leads to substantial reduction of the number of messages transmitted as well as execution speed-up.

Przemysław Błaśkiewicz, Mirosław Kutyłowski, Wojciech Wodo, Kamil Wolny

A Model for the Performance Analysis of SPL-OBS Core Nodes with Deflection Routing

In optical burst switching networks, techniques like wavelength conversion, optical buffer or deflection routing are often applied to resolve a contention problem that may cause data loss. Instead of the planned output port, the contended burst is sent to a new output port, on a new path to its destination by the deflection routing. In the case of wavelength conversion, the arriving burst is conveyed on a new available wavelength, but only partial wavelength conversion is available due to the technology constraint. This article considers a model for the performance evaluation of OBS core nodes with the Share-Per-Link(SPL) architecture where partial wavelength converters are distributed at each output port. A continuous-time Markov chain model is proposed to analyze the performance of OBS core nodes operated with the deflection routing rule.

Dang Thanh Chuong, Vu Duy Loi, Vo Viet Minh Nhat

Intelligent Decision Making

Ordering of Potential Collaboration Options

This work focuses on investigating the importance of evaluating the likelihood that a particular collaboration option is going to be profitable for a firm. Some collaboration options for a firm are first evaluated by experts on previously agreed upon criteria. Methods from rough set theory are afterwords employed for ordering of the agreed upon criteria with respect to their significance in predicting collaboration outcomes.

Sylvia Encheva

Interface Design for Decision Systems

A key aspect of well-designed decision systems is the recognition of important decision functions implemented as explicit facets of the system. The literature is replete with discussion about what these facets ought to be. The essential ones involve the decision model, agents to handle task distribution and completion, data, solvers that execute model instances, visualization, and scenario development to explore problem instances. The interaction among these facets in conjunction with the user is what makes for effective problem solving. The additional dimension that must be recognized in design is the fact that users typically possess a wide range of abilities but are all vested with decision making authority. This makes it necessary for the interface design to provide the functionality described above while allowing for effective interaction by a range of users. We use a classification scheme for users by labeling them as system builders, professionals, and naïve users. We develop a framework that juxtaposes system facets with user abilities to derive some interface design principles. We present the results of our work with examples in the supply chain domain.

Ching-Shen Dong, Ananth Srinivasan

Opponent Modeling in Texas Hold’em Poker

In this paper a new algorithm for prediction opponent move in Texas Hold’em Poker game is presented. The algorithm is based on artificial intelligence approach – it uses several neural networks, each trained on a specific dataset. The results given by algorithm may be applied to improve players’ game. Moreover, the algorithm may be used as a part of more complex algorithm created for supporting decision making in Texas Hold’em Poker.

Grzegorz Fedczyszyn, Leszek Koszalka, Iwona Pozniak-Koszalka

On Axiomatization of Power Index of Veto

Relations between all constitutional and government organs must be moderated and evaluated depending on their way of decision making. Among their attributes one may find the right to veto. It is known already that a priori veto is rather strengthening the position of beholder. The evaluation of a power to make a decision is directly connected with a way of power measuring, i.e. with power index choice. In the paper we consider axiomatic base for such choice of an index of power evaluation.

Jacek Mercik

STRoBAC – Spatial Temporal Role Based Access Control

The development of geography-based services and systems has created the demands in which access control is the primary concern for geospatial data security. Although there are a variety of models to manage geospatial data access, none of them can fulfil the access control requirements. The objective of this paper is to propose a model that can support both spatio-temporal aspects and other contextual conditions as well as access control based on the role of subject. We call this model Spatial Temporal Role Based Access Control (STRoBAC). In addition, we propose an extension of GeoXACML framework, which is highly scalable and can help in declaring and enforcing various types of rules, to support the proposed model. This is the crucial contribution of our research compared to the existing approaches and models.

Kim Tuyen Le Thi, Tran Khanh Dang, Pierre Kuonen, Houda Chabbi Drissi

Methods for Scheduling

Rescheduling of Concurrently Flowing Cyclic Processes

The paper presents a declarative modeling framework enabling to evaluate the cyclic steady state of a given system of concurrently flowing cyclic processes (SCCP) on the base of the assumed topology of transportation routes, dispatching rules employed, resources and operation times as well as an initial processes allocation. The objective is to provide sufficient conditions guaranteeing rescheduling among cyclic schedules reachable in a given SCCP. The properties providing such conditions as well as illustrative examples are presented.

Grzegorz Bocewicz, Zbigniew A. Banaszak

Comparison of Allocation Algorithms in Mesh Oriented Structures for Different Scheduling Techniques

The paper concerns task allocation problem in mesh structured system. The dynamic case is considered. Four allocation algorithms have been evaluated. The research was focused on the impact of task scheduling technique co-operated with allocation algorithms. Two queuing schemes were compared: well-known First Come First Served and newly created, by the authors of this paper, heuristic scheduling technique called First Few Random. The comparison of efficiencies of different allocation algorithms combined with different queuing schemes has been done on the basis of simulation experiments made with a designed experimentation system. The discussion of the obtained results confirms that the proposed approach and created queuing scheme seem to be promising.

Bartosz Bodzon, Leszek Koszalka, Iwona Pozniak-Koszalka, Andrzej Kasprzak

Reachability of Cyclic Steady States Space: Declarative Modeling Approach

The paper presents a new modeling framework enabling to evaluate the cyclic steady state of a given system of concurrently flowing cyclic processes (SCCP) sharing common system resources while interacting on the base of a mutual exclusion protocol. Assuming a given topology of cyclic routes passing on by subsets of system resources, a set of dispatching rules aimed at recourses’ conflicts resolution, operation times as well as the given frequencies of mutual appearance of local processes the main objective is to provide the declarative modeling framework enabling to refine conditions guaranteeing the cyclic steady state space reachability.

Grzegorz Bocewicz, Robert Wójcik, Zbigniew A. Banaszak

Image and Video Processing

Caption Text and Keyframe Based Video Retrieval System

In this paper, we present a framework for video retrieval using caption text and keyframe similarity. To extract caption text, we applied methods detecting and extracting image areas contain caption text and we used Tesseract-OCR engine to convert into plain text, then use Hunspell library for spell words. Next, we used Clucene search engine index and query on this text. We applied shape descriptors APR and ECM to descript keyframes of the video shots and use those descriptors as a feature vector of video shots. From the feature vectors were obtained, we used ANN library to index and search. The system which is built on the web-based application using ASP.NET support keyword-based and keyframe-based query. The results obtained from experiments produced very promising.

Dung Mai, Kiem Hoang

E-Commerce Video Annotation Using GoodRelations-Based LODs with Faceted Search in Smart TV Environment

TV-commerce is a new form of shopping that allows consumer to view, select and buy products from Smart TV. To do so, sellers annotate videos and associate it with information from online e-commerce systems in a semantic manner. In this work, we propose an e-commerce information derivation mechanism for video annotation using Linked Open Data (LOD) with faceted search. Annotation information is derived from e-commerce LODs, which linked distributed data across e-commerce web. We incorporated faceted search to allow consumer to easily make a information derivation query defined by GoodRelations ontology. The derived information is displayed as a faceted graph facilitating information choice.

Trong Hai Duong, Ahmad Nurzid Rosli, Visal Sean, Kee-Sung Lee, Geun-Sik Jo

Nearest Feature Line Discriminant Analysis in DFRCT Domain for Image Feature Extraction

A novel subspace learning algorithm based on nearest feature line in time-frequency domain is proposed in this paper. The proposed algorithm combines neighborhood discriminant nearest feature line analysis and fractional cosine transform to extract the local discriminant features of the samples. A new discriminant power criterion based on nearest feature line is also presented in this paper. Some experiments are implemented to evaluate the proposed algorithm and the experimental results demonstrate the efficiency of the proposed algorithm.

Lijun Yan, Cong Wang, Jeng-Shyang Pan

Collective Intelligence in Web Systems – Web Systems Analysis

Adaptive Scheduling System Guaranteeing Web Page Response Times

The problem of guaranteeing Quality of Web Services (QoWS) is now crucial for farther development and application in new areas of internet services. In this paper we present WEDF (Web Earliest Deadline First) adaptive, an intelligent Web system which guarantees the quality of service in Web systems with one Web server. The proposed system keeps the page response time within established boundaries, in such a way that at a heavy workload, the page response time both for small and complex pages, would not exceed the imposed time limit. We show in experiments conducted with the use of a real modern Web server that the system can be used to guarantee a higher quality of service than other referenced and widely used in practice scheduling systems.

Krzysztof Zatwarnicki

A Smart and Tangible AR Dress Fitting System

The human-computer interaction plays a critical role between the communication of the machine and the human beings. The ability of 3D visualization system to realistically and quickly render the appearance of products or architectures makes it an attractive and affordable solution for product demonstration. Furthermore, the tangible interface of augmented technology, allowing users to interact with products of interest via hand gestures, offers immersive, joyful, and lifelike product interaction experience to users. By integrating the technologies of 3D visualization and augmented reality, this paper proposes a smart, tangible, and gesture-based visualization system for feminine dress-fitting. To evaluate the usability of the proposed system, a field study experimental method is adopted where every participant is asked to experiment with the proposed system first, and fill out a usability questionnaire afterwards. The questionnaire consists of six constructs: effectiveness, ease of use, interactivity, joyfulness, lifelikeness, and satisfaction. The subjects consist of 34 females, with age ranging from 20 to 29 years old. The results of the questionnaire indicate that the subjects show positive attitude toward the proposed system.

Heien-Kun Chiang, Long-Chyr Chang, Feng-Lan Kuo, Hui-Chen Huang

Consensus as a Tool for RESTful Web Service Identification

Our recent studies show that the market of RESTful Web services is rapidly growing. However, still there is lack of identification methods of this class of services so they remain invisible for their potential users. In this work we propose a tool for solving above problem using the consensus methods. The identification process is based on recognising URI structural patterns subjected to opinions of different agents with different knowledge. The task of consensus method is to determine version of knowledge which best reflects given versions reflected in agent’s opinions on individual components. The research includes defining the structure of knowledge, determining conflict situations, conflict profiles, defining consensus function and assigning distance functions which allow to resolve conflicting views. Moreover, our research is supported with implementations of proposed approach by which we conducted preliminary experiments. Experimental results show high effectiveness and performance of proposed approach in contrast to the other chosen methods.

Adam Czyszczoń, Aleksander Zgrzywa

Detection of Tennis Court Lines for Sport Video Categorization

Digital video data are stored and offered by many public Web services such as Internet video collections, TV shows archives, Internet video-on-demand systems, personal video archives, and others. New methods and technologies of video indexing and retrieval in the Web are developed. Content-based indexing of TV sports news is based on the automatic segmentation, then recognition and classification of scenes reporting the sport events in a given discipline. Automatic classification of sports in TV sports news is one of the basic process in video indexing. There are many different strategies how to recognize a sport discipline. It may be achieved by player scenes analyses leading to the detection of playing fields, of superimposed text like player or team names, recognition of player faces or sport objects, detection of player and audience emotions, and also to the detection of lines typical for a given playing fields and for a given sport discipline. The paper proposes a framework of the automatic line detection of a tennis court for the selection of tennis shots from TV sports news. Categorization of sport videos is based on the minimum set of lines for the detection of a tennis court. The framework has been verified and tested in the Automatic Video Indexer AVI.

Kazimierz Choroś

Advanced Data Mining Techniques and Applications

The Application of Orthogonal Subspace Projection in Multi-spectral Images Processing for Cancer Recognition in Human Skin Tissue

This paper analyses multi-spectral images and their application in the process of cancer recognition in human skin. Cancerous part of a tissue can be characterized by higher accumulation of photosensitive substances then healthy. In order to detect the spectrum of Protoporphyrin IX in the human skin images Orthogonal Subspace Projection classifier was presented. For every pixel it calculates the content of Protoporphyrin IX spectrum in the global pixel spectrum. After pixel classification it was necessary to separate regions with cancer from healthy parts of a tissue by applying non-linear mapping with low frequency removal or mean shift segmentation enhanced with edge detection for better region recognition. Both proposals gave successful results.

Andrzej Zacher, Aldona Drabik, Jerzy Paweł Nowacki, Konrad Wojciechowski

Length and Coverage of Inhibitory Decision Rules

Authors present algorithms for optimization of inhibitory rules relative to the length and coverage. Inhibitory rules have a relation “attribute ≠ value” on the right-hand side. The considered algorithms are based on extensions of dynamic programming. Paper contains also comparison of length and coverage of inhibitory rules constructed by a greedy algorithm and by the dynamic programming algorithm.

Fawaz Alsolami, Igor Chikalov, Mikhail Moshkov, Beata Marta Zielosko

Refining the Judgment Threshold to Improve Recognizing Textual Entailment Using Similarity

In recent years, Recognizing Textual Entailment (RTE) catches strongly the attention of the Natural Language Processing (NLP) community. Using Similarity is an useful method for RTE, in which the Judgment Threshold plays an important role as the learning model. This paper proposes an RTE model based on using similarity. We describe clearly the solutions to determine and to refine the Judgment Threshold for Improvement RTE. The measure of the synonym similarity also is considered. Experiments on a Vietnamese version of the RTE3 corpus are showed.

Quang-Thuy Ha, Thi-Oanh Ha, Thi-Dung Nguyen, Thuy-Linh Nguyen Thi

Optimization of β-Decision Rules Relative to Number of Misclassifications

In the paper, we present an algorithm for optimization of approximate decision rules relative to the number of misclassifications. The considered algorithm is based on extensions of dynamic programming and constructs a directed acyclic graph Δ

β

(

T

). Based on this graph we can describe the whole set of so-called irredundant

β

-decision rules. We can optimize rules from this set according to the number of misclassifications. Results of experiments with decision tables from the UCI Machine Learning Repository are presented.

Beata Marta Zielosko

Advance Missing Data Processing for Collaborative Filtering

Memory-based collaborative filtering (CF) is widely used in the recommendation system based on the similar users or items. But all of these approaches suffer from data sparsity. In many cases, the user-item matrix is quite sparse, which directly leads to inaccurate recommend results. This paper focuses the memory-based collaborative filtering problem on the factor: missing data processing. We propose an advance missing data processing includes two steps: (1) using enhanced CHARM algorithm for mining closed subsets – group of users that share interest in some items, (2) using adjusted Slope One algorithm base on subsets for utilizing not only information of both users and items but also information that fall neither in the user array nor in the item array. After that, we use Pearson Correlation Coefficient algorithm for predicting rating for active user. Finally, the empirical evaluation results reveal that the proposed approach outperforms other state-of-the-art CF algorithms and it is more robust against data sparsity.

Nguyen Cong Hoan, Vu Thanh Nguyen

Improving Nearest Neighbor Classification Using Particle Swarm Optimization with Novel Fitness Function

A new method of feature selection is presented in this paper. The proposed idea uses Particle Swarm Optimization (PSO) with fitness function in order to assign higher weights to informative features while noisy irrelevant features are given low weights. The measure of Area Under the receiver operating characteristics Curve (AUC) is used as the fitness function of the particles. Experimental results claim that the PSO-based feature weighting can improve the classification performance of the

k

-NN algorithm in comparison with the other important method in realm of feature weighting such as Mutual Information, Genetic Algorithm, Tabu Search and chi-squared (

χ

2

). Additionally, on synthetic data sets, this method is able to allocate very low weight to the noisy irrelevant features which may be considered as the eliminated features from the data set.

Ali Adeli, Ahmad Ghorbani-Rad, M. Javad Zomorodian, Mehdi Neshat, Saeed Mozaffari

Sentiment Classification: A Combination of PMI, SentiWordNet and Fuzzy Function

Discerning a consensus opinion about a product or service is difficult due to the many opinions on the web. To overcome this problem, sentiment classification has been applied as an important approach for evaluation in sentiment mining. Recently, researchers have proposed various approaches for evaluation in sentiment mining by applying several techniques such as unsupervised and machine learning methods. This paper proposes an unsupervised method for classifying the polarity of reviews using a combination of methods including PMI, SentiWordNet and adjusting the phrase score in the case of modification. The experiment results show that the proposed system achieves accuracy ranging from 69.36% for movie reviews to 80.16% for automotive reviews.

Anh-Dung Vo, Cheol-Young Ock

Interestingness Measures for Classification Based on Association Rules

This paper proposes a new algorithm for classification based on association rule with interestingness measures. The proposed algorithm uses a tree structure for maintenance of related information in each node, thus making the process of generating rules fast. Besides, the proposed algorithm can be easily extended to integrate some measures together for ranking rules. Experiments are also made to show the efficiency of the proposed approach for different settings. The mining time for different interestingness measures is varied only a little when ten measures are integrated.

Loan T. T. Nguyen, Bay Vo, Tzung-Pei Hong, Hoang Chi Thanh

MSGPs: A Novel Algorithm for Mining Sequential Generator Patterns

Sequential generator pattern mining is an important task in data mining. Sequential generator patterns used together with closed sequential patterns can provide additional information that closed sequential patterns alone are not able to provide. In this paper, we proposed an algorithm called MSGPs, which based on the characteristics of sequential generator patterns and sequence extensions by doing depth-first search on the prefix tree, to find all of the sequential generator patterns. This algorithm uses a vertical approach to listing and counting the support, based on the prime block encoding approach of the prime factorization theory to represent candidate sequences and determine the frequency for each candidate. Experimental results showed that the proposed algorithm is effective.

Thi-Thiet Pham, Jiawei Luo, Tzung-Pei Hong, Bay Vo

A Genetic Algorithm with Elite Mutation to Optimize Cruise Area of Mobile Sinks in Hierarchical Wireless Sensor Networks

In this paper, a new genetic algorithm with elite mutation is proposed for optimization problems. The proposed elite mutation scheme (EM) improves traditional genetic algorithms with a better ability to locate and to approach fast to optimal solutions, even in cases of huge data set. The proposed EM is to select elite chromosomes and mutate according to the similarity between elite chromosomes and selected chromosomes. The designed similarity guides effectively the search toward optimal solutions with less generation. The proposed EM is applied to optimize the cruise area of mobile sinks in hierarchical wireless sensor networks (WSNs). Numeric results show that (1) the proposed EM benefits the discovery of optimal solutions in a large solution space; (2) the approach to optimal solutions is more stable and faster; (3) the search guidance derived from the chromosome similarity is critical to the improvements of optimal solution discovery. Besides, the minimization of cruise are been proved to have the advantages of energy-saving, time-saving and reliable data collection in WSNs.

Mong-Fong Horng, Yi-Ting Chen, Shu-Chuan Chu, Jeng-Shyang Pan, Bin-Yih Liao, Jang-Pong Hsu, Jia-Nan Lin

Cooperative Problem Solving

An Algebraic Structure for Duration Automata

Algebraic structures such as group, lattice and fuzzy algebra play an important role in the field of information technology, especially in modelling systems. The previous researches [1, 2, 3, 4] have introduced a formal tool named duration automata, by which we modelled systems and developed algorithms to solve problems on scheduling jobs for a cluster computer, routing on priority network or dealing with uncertain processing time jobs. Here we introduce a new weak semigroup with the aim to combine duration automata together with algebraic structures to solve some optimization problems.

Bui Vu Anh, Phan Trung Huy

Study of the Migration Scheme Influence on Performance of A-Teams Solving the Job Shop Scheduling Problem

The paper investigates the impact of migration scheme on performance in systems with A-Teams working in parallel, in the architecture designed for solving difficult combinatorial optimization problems and used for solving Job Shop Scheduling Problem. A-Teams, or islands, belonging to the team of A-Teams, cooperate through exchange of intermediary computation results, following certain migration scheme. The number of results forwarded from one A-Team to another, e.g. migration size, is experimentally investigated in combination with different numbers and sizes of the islands.

Piotr Jędrzejowicz, Izabela Wierzbowska

A New Cooperative Search Strategy for Vehicle Routing Problem

Cooperation as a problem-solving strategy is a widely used approach to solving complex hard optimization problems. It involves a set of highly autonomous programs (agents), each implementing a particular solution method, and a cooperation scheme combining these autonomous programs into a single problem-solving strategy. It is expected that such a collective of agents can produce better solutions than any individual members of such collective. The main goal of the paper is to propose a new population-based cooperative search approach for solving the Vehicle Routing Problem. It uses a set of search procedures, which attempt to improve solutions stored in a common, central memory. Access to a single common memory allows exploitation by one procedure solutions obtained by another procedure in order to guide the search through a new promising region of the search space, thus increasing chances for reaching the global optimum.

Dariusz Barbucha

A-Team for Solving the Resource Availability Cost Problem

In this paper the agent system based on A-Team and E-JABAT architecture for solving the resource availability cost problem (RACP) is proposed and experimentally tested. RACP known also as RIP (resource investment problem) belongs to the NP-hard problem class. To solve this problem an A-Team consisting of an asynchronous agents implemented using E-JABAT middleware have been proposed. Three kinds of optimization agent have been used. Computational experiment involves evaluation of the proposed approach.

Piotr Jędrzejowicz, Ewa Ratajczak-Ropel

Agent-Based Approach to RBF Network Training with Floating Centroids

In this paper the agent-based population learning algorithm designed to train RBF networks (RBFN’s) is proposed. The algorithm is used to network initialization and estimation of its output weights. The approach is based on the assumption that a location of the radial based function centroids can be modified during the training process. It is shown that such a floating centroids may help to find the optimal neural network structure. In the proposed implementation of the agent-based population learning algorithm, RBFN initialization and RBFN training based on the floating centroids are carried-out by a team of agents, which execute various local search procedures and cooperate to find-out a solution to the considered RBFN training problem. Two variants of the approach are suggested in the paper. The approaches are implemented and experimentally evaluated.

Ireneusz Czarnowski, Piotr Jędrzejowicz

Computational Swarm Intelligence

New Differential Evolution Selective Mutation Operator for the Nash Equilibria Problem

Differential Evolution (DE) is a simple and powerful optimization method, which is mainly applied to numerical optimization. In this article we present a new selective mutation operator for the Differential Evolution. We adapt the Differential Evolution algorithm to the problem of finding the approximate Nash equilibrium in

n

person games in the strategic form. Finding the Nash equilibrium may be classified as continuous problem, where two probability distributions over the set of pure strategies of both players should be found. Every deviation from the global optimum is interpreted as the Nash approximation and called

ε

-Nash equilibrium. The fitness function in this approach is based on the

max

function which selects the maximal value from the set of payoffs. Every element of this set is calculated on the basis of the corresponding genotype part. We propose an approach, which allows us to modify only the worst part of the genotype. Mainly, it allows to decrease computation time and slightly improve the results.

Urszula Boryczka, Przemysław Juszczuk

Ant Colony Decision Forest Meta-ensemble

This paper is devoted to the study of an extension of Ant Colony Decision Tree (ACDT) approach to Random Forests (RF) – an arisen meta-ensemble technique called Ant Colony Decision Forest (ACDF). To the best of our knowledge this is the first time that Ant Colony Optimization is being applied as an ensemble method in data mining tasks. Meta-ensemble ACDF as a hybrid RF and ACO based algorithm is evolved and experimentally shown high accuracy and good effectiveness of this technique motivate us to further development.

Urszula Boryczka, Jan Kozak

Ant Colony System with Selective Pheromone Memory for TSP

Ant Colony System (ACS) is a well known metaheuristic algorithm for solving difficult optimization problems inspired by the foraging behaviour of social insects (ants). Artificial ants in the ACS cooperate indirectly through deposition of pheromone trails on the edges of the problem representation graph. All trails are stored in a pheromone memory, which in the case of the Travelling Salesman Problem (TSP) requires

O

(

n

2

) memory storage, where

n

is the size of the problem instance. In this work we propose a novel

selective pheromone memory model

for the ACS in which pheromone values are stored only for the selected

subset

of trails. Results of the experiments conducted on several TSP instances show that it is possible to significantly reduce ACS memory requirements (by a constant factor) without impairing the quality of the solutions obtained.

Rafał Skinderowicz

Ant Colony Optimization for the Pareto Front Approximation in Vehicle Navigation

This paper presents an example of a multi-criteria optimization problem for vehicle navigation in the presence of multiple criteria and method of employing Ant Colony Optimization metaheuristic to solve it. The paper presents an approach based on the concept of Pareto optimality and approximation of set of non dominated solutions forming the so-called Pareto front.

Wojciech Bura, Mariusz Boryczka

A Hybrid Discrete Particle Swarm Optimization with Pheromone for Dynamic Traveling Salesman Problem

This paper introduces a new Discrete Particle Swarm Optimization algorithm for solving Dynamic Traveling Salesman Problem (DTSP). An experimental environment is stochastic and dynamic, based on Benchmark Generator was prepared for testing DTSP solvers. Changeability requires quick adaptation ability from the algorithm. The introduced technique presents a set of advantages that fulfill this requirement. The proposed solution is based on the virtual pheromone first applied in Ant Colony Optimization. The pheromone serves as a communication topology and information about the landscape of global discrete space. Experimental results demonstrate the effectiveness and efficiency of the algorithm.

Urszula Boryczka, Łukasz Strąk

A Modified Shuffled Frog Leaping Algorithm with Genetic Mutation for Combinatorial Optimization

In this work, we propose modified versions of shuffled frog leaping algorithm (SFLA) to solve multiple knapsack problems (MKP). The proposed algorithm includes two important operations: repair operator and genetic mutation with a small probability. The former is utilizing the pseudo-utility to repair infeasible solutions, and the later can effectively prevent the algorithm from trapping into the local optimal solution. Computational experiments with a large set of instances show that the proposed algorithm can be an efficient alternative for solving 0/1 multidimensional knapsack problem.

Kaushik Kumar Bhattacharjee, Sarada Prasad Sarmah

Semantic Methods for Knowledge Discovery and Communication

Integrating Curriculum and Instruction System Based on Objective Weak Tie Approach

In order to improving the quality of higher education, sever strategies or models are proposed from universities. Evaluation system is one of these strategies for checking the performance and keeping the high quality. However, it is easy to describe the evaluation system but it will be hard to implement in higher education environment. When there is a goal to be achieved, evaluation system is used. By the time past, more paper works and more tables are required to fill in for different evaluation systems. Due to these problems, the purpose of this paper is to propose an objective weak tie system to integrate four different curriculum and instruction systems. The benefits of this weak tie system will not increase the loading for teachers and will increase the efficiency for administration by remaining the original system and using the objective to make linking with each other as an integrating curriculum and instruction system.

Chia-Ling Hsu, Hsuan-Pu Chang, Ren-Her Wang, Shiu-huang Su Hsu

Business Opportunity: The Weak-Tie Roaming among Tribes

In this study we assume that the opportunity is already exists in the world. Furthermore, the weak-tie strategy is used to recognize two or more useful tribes and compare what differences between tribes to find out useful infor-mation, such as business opportunity for creating innovative service. At last, this model is used to design new Bali service to evidences our model is useful.

Chao-Fu Hong, Mu-Hua Lin, Hsiao-Fang Yang

Emerging Technology Exploration Using Rare Information Retrieval and Link Analysis

To explore the clues of an emerging technology is essential for a company or an industry so that the company can consider the feasibility of resource allocation to the technology and the industry can observe the developing directions of the technology. Patent data contains plentiful technological information from which it is worthwhile to extract further knowledge. Therefore, a research framework for emerging technology exploration has been formed where rare information retrieval is designed to sift out the rare patents, cluster analysis is employed to generate the clusters, and link analysis is adopted to measure the link strength between the rare patents and clusters. Consequently, the rare patents were found, the clusters were generated and named, the notable rare patents were recognized, and the potentiality of emerging technology was discussed. Finally, the notable rare patents and the potentiality of emerging technology would be provided to the decision makers of companies and industries.

Tzu-Fu Chiu, Chao-Fu Hong, Yu-Ting Chiu

Introducing Fuzzy Labels to Agent-Generated Textual Descriptions of Incomplete City-Traffic States

An aim of this research is to create methods for a provision of textual information to users of a distributed multi-agent information system. In particular we focus on a traffic information system where agents transform the numerical data about states of the city traffic obtained using a distributed sensor network into natural language summaries. The basis for the transformation from numerical data into a linguistic domain are zadehian fuzzy-linguistic models of concepts. Unlike in typical Natural Language Generation approaches, this paper focuses on the provision of summaries in situations where data is incomplete and on conveying this incompleteness to the user using belief-based language statements. We provide an algorithm based on a theory of grounding for an agent-based evaluation of local summaries with autoepistemic operators of possibility, belief, and knowledge. We also propose a method for an aggregation of summaries generated by local agents in order to obtain a textual summary of complex structures of the road network (e.g. areas, districts, precise routes).

Grzegorz Popek, Ryszard Kowalczyk, Radosław P. Katarzyniak

Springer Professional

About this book

Table of Contents

Frontmatter