About this book

This book constitutes the refereed proceedings of the 6th International Conference on Computational Collective Intelligence, ICCCI 2014, held in Seoul, Korea, in September 2014. The 70 full papers presented were carefully reviewed and selected from 205 submissions. They address topics such as knowledge integration, data mining for collective processing, fuzzy, modal and collective systems, nature inspired systems, language processing systems, social networks and semantic web, agent and multi-agent systems, classification and clustering methods, multi-dimensional data processing, Web systems, intelligent decision making, methods for scheduling, image and video processing, collective intelligence in web systems, computational swarm intelligence, cooperation and collective knowledge.

Table of Contents


Keynote Speech

Agreement Technologies – Towards Sophisticated Software Agents

Nowadays, agreements and all the processes and mechanisms involved in reaching agreements between different kinds of agents are a subject of promising interdisciplinary scientific research. The newest trend in Agent Technology is to enhance agents with "social" abilities. Agreement Technologies bring a new flavor to the implementation of more sophisticated autonomous software agents that negotiate to achieve acceptable agreements.

The paper presents key concepts in this area and highlights the influence of Agreement Technologies on the development of more sophisticated multi-agent systems.

Mirjana Ivanović, Zoran Budimac

Fuzzy Systems

False Positives Reduction on Segmented Multiple Sclerosis Lesions Using Fuzzy Inference System by Incorporating Atlas Prior Anatomical Knowledge: A Conceptual Model

Detecting abnormalities in medical images is an important application of medical imaging. MRI, as an imaging technique sensitive to soft tissues, shows Multiple Sclerosis (MS) lesions as hyper-intense or hypo-intense signals. As manual segmentation of these lesions is a laborious and time-consuming task, many methods for automatic MS lesion segmentation have been proposed. Because of the inherent complexities of MS lesions, together with acquisition noise and inaccurate pre-processing algorithms, automatic segmentation methods produce some False Positives (FPs). To reduce these FPs, a model based on a fuzzy inference system that incorporates atlas prior anatomical knowledge is proposed. The inputs of the proposed model are MRI slices, an initial lesion mask, and atlas information. In order to mimic experts’ inference, suitable linguistic variables are derived from the inputs for a better description of FPs. The experts’ knowledge is stored in a knowledge base as if-then-like statements. This model can be developed and attached as a module to MS lesion segmentation methods for reducing FPs.

Hassan Khastavaneh, Habibollah Haron

Fuzzy Splicing Systems

In this paper we introduce a new variant of splicing systems, called fuzzy splicing systems, and establish some basic properties of language families generated by this type of splicing systems. We study the “fuzzy effect” on splicing operations, and show that the “fuzzification” of splicing systems can increase and decrease the computational power of splicing systems with finite components with respect to fuzzy operations and cut-points chosen for threshold languages.

Fariba Karimi, Sherzod Turaev, Nor Haniza Sarmin, Wan Heng Fong

A Preference Weights Model for Prioritizing Software Requirements

Software requirements prioritization is the act of ranking users’ requirements in order to plan for release phases. The essence of prioritizing requirements is to avoid breach of contract, trust or agreement during the software development process. This is crucial because not all the specified requirements can be implemented in a single release, owing to a shortage of skilled programmers and to time, budget, and schedule constraints. Major limitations of existing prioritization techniques include rank reversals, scalability, ease of use, computational complexity and accuracy, among others. Consequently, an innovative model that is capable of addressing these problems is presented. To achieve our aim, synthesized weights are computed for the criteria that make up requirements, and functions are defined to display prioritized requirements based on the global weights of attributes across project stakeholders. An empirical case scenario is described to illustrate the adaptability of the proposed approach.

Philip Achimugu, Ali Selamat, Roliana Ibrahim

Fuzzy Logic-Based Adaptive Communication Management on Wireless Network

This paper presents fuzzy logic-based adaptive communication management on a wireless network. The combination of a wireless network and a handheld device is widely used in the world today. The wireless network depends on the radio signal to communicate with the device, and the handheld device is a mobile node whose exact location is difficult to determine. These unstable features have a negative influence on the communication QoS (quality of service). Therefore, we adopt fuzzy logic to improve the communication efficiency. The access point (AP) may evaluate the communication state with fuzzy logic. The relay station then utilizes the evaluation result to handle the communication throughput. The simulation demonstrates the efficiency of our proposed model.

Taeyoung Kim, Youngshin Han, Jaekwon Kim, Jongsik Lee

Application of Self-adapting Genetic Algorithms to Generate Fuzzy Systems for a Regression Problem

Six variants of self-adapting genetic algorithms with varying mutation, crossover, and selection were developed. To implement self-adaptation the main part of a chromosome which comprised the solution was extended to include mutation rates, crossover rates, and/or tournament size. The solution part comprised the representation of a fuzzy system and was real-coded whereas to implement the proposed self-adapting mechanisms binary coding was employed. The resulting self-adaptive genetic fuzzy systems were evaluated using real-world datasets derived from a cadastral system and included records referring to residential premises transactions. They were also compared in respect of prediction accuracy with genetic fuzzy systems optimized by a classical genetic algorithm, multilayer perceptron and radial basis function neural network. The analysis of the results was performed using statistical methodology including nonparametric tests followed by post-hoc procedures designed especially for multiple



Tadeusz Lasota, Magdalena Smętek, Zbigniew Telec, Bogdan Trawiński, Grzegorz Trawiński

Information Retrieval

Analysis of Profile Convergence in Personalized Document Retrieval Systems

Modeling user interests in personalized document retrieval systems is currently a very important task. The system should gather information about the user in order to recommend better results to them. In this paper a mathematical model of user preference and profile is considered. The main assumption is that the system does not know the preference. The main aim of the system is to build a profile close to the user preference based on observations of user activities. A method for building and updating the user profile is presented, and a model for simulating user behaviour in such a system is proposed. The analytical properties of this method are considered, and two theorems are presented and proved.

Bernadetta Maleszka

SciRecSys: A Recommendation System for Scientific Publication by Discovering Keyword Relationships

In this work, we propose a new approach for discovering various relationships among keywords in scientific publications based on a Markov Chain model. It is an important problem since keywords are the basic elements for representing abstract objects such as documents, user profiles, topics and many others. Our model is very effective since it combines four important factors in scientific publications: content, publicity, impact and randomness. In particular, a recommendation system (called SciRecSys) is presented to help users efficiently find relevant articles.

Vu Le Anh, Vo Hoang Hai, Hung Nghiep Tran, Jason J. Jung

Grouping Like-Minded Users Based on Text and Sentiment Analysis

With the growth of social media usage, the study of online communities and groups has become an appealing research domain. In this context, grouping like-minded users is one of the emerging problems. Indeed, it gives a good idea about group formation and evolution, explains various social phenomena and leads to many applications, such as link prediction and product suggestion. In this dissertation, we propose a novel unsupervised method for grouping like-minded users within social networks. Such a method detects groups of users sharing the same interest centers and having similar opinions. In fact, the proposed method is based on extracting the interest centers and retrieving the polarities from the user’s textual posts.

Soufiene Jaffali, Salma Jamoussi, Abdelmajid Ben Hamadou

A Preferences Based Approach for Better Comprehension of User Information Needs

Within mobile information retrieval research, context information provides an important basis for identifying and understanding users’ information needs. The search process can therefore take advantage of contextual information to enhance the query and adapt search results to the user’s current context. However, the challenge is how to define the best contextual information to be integrated in the search process. In this paper, our intention is to build a model that can identify which contextual dimensions strongly influence the outcome of the retrieval process and should therefore be in the user’s focus. In order to achieve these objectives, we create a new query language model based on the user’s preferences. We extend this model in order to define a relevance measure for each contextual dimension, which allows each dimension to be classified automatically. This measure is used to compute the degree of change in result lists for the same query enhanced by different dimensions. Our experiments show that our measure can analyze the real user’s context of up to 8000 dimensions. We also show experimentally the quality of the set of contextual dimensions proposed, and the value of the measure for understanding mobile users’ needs and enhancing their queries.

Sondess Missaoui, Rim Faiz

Interlinked Personal Story Information and User Interest in Weblog by RSS, FOAF, and SIOC Technology

Interlinking personal stories and user interests in weblogs, so that information in blog contents can be reused, is a new way of communicating in the weblog field. With existing vocabularies such as Rich Site Summary (RSS), Friend of a Friend (FOAF) and Semantically-Interlinked Online Communities (SIOC), interlinking blogs can help users, especially bloggers, to find the relationships that occur inside the contents of the blogs themselves. Furthermore, because personal blog contents are always updated, they are nowadays more useful than existing search engines as answers to Internet users’ searches for information. Our proposed framework is designed to interlink personal information in weblogs and also to improve the use of blogs among the online community.

Nurul Akhmal binti Mohd Zulkefli, Baharum bin Baharudin

Social Networks

Sustainable Social Shopping System

Shopping is one of the key activities that humans undertake, and it has an overwhelming influence on the economic, environmental, and health facets of their lives and ultimately on their sustainability. More recently, social media has been used to connect vendors and consumers to discover, share, recommend and transact goods and services. However, there is a paucity of academic literature on sustainable social shopping, as well as of systems in industry to support it. There is no single online shopping system that provides customers with a holistic shopping experience that allows them to balance financial, health, and environmental dimensions. To address this lacuna we propose Sustainable Social Shopping Systems as a means by which we can practically support individuals to become more sustainable and ultimately transform their lives. In this paper we propose and implement concepts, models, processes and a framework that are fundamental for the design of such systems.

Claris Yee Seung Chung, Roman Proskuryakov, David Sundaram

Grey Social Networks: A Facebook Case Study

Facebook is one of the largest social networks nowadays, gathering among its users a whole array of people from all over the world, with diverse backgrounds, cultures, opinions, ages and so on. It is the meeting point for friends (both real and virtual), acquaintances, colleagues, team-mates, class-mates, co-workers, etc. It is also the place where information spreads very fast and where one can easily exchange opinions, feelings, travel information, ideas, etc. But what happens when someone reads the news feed or sees their Facebook friends’ photos? Are they thrilled, excited? Do they feel that life is good? Or, on the contrary, do they feel lonely, isolated? Do they compare themselves with their friends? These are some of the questions this paper tries to answer, and grey systems theory is used to shape some of these relationships.

Camelia Delcea, Liviu-Adrian Cotfas, Ramona Paun

Event Detection from Social Data Stream Based on Time-Frequency Analysis

Social data have emerged as a special big data resource rich in information, providing raw material for diverse research that analyses the complex relationship network of users and the huge amount of data exchanged daily on Social Network Services (SNS). The popularity of current SNS in human life poses a worthwhile challenge: discovering meaningful knowledge from senseless data patterns. Understanding users’ behaviour, hobbies and viewpoints is an important task in academic and business fields, but it is a difficult research issue, especially on a large volume of data. In this paper, we propose a method to extract real-world events from a social data stream using an approach in the time-frequency domain that takes advantage of digital processing methods. Consequently, this work is expected to significantly reduce the complexity of the social data and to improve the performance of event detection on big data resources.

Duc T. Nguyen, Dosam Hwang, Jason J. Jung

Understanding Online Social Networks’ Users – A Twitter Approach

Twitter messages, also known as tweets, are increasingly used by marketers worldwide to determine consumer sentiments towards brands, products or events. Currently, most existing approaches used for social network sentiment analysis only extract simple feedback in terms of positive and negative perception. In this paper, TweetOntoSense is proposed, a semantic-based approach that uses ontologies in order to infer the user’s actual emotions. The extracted sentiments are described using a WordNet-enriched ontology of emotional categories. Thus, feelings such as happiness, affection, surprise, anger, sadness, etc. are put forth. Moreover, compared to existing approaches, TweetOntoSense also takes into consideration the fact that a single tweet might express several emotions rather than just one. A case study on Twitter is performed, also showing this approach’s practical applicability.

Camelia Delcea, Liviu-Adrian Cotfas, Ramona Paun

E-learning Systems

Intelligent e-Learning/Tutoring – The Flexible Learning Model in LMS Blackboard

An insight into a current concept of teaching/learning is introduced in this paper. The article encompasses two main areas which inherently blend together: the didactic area, representing the theoretical background, and the practical level, covering the real current educational situation in teaching/learning through online courses. The paper introduces an example of a smart e-learning solution that adjusts to the individual learning preferences of each student.

Ivana Simonova, Petra Poulova, Pavel Kriz, Michal Slama

Building Educational and Marketing Models of Diffusion in Knowledge and Opinion Transmission

Group communication and the diffusion of information and opinion are important but under-researched aspects of collective intelligence. In this paper a number of hypotheses are proposed and discussed. Each hypothesis, if proven, would be a considerable step towards creating a complete and coherent model of group communication that could be used in both computer and human sciences. This paper also discusses some methodology that may be used by researchers to test the hypotheses.

Marcin Maleszka, Ngoc Thanh Nguyen, Arkadiusz Urbanek, Miroslawa Wawrzak-Chodaczek

Semantic Model of Syllabus and Learning Ontology for Intelligent Learning System

The syllabus is a blueprint of a course for teaching and learning, because in higher education it serves as a promise between instructor and students. However, most current syllabus management systems provide only simple functionality, including creation, modification, and retrieval of unstructured syllabuses. In this paper, our approach consists of a definition of the ontological structure of the syllabus and of the semantic relationships between syllabuses, classification and integration of syllabuses based on the ACM/IEEE computing curriculum, and formalization of learning goals, learning activities, and learning evaluation in the syllabus using Bloom’s taxonomy to improve the usability of the syllabus. We also propose an effective method for enhancing students’ learning through the construction of a subject ontology, which is used in discussion, visual presentation, and knowledge sharing between instructor and students. We demonstrate the retrieval and classification correctness of our proposed methods through experiments and performance evaluations.

Hyun-Sook Chung, Jung-Min Kim

Creating Collaborative Learning Groups in Intelligent Tutoring Systems

Intelligent Tutoring Systems offer an attractive learning environment where the learning process is adapted to students’ needs and preferences. More than 20 years of academic research demonstrates that learning in groups is more effective than learning individually. This motivates working out a procedure that enables collaborative learning in Intelligent Tutoring Systems. In this paper an original algorithm for creating collaborative learning groups is proposed. The research showed that students working in groups generated by the proposed algorithm achieved 18% better results than students working in randomly generated groups. This shows the effectiveness of the proposed algorithm and demonstrates that creating suitable learning groups is very important.

Jarosław Bernacki, Adrianna Kozierkiewicz-Hetmańska

Pattern Recognition

Method of Driver State Detection for Safety Vehicle by Means of Using Pattern Recognition

The evolution of preventive safety devices for vehicles is expected to reduce the number of traffic accidents. A driving support safety function that adapts to the driver’s state may be one way to lower the risk of being involved in a traffic accident. In a previous study, distraction was identified as one of the abnormal states of a driver by means of an Internet survey. This study reproduced drivers’ cognitive distraction on a driving simulator by imposing cognitive loads, namely arithmetic and conversation. To classify a driver’s distraction state, visual features such as gaze direction and head orientation, together with pupil diameter and heart rate from ECG, were employed as recognition features. This study focused on achieving the best classification performance for driver distraction by using AdaBoost, the SVM and Loss-based Error-Correcting Output Coding (LD-ECOC) as classification algorithms. LD-ECOC has the potential to further enhance the classification of the driver’s psychosomatic states. Finally, this study proposes a next-generation driver-state-adaptive driving support safety function that can be extended to a vehicle-infrastructure cooperative safety function.

Masahiro Miyaji

Motion Segmentation Using Optical Flow for Pedestrian Detection from Moving Vehicle

This paper proposes a pedestrian detection method using optical flow analysis and Histogram of Oriented Gradients (HOG). Because sliding-window-based detection is time consuming, motion segmentation based on optical flow analysis is proposed to localize the regions of moving objects. A moving object is extracted from the relative motion by segmenting the region that shares the same optical flows after compensating for the ego-motion of the camera. Two consecutive images are divided into grid cells of 14x14 pixels, and each cell in the current frame is tracked to find the corresponding cell in the next frame. Using at least three corresponding cells, an affine transformation is performed according to the corresponding cells in the consecutive images, so that conformed optical flows are extracted. The regions of moving objects are detected where the transformed objects differ from the previously registered background. A morphological process is applied to obtain the candidate human region. HOG features are extracted from the candidate region and used as input to a linear Support Vector Machine (SVM), which classifies the input as pedestrian or non-pedestrian. The proposed method was tested in a moving vehicle and showed significant improvement compared with the original HOG.

Joko Hariyono, Van-Dung Hoang, Kang-Hyun Jo

Articular Cartilage Defect Detection Based on Image Segmentation with Colour Mapping

This article addresses a possible approach for a higher quality diagnosis and detection of the pathological defects of articular cartilage. The defects of articular cartilage are one of the most common pathologies of articular cartilage that a physician encounters. In clinical practice, doctors can only estimate visually whether or not there is a pathological defect with the use of magnetic resonance images. Our proposed methodology is able to accurately and precisely localize ruptures of cartilaginous tissue and thus greatly contribute to improving a final diagnosis. When analysing MRI data, we work only with grey-levels, which is rather complicated for producing a quality diagnosis. Our proposed algorithm, based on fuzzy logic, brings together various shades of grey. Each set is assigned a colour that corresponds to the density of the tissue. With this procedure, it is possible to create a contrast map of individual tissue structures and very clearly identify where cartilaginous tissues have been interrupted. The suggested methodology has been tested using real data from magnetic resonance images of 60 patients from Podlesí Hospital in Třinec and currently this method is being put into clinical practice.

Jan Kubicek, Marek Penhaker, Iveta Bryjova, Michal Kodaj

Enhanced Face Preprocessing and Feature Extraction Methods Robust to Illumination Variation

This paper presents an enhanced facial preprocessing and feature extraction technique for an illumination-robust face recognition system. Overall, the proposed face recognition system consists of a novel preprocessing descriptor, a differential two-dimensional principal component analysis technique, and a fusion module as sequential steps. In particular, the proposed system introduces an enhanced center-symmetric local binary pattern as the preprocessing descriptor to achieve performance improvement. To verify the proposed system, performance evaluation was carried out using various binary pattern descriptors and recognition algorithms on the extended Yale B database. As a result, the proposed system showed the best recognition accuracy of 99.03% compared to other approaches, and we confirmed that the proposed approach is effective for consumer applications.

Dong-Ju Kim, Myoung-Kyu Sohn, Hyunduk Kim, Nuri Ryu

Facial Expression Recognition Using Binary Pattern and Embedded Hidden Markov Model

This paper proposes a robust facial expression recognition approach using an enhanced center-symmetric local binary pattern (ECS-LBP) and an embedded hidden Markov model (EHMM). The ECS-LBP operator encodes the texture information of a local face region by emphasizing the diagonal components of the earlier center-symmetric local binary pattern (CS-LBP). Here, the diagonal components are emphasized because facial textures along the diagonal direction contain much more information than those of other directions. Generally, feature extraction and categorization are the key issues for facial expression recognition. To address them, we propose a method that combines ECS-LBP and EHMM, which is the key contribution of this paper. The proposed method was evaluated on the CK facial expression database and the JAFFE database, and it showed performance improvements of 2.65% and 2.19% over the conventional method using the two-dimensional discrete cosine transform (2D-DCT) and EHMM for the CK database and the JAFFE database, respectively. Through the experimental results, we confirmed that the proposed approach is effective for facial expression recognition.

Dong-Ju Kim, Myoung-Kyu Sohn, Hyunduk Kim, Nuri Ryu
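
For readers unfamiliar with the descriptor named in the abstract above, the following is a minimal sketch of the baseline CS-LBP operator on a 3x3 neighbourhood. It is not the authors' ECS-LBP: the abstract does not say how the diagonal components are emphasized, so that step is left out. The function name `cs_lbp`, the neighbour ordering and the default threshold are assumptions made for illustration.

```python
def cs_lbp(image, threshold=0):
    """Baseline CS-LBP codes (0..15) for the interior pixels of a grey image.

    `image` is a list of equal-length rows of numbers. This is the plain
    CS-LBP operator, not the ECS-LBP variant described in the abstract.
    """
    # Eight neighbour offsets (dy, dx) in circular order; entries i and i + 4
    # lie opposite each other across the centre pixel.
    offsets = [(0, 1), (-1, 1), (-1, 0), (-1, -1),
               (0, -1), (1, -1), (1, 0), (1, 1)]
    height, width = len(image), len(image[0])
    codes = []
    for y in range(1, height - 1):
        row = []
        for x in range(1, width - 1):
            code = 0
            for i in range(4):
                dy1, dx1 = offsets[i]
                dy2, dx2 = offsets[i + 4]
                # Compare each centre-symmetric pair; the centre pixel itself
                # is not used, which is what distinguishes CS-LBP from LBP.
                diff = image[y + dy1][x + dx1] - image[y + dy2][x + dx2]
                if diff > threshold:
                    code |= 1 << i
            row.append(code)
        codes.append(row)
    return codes
```

Because only four pixel pairs are compared, each code fits in 4 bits, so a histogram of codes has 16 bins rather than the 256 of the basic LBP operator.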

Expert Systems and Applications

Creating a Knowledge Base to Support the Concept of Lean Manufacturing Using Expert System NEST

This article deals with lean manufacturing principles and its model. We describe basic principles, metrics and rules for creating a lean manufacturing knowledge base. The case study included in this paper deals with the creation of a knowledge base that supports the implementation of the concept of lean manufacturing. The knowledge base can be used to identify waste at each level of the production areas. It can also be used to recommend appropriate methods and tools of industrial engineering to reduce the waste. The knowledge base is built using the expert system NEST.

Radim Dolák, Jan Górecki, Lukáš Slechan, Michael Kubát

A Cognitive Integrated Management Support System for Enterprises

This paper presents the design and implementation of the scalable and open multi-agent Cognitive Integrated Management Information System (CIMIS) as an application of computational collective intelligence. The system supports the management processes related to all domains of an enterprise’s functioning. The system is based on the LIDA cognitive agent architecture, described briefly in the first part of the paper. The main part of the article presents the logical architecture of CIMIS. Examples of selected agents’ functionality are discussed in the last part of the article.

Marcin Hernes

Combining Time Series and Clustering to Extract Gamer Profile Evolution

The video game industry is especially focused on user entertainment. It is really important for these companies to develop interactive and usable games in order to satisfy their clients’ preferences. The main problem for game developers is to get information about user behaviour during game-play. This information is important, especially nowadays, because gamers can buy new extra levels, or new games, interactively using their own consoles. Developers can use the gamer profile extracted from the game-play to create new levels, adapt the game to different users, recommend new video games and also match up users. This work addresses this problem. Here, we present a new game, called “Dream”, whose philosophy is based on an information extraction process focused on the player’s game-play profile and its evolution. We also present a methodology based on time series clustering to group users according to their profile evolution. This methodology has been tested with real users who played Dream over several rounds.

Héctor D. Menéndez, Rafael Vindel, David Camacho

Rehandling Problem of Pickup Containers under Truck Appointment System

This paper studies rehandling strategies for pickup containers in marine container terminals where a truck appointment system (TAS) is in place. The main purpose of the TAS is to address the imbalance of peaks and troughs of truck arrival times, thereby reducing the number of external trucks during peak hours and improving their turnaround time. This study suggests that the TAS can also be used to improve the efficiency of yard handlings for pickup containers, thus improving the productivity of yard handling equipment. To this end, a stochastic dynamic programming (SDP) model was proposed considering the truck appointment information. A branch-and-bound (B&B) approach was shown to be able to provide the exact solution to calculate the expected number of rehandlings in the decision tree. To overcome the computational restriction of the exact solution, a heuristic was proposed and its performance was compared with that of the B&B approach.

Dusan Ku

Emergent Concepts on Knowledge Intensive Processes

An approach to refine and revise the general framework of KiP (Knowledge Intensive Process) is presented. The specific case of collaborative KiP is studied and the prominent role of collaborative KiPs in the general context of Business Processes is revealed. The approach is based on Formal Concept Analysis.

Gonzalo A. Aranda-Corral, Joaquín Borrego-Díaz, Juan Galán-Páez, Antonio Jiménez-Mavillard

GIS Applications

Optimal Partial Rotation Error for Vehicle Motion Estimation Based on Omnidirectional Camera

This paper presents a method for robust motion estimation using an optimal partial rotation error, in the spirit of the rotation averaging and minimum spanning tree approaches. The advantage of an omnidirectional camera is that it allows landmarks to be tracked over long-distance travel and large rotations of the vehicle. Because of the computational time, the method does not compute the optimal rotation at every frame; instead, the optimal rotation error is applied to each interval of motion, called a partial motion, over which the set of landmarks is tracked in all subsequent images. This approach takes advantage of the partial optimal error to reduce the divergence of the estimated trajectory in long-distance travel. The global motion of the vehicle is estimated with high accuracy using the optimal partial rotation error based on the rotation averaging method, in contrast with traditional bundle adjustment, which uses the minimum Euclidean distance of back-projection errors. The experimental results demonstrate the effectiveness of this method for large-view scenes in outdoor environments.

Van-Dung Hoang, Kang-Hyun Jo

A Smart Mobility System Implemented in a Geosocial Network

The continuous evolution of internet and web 2.0 technologies facilitates the creation of dynamic content. Social networks with georeferencing can be helpful for handling information from different sources and providing user-oriented services. Among these applications we can consider intelligent systems for mobility. In this paper we introduce our geosocial network platform called Vidali. The open source social platform Vidali provides a set of tools that support interactivity and collaboration between people and the provision of location-based services, which creates an environment that enhances collective intelligence. Starting with this platform as a base, we developed a solution for improving mobility in local environments, which includes, among other features, the management of shared vehicles. We discuss the design and implementation of Vidali and of a smart mobility system.

Cristopher David Caamana Gómez, Julio Brito Santana

A Prototype of Mobile Speed Limits Alert Application Using Enhanced HTML5 Geolocation

This study proposes an HTML5 geolocation-based vehicle speed alert application that aims to help passengers who cannot see the information on the vehicle’s dashboard. The traditional vehicle speed determined from the HTML5 geolocation API is improved using haversine distance calculation. The speed limit value is set automatically according to the specified type of vehicle and the current type of road. The prototype was developed and tested under the transportation regulations in Thailand. The results reveal that the enhanced HTML5 geolocation speed determination using haversine distance significantly improves the accuracy of vehicle speed detection compared with the traditional HTML5 geolocation API.

Worapot Jakkhupan
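
The haversine-based speed calculation mentioned in this abstract can be sketched as follows. This is a minimal illustration in Python rather than the browser JavaScript the prototype presumably uses; the function names, the fix format (latitude, longitude, timestamp in seconds), and the Earth radius constant are assumptions made for the example, not details taken from the paper.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres (assumed constant)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def speed_kmh(fix_a, fix_b):
    """Average speed in km/h between two fixes (lat, lon, timestamp_seconds)."""
    lat1, lon1, t1 = fix_a
    lat2, lon2, t2 = fix_b
    dt = t2 - t1
    if dt <= 0:
        raise ValueError("fixes must be in increasing time order")
    return haversine_m(lat1, lon1, lat2, lon2) / dt * 3.6


def over_limit(speed, limit_kmh):
    """True when the measured speed exceeds the configured limit."""
    return speed > limit_kmh
```

In such a sketch the limit passed to `over_limit` would come from a lookup keyed by vehicle type and road type, as the abstract describes.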

Data-Driven Pedestrian Model: From OpenCV to NetLogo

Our objective was to replicate the movement of real pedestrians in a NetLogo agent-based model using video recordings of pedestrians as a source of reliable data. To achieve this, it was necessary to develop a video-processing extension for NetLogo. The paper presents the principles of video data transformation, the implementation of the extension, and an experiment with a sample video stream that demonstrates the self-organization of bi-directional flows of walkers. The extension builds on the computer vision library OpenCV.

Jan Procházka, Kamila Olševičová

Extending HITS Algorithm for Ranking Locations by Using Geotagged Resources

The paper focuses on using geotagged resources from social network services (SNS) to search for famous places by keyword. We extend the HITS [9] algorithm in order to rank locations collected from geotagged resources on SNS. Our approach not only uses a similarity measure between locations' tags to compute the value of locations but also calculates the term frequency of the tags that occur in each location to modify the value of tags for ranking. We implement the approach and show experimental results on a set of locations from geotagged resources.

Xuan Hau Pham, Tuong Tri Nguyen, Jason J. Jung, Dosam Hwang
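
For orientation, the baseline HITS iteration that this paper extends can be sketched as below. This is the standard hub/authority update only; the tag-similarity and term-frequency extensions described in the abstract are not reproduced, and the tag-to-location edges in the usage note are hypothetical data.

```python
import math


def hits(edges, iterations=50):
    """Standard HITS on a directed graph given as a list of (source, target) edges.

    Returns (hub, authority) score dictionaries, each L2-normalized.
    """
    nodes = sorted({n for edge in edges for n in edge})
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority: sum of hub scores of nodes pointing at this node.
        auth = {n: sum(hub[u] for u, v in edges if v == n) for n in nodes}
        norm = math.sqrt(sum(x * x for x in auth.values())) or 1.0
        auth = {n: x / norm for n, x in auth.items()}
        # Hub: sum of authority scores of nodes this node points at.
        hub = {n: sum(auth[v] for u, v in edges if u == n) for n in nodes}
        norm = math.sqrt(sum(x * x for x in hub.values())) or 1.0
        hub = {n: x / norm for n, x in hub.items()}
    return hub, auth
```

With hypothetical edges such as `[("beach", "Busan"), ("beach", "Jeju"), ("island", "Jeju")]`, tags act as hubs and locations as authorities, so a location referenced by more (and stronger) tags receives a higher authority score.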

Computational Intelligence

Solving the Permutation Problem Efficiently for Tabu Search on CUDA GPUs

NVIDIA’s Tesla Graphics Processing Units (GPUs) have been used to solve various kinds of long-running applications because of their high-performance compute power. A GPU consists of hundreds or even thousands of processor cores and adopts the Single Instruction, Multiple Thread (SIMT) architecture. This paper proposes an approach that optimizes the Tabu Search algorithm for solving the Permutation Flowshop Scheduling Problem (PFSP) on a GPU. We use a math function to generate all the different permutations, avoiding the need to place all the permutations in global memory. Experimental results show that the GPU implementation of our proposed Tabu Search for PFSP runs up to 90 times faster than its CPU counterpart.

Liang-Tsung Huang, Syun-Sheng Jhan, Yun-Ju Li, Chao-Chin Wu
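
One way to generate a permutation directly from an index, so that no table of permutations has to be stored, is the factorial number system. The sketch below assumes this technique purely for illustration; the abstract does not say which function the authors use, and the code is plain Python rather than a CUDA kernel. On a GPU, each thread could call such a function with its own thread index.

```python
from math import factorial


def unrank_permutation(index, n):
    """Return the index-th permutation of range(n) in lexicographic order."""
    if not 0 <= index < factorial(n):
        raise ValueError("index out of range for permutations of n items")
    items = list(range(n))
    perm = []
    for remaining in range(n, 0, -1):
        block = factorial(remaining - 1)  # permutations sharing the same next item
        position, index = divmod(index, block)
        perm.append(items.pop(position))
    return perm
```

Because the mapping is a bijection between `0 .. n!-1` and the permutations, every candidate is visited exactly once when the indices are enumerated.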

A Genetic Programming Based Framework for Churn Prediction in Telecommunication Industry

Customer defection is critically important since it leads to serious business loss. Therefore, investigating methods to identify defecting customers (i.e. churners) has become a priority for telecommunication operators. In this paper, a churn prediction framework is proposed, aiming to enhance the ability to forecast customer churn. The framework combines two heuristic approaches: Self Organizing Maps (SOM) and Genetic Programming (GP). First, SOM is used to cluster the customers in the dataset and then to remove outliers representing abnormal customer behaviors. After that, GP is used to build an enhanced classification tree. The dataset used for this study contains anonymized real customer information provided by a major local telecom operator in Jordan. Our work shows that the proposed method surpasses various state-of-the-art classification methods on this particular dataset.

Hossam Faris, Bashar Al-Shboul, Nazeeh Ghatasheh

Genetic Programming with Dynamically Regulated Parameters for Generating Program Code

Genetic Programming (GP) is one of the Evolutionary Algorithms. There are many theories concerning automatic code generation. In this article we present the latest research on using our dynamic scaling parameter in Genetic Programming to generate code. We have created practically functioning program code with the dynamic instruction set for […] language. For testing we have chosen the best known problems. Our investigations of the best range of each parameter were based on our preliminary experiments.

Tomasz Łysek, Mariusz Boryczka

A Guidable Bat Algorithm Based on Doppler Effect to Improve Solving Efficiency for Optimization Problems

A new guidable bat algorithm (GBA) based on the Doppler effect is proposed to improve the efficiency of solving optimization problems. Three search policies and three exploration strategies are designed in the proposed GBA. The bats governed by GBA are given the ability to be guided by the frequency shift of the Doppler effect, so that they can fly rapidly toward the current best bat in guidable search. Both refined search and diverse search are employed to explore better positions near the current best bat and to open up new search areas. These search policies help to discover, in a short time, eligible positions that improve on the position of the current best bat. In addition, next-generation evolutionary computing (EC 2.0) is created to break the bottleneck of traditional ECs and establish a new paradigm in ECs. In EC 2.0, conflict theory is introduced to improve the efficiency of solution discovery. Conflict between individuals is healthy behavior for population evolution. Constructive conflict promotes the overall quality of the population. Conflict, competition and cooperation are the three pillars of collective effects investigated in this study. Context-awareness is another feature of EC 2.0: individuals are able to perceive environmental information through physical laws.

Yi-Ting Chen, Chin-Shiuh Shieh, Mong-Fong Horng, Bin-Yih Liao, Jeng-Shyang Pan, Ming-Te Tsai

Collective Detection of Potentially Harmful Requests Directed at Web Sites

The number of web-based activities and websites is growing every day. Unfortunately, so is cyber-crime. Every day, new vulnerabilities are reported and the number of automated attacks is constantly rising. Typical signature-based methods rely on expert knowledge and the distribution of updated information to the clients (e.g. anti-virus software) and require more effort to keep the systems up to date. At the same time, they do not protect against the newest (e.g. zero-day) threats. In this article, a new method is proposed in which cooperating systems analyze incoming requests, identify potential threats and present them to other peers. Each host can then utilize the findings of the other peers to identify harmful requests, making the whole system of cooperating servers “remember” and share information about the threats.

Marek Zachara

Ontologies, Graphs and Networks

Increasing the Efficiency of Ontology Alignment by Tracking Changes in Ontology Evolution

In this paper we present a development of our ontology alignment framework based on varying semantics of attributes. Emphasising the analysis of explicitly given descriptions of how attributes change the meanings they entail when included within different concepts has proved useful. Moreover, we claim that it is consistent with the intuitive way in which people see the real world and find similarities and correspondences between its elements. In this paper we concentrate on tracking changes that may occur within aligned ontologies and on how these potential changes can influence the process of finding new mappings or validating ones that have already been found.

Marcin Pietranik, Ngoc Thanh Nguyen, Cezary Orłowski

Rule-Based Reasoning System for OWL 2 RL Ontologies

In this paper we present a method of transforming OWL 2 ontologies into a set of rules which can be used in a forward chaining rule engine. We use the HermiT reasoner to perform the TBox reasoning and to produce a classified form of an ontology. The ontology is automatically transformed into a set of rules and facts in the Abstract Syntax of Rules and Facts. This set can then be translated for any forward chaining reasoning engine. We present an implementation of our method using two engines: Jess and Drools. We evaluate our approach by performing the ABox reasoning on a number of benchmark ontologies. Additionally, we compare the obtained results with the inferences provided by the HermiT reasoner. The evaluation shows that we can perform the ABox reasoning with considerably better performance than HermiT. We describe the details of our approach as well as future research and development.

Jaroslaw Bak, Czeslaw Jedrzejek

A Consensus-Based Method for Solving Concept-Level Conflict in Ontology Integration

Ontology reuse has been an important factor in developing shared knowledge in the Semantic Web. Ontology reuse makes knowledge sharing between intelligent ontology-based systems easier. However, it cannot completely eliminate the potential for conflict in ontology integration. This paper presents a method based on consensus theory and an evaluation function of the similarity measure between concepts, which is used in a proposed algorithm for ontology integration at the concept level.

Trung Van Nguyen, Hanh Huu Hoang

Betweenness versus Linerank

In our paper we compare two centrality measures of networks, namely betweenness and Linerank. Betweenness is a popular, widely used measure; however, its computation is prohibitively expensive for large networks, which strongly limits its applicability in practice. On the other hand, the calculation of Linerank remains manageable even for billion-node graphs, and it was therefore offered as a substitute for betweenness in [4]. Nevertheless, to the best of our knowledge the relationship between the two measures has never been seriously examined. As a first step of our experiments we calculate the Pearson and Spearman correlation coefficients for both the node and edge variants of these measures. In the case of the edges the correlation varies but tends to be rather low. Our tests with the Girvan-Newman algorithm for detecting clusters in networks [7] also underline that edge betweenness cannot be substituted with edge Linerank in practice. The results for the node variants are more promising: the correlation coefficients are close to 1 in almost all cases. Notwithstanding, in the practical application in which the robustness of social and web graphs to node removal is examined, node betweenness still outperforms node Linerank, which shows that even in this case the substitution remains problematic. Besides these investigations we also clarify how Linerank should be computed on undirected graphs.

Balázs Kósa, Márton Balassi, Péter Englert, Attila Kiss

On Decomposing Integration Tasks for Hierarchical Structures

Hierarchical structures have become a common data structure in modern applications, and thus the need to process them efficiently has increased. In this paper we provide a full description of a mathematical model of complex tree integration, which allows the representation of various types of hierarchical structures and integration tasks. We use this model to show that it is possible to decompose some integration tasks by splitting them into subtasks. Thanks to this procedure it is possible to solve large integration tasks faster, without the need to develop new, less computationally complex algorithms. Decomposition is an important step towards developing methods of multi-stage integration of hierarchical structures.

Marcin Maleszka

Machine Learning

A Web-Based Multi-Criteria Decision Making Tool for Software Requirements Prioritization

Multiple-criteria decision making (MCDM) is widely used in ranking choices from a set of available alternatives with respect to multiple criteria. To analytically rank requirements under various criteria, we propose a tool called requirements prioritizer (RP) which has the capacity of keeping records of project stakeholders with their relative weights against each requirement, utilized by the system to compute an ordered list of prioritized requirements. The proposed approach offers a novel way of involving stakeholders in the entire decision making process irrespective of their numbers in an automated fashion. In this proposed approach, the relative weights assigned by each stakeholder are normalized and aggregated. The output of the system consists of prioritized requirements with an automatically generated graph showing the relative values of requirements across project stakeholders in a chronological order.

Philip Achimugu, Ali Selamat, Roliana Ibrahim
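
The normalize-and-aggregate step described in this abstract can be sketched as follows. The abstract does not specify the normalization or aggregation rule, so this example assumes each stakeholder's weights are scaled to sum to 1 and then averaged per requirement; the function name and the sample data in the usage note are hypothetical.

```python
def prioritize(weights_by_stakeholder):
    """Rank requirements from per-stakeholder relative weights.

    weights_by_stakeholder maps a stakeholder name to {requirement: weight}.
    Each stakeholder's weights are normalized to sum to 1 (assumed rule),
    then averaged across stakeholders (assumed rule).
    """
    totals = {}
    for weights in weights_by_stakeholder.values():
        weight_sum = sum(weights.values())
        if weight_sum <= 0:
            raise ValueError("each stakeholder needs a positive total weight")
        for requirement, weight in weights.items():
            totals[requirement] = totals.get(requirement, 0.0) + weight / weight_sum
    count = len(weights_by_stakeholder)
    scores = {requirement: total / count for requirement, total in totals.items()}
    # Highest aggregated score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

For example, `prioritize({"alice": {"login": 8, "search": 2}, "bob": {"login": 1, "search": 3}})` ranks the two requirements by their averaged normalized weights, so a stakeholder who uses a larger numeric scale does not dominate the result.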

The Selection of Multicriteria Method Based on Unstructured Decision Problem Description

Decision support processes and methods require applying numerous mathematical transformations, including one of the developed processes of multicriteria analysis. The core of most existing processes is usually one of the multicriteria decision aid (MCDA) methods. The paper presents research focused on identifying which factors of a decision situation are significant for selecting a multicriteria method. The identified factors were analyzed with data-mining methods. The conclusions outline the factors of decision situations that favor particular MCDA methods for supporting decisions in particular situations.

Jarosław Wątróbski, Jarosław Jankowski, Zbigniew Piotrowski

Multi-criteria Utility Mining Using Maximum Constraints

Most of the existing studies in utility mining use a single minimum utility threshold to determine whether an item is a high utility item. This approach, however, does not reflect the nature of individual items well. This work thus presents another viewpoint on defining the minimum utilities of itemsets. The maximum constraint is adopted, which is well explained in the text and suitable for some mining domains in which items have different utility values. In addition, an effective two-phase mining approach is proposed to cope with the problem of multi-criteria utility mining under maximum constraints. The experimental results show the performance of the proposed approach.

Guo-Cheng Lan, Tzung-Pei Hong, Yu-Te Chao

Evaluation of Neural Network Ensemble Approach to Predict from a Data Stream

We have recently worked out a method for building reliable predictive models from a data stream of real estate transactions, which applies ensembles of genetic fuzzy systems and neural networks. The method consists in building models over chunks of a data stream determined by a sliding time window and gradually enlarging an ensemble with models generated in the course of time. The aged models are utilized to compose ensembles, and their output is updated with trend functions reflecting the changes of prices in the market. In the paper we present the next series of extensive experiments to evaluate our method with ensembles of artificial neural networks. We examine the impact of the number of aged models used to compose an ensemble on the accuracy, and the influence of the degree of the polynomial trend functions employed to modify the results on the performance of neural network ensembles. The experimental results were analysed using a statistical approach embracing nonparametric tests followed by post-hoc procedures designed for multiple

Zbigniew Telec, Bogdan Trawiński, Tadeusz Lasota, Grzegorz Trawiński

Data Mining

Some Novel Improvements for MDL-Based Semi-supervised Classification of Time Series

In this paper, we propose two novel improvements for semi-supervised classification of time series: an improvement technique for Minimum Description Length-based stopping criterion and a refinement step to make the classifier more accurate. Our first improvement applies the non-linear alignment between two time series when we compute Reduced Description Length of one time series exploiting the information from the other. The second improvement is a post-processing step that aims to identify the class boundary between positive and negative instances accurately. Experimental results show that our two improvements can construct more accurate semi-supervised time series classifiers.

Vo Thanh Vinh, Duong Tuan Anh

A Novel Method for Mining Class Association Rules with Itemset Constraints

Mining class association rules with itemset constraints is very popular in mining medical datasets. For example, when classifying which populations are at high risk for the HIV infection, epidemiologists often concentrate on rules which include demographic information such as sex, age, and marital status in the rule antecedents. However, two existing methods, post-processing and pre-processing, require much time and effort. In this paper, we propose a lattice-based approach for efficiently mining class association rules with itemset constraints. We first build a lattice structure to store all frequent itemsets. We then use paternity relations among nodes to discover rules satisfying the constraint without re-building the lattice. The experimental results show that our proposed method outperforms other methods in the mining time.

Dang Nguyen, Bay Vo, Bac Le

A PWF Smoothing Algorithm for K-Sensitive Stream Mining Technologies over Sliding Windows

Stream mining technologies have become a research hotspot as a more effective way to handle big data and distributed stream mining problems. In particular, […] and Ubiquitous Computing may interact with humans and physical objects in the real world in a sensory manner. They require quantitative guarantees on the precision of approximate answers and support for distributed processing of high-volume, high-velocity, and varied streams. Recent work on Top-k synopsis processing over data streams uses all the data between a particular landmark and the current time for mining. In practice, the landmark and the parameter k are two of the most important factors in obtaining high-quality approximate results. We therefore propose a Proper-Wavelet Function (PWF) algorithm to smooth the approximation and reduce the effect of k on the final approximate results. Finally, we demonstrate the effectiveness of our algorithm in achieving high-quality k-nearest neighbors mining results when applying wider proper […]

Ling Wang, Zhao Yang Qu, Tie Hua Zhou, Xiu Ming Yu, Keun Ho Ryu

Subsume Concept in Erasable Itemset Mining

In recent years, erasable itemset mining has become an interesting problem in supply chain optimization. In previous work, we presented the dPidset structure, a very effective structure for mining erasable itemsets. The dPidset structure improves performance compared with earlier structures. However, the mining time is still long. Therefore, in this paper, we propose a new approach that uses the subsume concept for effectively mining erasable itemsets. The subsume concept helps to determine the information of a large number of erasable itemsets early, without the usual computational cost. An experiment was conducted to show the effectiveness of using the subsume concept in the process of mining erasable itemsets.

Giang Nguyen, Tuong Le, Bay Vo, Bac Le, Phi-Cuong Trinh

Analyzing the Behavior and Text Posted by Users to Extract Knowledge

With the explosion of Web 2.0 platforms such as blogs, discussion forums, and social networks, Internet users can express their feelings and share information among themselves. This behavior leads to the accumulation of an enormous amount of information. Among these platforms are so-called microblogs. Microblogging (e.g. Twitter), a new form of online communication in which users talk about their daily lives, publish opinions, or share information in short posts, has become one of the most popular social networking services today, which makes it a potentially large information base attracting increasing attention from researchers in the field of knowledge discovery and data mining. Several works have proposed tools for tweet search, but this area is still not well explored. Our work examines the role and impact of social networks, in particular microblogs, on public opinion. We aim to analyze the behavior and text posted by users to extract knowledge that reflects the interests and opinions of a population. This led us to propose a new, more advanced tool that uses new features such as audience and RetweetRank for ranking relevant tweets. We investigate the impact of these criteria on search results for relevant information. Finally, we propose a new metric to improve the results of searches in microblogs. More precisely, we propose a model that combines content relevance, tweet relevance, and author relevance. Each type of relevance is characterized by a set of criteria, such as audience to assess the relevance of the author and OOV (Out Of Vocabulary) to measure the relevance of content. To evaluate our model, we built a knowledge management system. We used a collection of subjective tweets about Tunisian current events in 2012.

Soumaya Cherichi, Rim Faiz

Cooperation and Collective Knowledge

Common-Knowledge and Cooperation Management II S4n-Knowledge Model Case

Issues of moral hazard and adverse selection abound in every contract where one party has a self-interest and information that the other party does not possess, and more needs to be understood about how to handle a contracting party that has more information than oneself. This paper re-examines the issue in the framework of a principal-agent model under uncertainty. We highlight epistemic conditions for a possible resolution of the moral hazard between the principal and the agents with S4n-knowledge, and we show that if the principal and agents commonly know each agent’s belief on the others’ efforts, then all effort levels such that the expected marginal costs actually coincide for them can be characterised as the critical points of the refunded proportional rate function. This implies our recommendation that, to remove such moral hazard in the principal-agents cooperation, the principal and agents should commonly know their beliefs on the others’ effort levels.

Takashi Matsuhisa

Modelling Mediator Assistance in Joint Decision Making Processes Involving Mutual Empathic Understanding

In this paper an agent model for mediation in joint decision-making processes is presented for establishing mutual empathic understanding. Elicitation of affective states is an important criterion of empathy. In unassisted joint decision-making it can be difficult to recognise whether empathic responses are the result of experiencing the other individual’s affective state, or whether these affective states are at least partly blended with own states that would also have developed in individual decision-making. The mediator agent assists two individual social agents in establishing and expressing empathy, as a means to develop solidly grounded joint decisions.

Rob Duell, Jan Treur

Real-Time Head Pose Estimation Using Weighted Random Forests

In this paper we propose a real-time head pose estimation method based on weighted random forests. A weighted random forest classifier is employed in order to achieve real-time and accurate classification. In the training process, we estimate the accuracy of each tree using preselected out-of-bag data. The accuracy estimates determine the weight of each tree and improve the accuracy of classification in the testing process. Moreover, to make the method robust to illumination variance, binary pattern operators are used for preprocessing. Experiments on public databases show the advantages of this method over other algorithms in terms of accuracy and illumination invariance.

Hyunduk Kim, Myoung-Kyu Sohn, Dong-Ju Kim, Nuri Ryu
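
The idea of weighting each tree by its out-of-bag accuracy can be sketched as below. Trees are stand-in callables here, and using the raw out-of-bag accuracy as the vote weight is an assumption for illustration; the paper's exact weighting scheme and features are not given in the abstract.

```python
from collections import defaultdict


def oob_accuracy(tree, oob_samples):
    """Fraction of out-of-bag (input, label) pairs the tree classifies correctly."""
    correct = sum(1 for x, label in oob_samples if tree(x) == label)
    return correct / len(oob_samples)


def weighted_vote(trees, weights, x):
    """Return the label with the largest total weight among the trees' votes."""
    scores = defaultdict(float)
    for tree, weight in zip(trees, weights):
        scores[tree(x)] += weight
    return max(scores, key=scores.get)


# Hypothetical stand-in "trees" that map a yaw angle to a coarse pose label.
def tree_a(x):
    return "left" if x < 0 else "right"


def tree_b(x):
    return "left"


def tree_c(x):
    return "left" if x < 25 else "right"


oob = [(5, "right"), (20, "right"), (30, "right"), (-3, "left")]
trees = [tree_a, tree_b, tree_c]
weights = [oob_accuracy(t, oob) for t in trees]
print(weighted_vote(trees, weights, 5))
```

In this toy setup a plain majority vote and the weighted vote can disagree, because the two weaker trees together carry less weight than the single accurate one.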

An Integer Programming Approach for Two-Sided Matching with Indifferences

To make use of the collective intelligence of many autonomous self-interested agents, it is important to form a team on which all the agents agree. Two-sided matching is one of the basic approaches to forming a team that consists of agents from two disjoint agent groups. Traditional two-sided matching assumes that an agent has a totally ordered preference list of the agents it is to be paired with, but it is unrealistic to have a totally ordered list for a large-scale two-sided matching problem. In this paper, we propose an integer programming based approach to solve a two-sided matching problem that allows indifferences in agents’ preferences, and show how an objective function can be defined to find a matching that minimizes the maximum discontentedness of agents in one group.

Naoki Ohta, Kazuhiro Kuwabara
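
The objective named in this abstract, minimizing the maximum discontentedness of agents in one group, can be illustrated on a tiny instance. The sketch below swaps the paper's integer programming formulation for brute-force enumeration, which is only practical for a handful of agents; it also ignores stability constraints and treats an agent's rank of its partner as its discontentedness, both of which are assumptions made for the example.

```python
from itertools import permutations


def min_max_matching(ranks):
    """Find a one-to-one matching minimizing the worst rank in group A.

    ranks[i][j] is the rank agent i of group A gives to agent j of group B
    (1 = best). Equal ranks express indifference.
    Returns (matching, cost) where matching[i] is the partner of agent i.
    """
    size = len(ranks)
    best_matching, best_cost = None, None
    for candidate in permutations(range(size)):
        cost = max(ranks[i][candidate[i]] for i in range(size))
        if best_cost is None or cost < best_cost:
            best_matching, best_cost = list(candidate), cost
    return best_matching, best_cost
```

An integer program expresses the same search with 0/1 assignment variables and a bound variable on the worst rank, which lets a solver prune the space instead of enumerating every permutation.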

DC Programming and DCA for Nonnegative Matrix Factorization

Techniques of matrix factorization or decomposition play a central role in numerical analysis and statistics, with many applications in real-world problems. Recently, the NMF dimension-reduction technique, popularized by Lee and Seung with their multiplicative update algorithm (an adapted gradient approach), has drawn much attention from researchers and practitioners. Since many existing algorithms lack a firm theoretical foundation, and designing efficient scalable algorithms for NMF is still a challenging problem, we investigate DC programming and DCA for NMF.

Hoai An Le Thi, Tao Pham Dinh, Xuan Thanh Vo
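
As background, the Lee and Seung multiplicative update rule that the abstract cites as the popular baseline can be sketched in a few lines. This is not the DCA scheme the paper investigates; the helper names, the random initialization, and the small `eps` guard against division by zero are assumptions for the example.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def squared_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2 for i in range(len(v)) for j in range(len(v[0])))


def nmf_multiplicative(v, rank, iterations=200, seed=0, eps=1e-12):
    """Factor a nonnegative matrix V into W (rows x rank) and H (rank x cols)."""
    rng = random.Random(seed)
    rows, cols = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(rows)]
    h = [[rng.random() + 0.1 for _ in range(cols)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), elementwise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(cols)] for a in range(rank)]
        # W <- W * (V H^T) / (W H H^T), elementwise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(rank)] for i in range(rows)]
    return w, h
```

Because every update multiplies by a ratio of nonnegative quantities, W and H stay nonnegative without any explicit projection step.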

Computational Swarm Intelligence

An Ant Colony Optimization Algorithm for an Automatic Categorization of Emails

This article presents a new approach to an automatic categorization of email messages which is based on Ant Colony Optimization algorithms (ACO). The aim of this paper is to create an algorithm that would allow one to improve the classification of emails into folders (the email foldering problem) by using solutions that have been applied in Ant Colony algorithms, data mining and Social Network Analysis (SNA). The new algorithm which is proposed here has been tested on the publicly available Enron email data set. The obtained results confirm that this approach allows one to improve the accuracy with which new emails are assigned to particular folders based on an analysis of previous correspondence.

Urszula Boryczka, Barbara Probierz, Jan Kozak

Goal-Oriented Requirements for ACDT Algorithms

This paper is devoted to a new application of the ACDF approach. In this work we propose a new way of evaluating virtual-ant performance. This approach concentrates on decision tree construction using the ant colony metaphor. The goal of the experiments is to show that decision tree construction may be oriented not only towards the accuracy measure. Depending on the decision tree quality measure, the proposed approach enables the construction of decision trees with a high value of accuracy, recall, precision, F-measure or Matthews correlation coefficient. This is possible thanks to the use of a nondeterministic, probabilistic approach: Ant Colony Optimization. The proposed algorithm was examined, and the experimental study confirmed that the goal-oriented ACDT can create the expected decision trees in accordance with the specified measures.

Jan Kozak, Urszula Boryczka

Implementing Population-Based ACO

Population-based ant colony optimization (PACO) is one of the most efficient ant colony optimization (ACO) algorithms. Its strength results from a pheromone memory model in which pheromone values are calculated based on a population of solutions. In each iteration an iteration-best solution may enter the population depending on an update strategy specified. When a solution enters or leaves the population the corresponding pheromone trails are updated. The article shows that the PACO pheromone memory model can be utilized to speed up the process of selecting a new solution component by an ant. Depending on the values of parameters, it allows for an implementation which is not only memory efficient but also significantly faster than the standard approach.

Rafał Skinderowicz

Finding Optimal Strategies in the Coordination Games

In this article we present a new algorithm which is capable of finding optimal strategies in coordination games. The coordination game refers to a large class of environments where there are multiple equilibria. We propose an approach based on Differential Evolution in which the fitness function is used to calculate the maximum deviation from the optimal strategy. Differential Evolution (DE) is a simple and powerful optimization method, which is mainly applied to continuous problems. Thanks to the special operator of adaptive mutation, it is possible to direct the search process within the solution space. The approach used in this article is based on the probability of choosing a single pure strategy.

Przemyslaw Juszczuk

Cryptanalysis of Transposition Cipher Using Evolutionary Algorithms

This paper presents how techniques such as evolutionary algorithms (EA) can optimize complex cryptanalysis processes. The main goal of this article is to introduce a special algorithm, which allows executing an effective cryptanalysis attack on a ciphertext encoded with a classic transposition cipher. In this type of cipher, the plaintext letters are modified by permutation. The most well-known problem, which is often solved with optimization techniques operating on a set of permutations, is the Travelling Salesman Problem (TSP). The mentioned algorithm uses a specially prepared function of assessment of the individuals with a set of genetic operators, used in the case of

Urszula Boryczka, Kamil Dworak
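
To make the search space concrete, a simple block transposition cipher keyed by a permutation can be sketched as follows. This is an illustrative variant, not necessarily the exact cipher attacked in the paper; the padding character and function names are assumptions. An evolutionary attack would search over candidate keys, each of which is a permutation like `key` below.

```python
def encrypt(plaintext, key, pad="X"):
    """Permute the letters inside each block of len(key) characters.

    key is a permutation of range(len(key)); output position p of every
    block receives the input character at position key[p].
    """
    size = len(key)
    if sorted(key) != list(range(size)):
        raise ValueError("key must be a permutation of 0..len(key)-1")
    padded = plaintext + pad * (-len(plaintext) % size)
    blocks = (padded[start:start + size] for start in range(0, len(padded), size))
    return "".join("".join(block[k] for k in key) for block in blocks)


def decrypt(ciphertext, key):
    """Undo encrypt() by applying the inverse permutation (padding is kept)."""
    inverse = [0] * len(key)
    for position, k in enumerate(key):
        inverse[k] = position
    return encrypt(ciphertext, inverse)
```

Since the letters themselves are unchanged and only their order differs, letter frequencies are preserved, which is why such attacks usually score candidate keys with statistics over letter pairs or words rather than single letters.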

Collective Intelligence in Web Systems - Web Systems Analysis

Improved Video Scene Detection Using Player Detection Methods in Temporally Aggregated TV Sports News

Many strategies of content-based indexing have been proposed to recognize sports disciplines in sports news videos. This may be achieved by analyses of player scenes leading to the detection of playing fields, of superimposed text such as player or team names, identification of player faces, detection of lines typical for a given playing field and for a given sports discipline, recognition of player and audience emotions, and also detection of sports objects and clothing specific to a given sports category. The analysis of TV sports news usually starts with the automatic temporal segmentation of videos, followed by recognition and then classification of player shots and scenes reporting sports events in different disciplines. Unfortunately, it happens that two (or even more) consecutive shots presenting two different sports events, albeit of the same discipline, are detected as one shot. The strong similarity, mainly in the colour of the playing fields, makes it difficult to detect a cut. The paper examines the usefulness of player detection methods for reducing undetected cuts in temporally aggregated TV sports news videos, leading to better detection of events in sports news. This approach has been tested in the Automatic Video Indexer AVI.

Kazimierz Choroś

An Overlapped Motion Compensated Approach for Video Deinterlacing

In this paper a block-based motion compensated, contour-preserving deinterlacing method is proposed. It classifies the frame texture according to its contours content and adapts the motion estimation and interpolation in order to ensure high quality image reconstruction. As frame reconstruction is block-based, overlapped motion compensation with adaptive low-pass filters is employed in order to avoid blocking artifacts. The experimental results show significant improvement of the proposed method over classical motion compensated and adaptive deinterlacing techniques.

Shaunak Ganguly, Shaumik Ganguly, Maria Trocan

Enhancing Collaborative Filtering Using Semantic Relations in Data

Recommender Systems (RS) pre-select and filter information according to the needs and preferences of the user. Users express their interest in items by giving their opinion (explicit data) and navigating through the webpages (implicit data). In order to personalize the users' experience, recommender systems exploit this data by offering the items that the user could be most interested in. However, most RS do not deal with domain independence and scalability. In this paper, we propose a scalable and reliable recommender system based on semantic data and Matrix Factorization. The former increases the quality of recommendations and domain independence. The latter offers scalability by distributing processing over several machines. Consequently, our proposal offers quality in user personalization in environments with interchangeable items, and also relieves the system by balancing load among distributed machines.

Manuel Pozo, Raja Chiky, Zakia Kazi-Aoul
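The Matrix Factorization component mentioned in the abstract above can be illustrated with a minimal single-machine sketch. This is not the authors' system: it omits the semantic data and the distribution over several machines, and the function names (`factorize`, `rmse`), the toy ratings, and all hyperparameters are assumptions chosen for illustration only.

```python
import math
import random


def factorize(ratings, n_users, n_items, k=2, steps=2000, lr=0.01, reg=0.02, seed=0):
    """Learn user factors P and item factors Q by stochastic gradient descent.

    ratings is a list of (user_index, item_index, rating) triples.
    All hyperparameter defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    P = [[rng.uniform(0.0, 0.5) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(0.0, 0.5) for _ in range(k)] for _ in range(n_items)]
    for _ in range(steps):
        for u, i, r in ratings:
            pred = sum(P[u][f] * Q[i][f] for f in range(k))
            err = r - pred
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step with L2 regularization on both factors.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def rmse(ratings, P, Q):
    """Root mean squared error of the factor model on the given ratings."""
    total = 0.0
    for u, i, r in ratings:
        pred = sum(a * b for a, b in zip(P[u], Q[i]))
        total += (r - pred) ** 2
    return math.sqrt(total / len(ratings))


# Toy explicit ratings (hypothetical data): 3 users, 3 items.
RATINGS = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1), (2, 1, 2), (2, 2, 5)]
```

Calling `factorize(RATINGS, 3, 3)` returns the two factor matrices; the predicted rating for any user and item pair, including unobserved ones, is the dot product of the corresponding rows.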

Security Incident Detection Using Multidimensional Analysis of the Web Server Log Files

The paper presents the results of research on the security analysis of web servers. The presented method uses web server log files to determine the type of attack against the web server. Web server log files are collections of text strings describing users’ requests, so one of the most important parts of the work was to propose a method for converting the informative part of the requests into numerical values to enable further automatic processing. The vector of values obtained from web server log file processing is used as the input to a Self-Organizing Map (SOM) network. Finally, the SOM network has been trained to detect SQL injection and brute-force password guessing attacks. The method has been validated using data obtained from a real data center.

Grzegorz Kołaczek, Tomasz Kuzemko
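The conversion of request strings into numerical values, described in the abstract above, might look roughly like the following sketch. The abstract does not state which features the authors use, so the four features here (length, special-character count, SQL keyword count, digit count), the keyword list, and the name `request_to_vector` are all hypothetical; the SOM training step is not shown.

```python
import re
from urllib.parse import unquote

# Hypothetical keyword list; the paper's actual features are not given.
SQL_KEYWORDS = ("select", "union", "insert", "drop", "or 1=1")


def request_to_vector(request_line):
    """Map one logged request line to a fixed-length numeric vector."""
    text = unquote(request_line).lower()  # decode %xx escapes first
    return [
        len(text),                                    # request length
        sum(text.count(ch) for ch in "'\";-"),        # quote/comment characters
        sum(text.count(kw) for kw in SQL_KEYWORDS),   # SQL keyword hits
        len(re.findall(r"\d", text)),                 # digit count
    ]


benign = request_to_vector("GET /index.html HTTP/1.1")
attack = request_to_vector(
    "GET /item?id=1%27%20UNION%20SELECT%20password%20FROM%20users-- HTTP/1.1"
)
```

Vectors of this kind could then be fed to a clustering model such as a SOM; in practice the features would likely need scaling to comparable ranges first.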

Analysis of Differences between Expected and Observed Probability of Accesses to Web Pages

The paper introduces an alternative method for website analysis that combines two web mining research fields: the discovery of web users’ behaviour patterns and the discovery of knowledge from the website structure. The main objective of the paper is to identify web pages whose importance, as estimated by the website developers, does not correspond to the visitors' actual perception of those pages. The paper presents a case study that used the proposed method to identify suspicious web pages by analysing the expected and observed probabilities of accesses to the web pages. The expected probabilities were calculated using the PageRank method, and the observed probabilities were obtained from the web server log file. The observed and expected data were compared using residual analysis. The obtained results can be used to identify potential problems with the structure of the observed website.

Jozef Kapusta, Michal Munk, Martin Drlík
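The comparison of expected and observed access probabilities described in the abstract above can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the residual is taken as the plain difference between the observed and expected probabilities, the damping factor 0.85 is the conventional default, and the three-page site and visit counts are made up.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power-iteration PageRank; links maps each page to its outgoing links.

    Every link target must itself be a key of links.
    """
    pages = sorted(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            # A page without outgoing links spreads its rank evenly.
            targets = links[p] or pages
            share = damping * rank[p] / len(targets)
            for t in targets:
                new[t] += share
        rank = new
    return rank


def access_residuals(links, visits):
    """Observed access probability minus PageRank-expected probability."""
    expected = pagerank(links)
    total = sum(visits.values())
    return {p: visits.get(p, 0) / total - expected[p] for p in expected}


# Hypothetical site structure and visit counts taken from a log file.
LINKS = {"home": ["about", "products"], "about": ["home"], "products": ["home", "about"]}
VISITS = {"home": 50, "about": 5, "products": 45}
RESIDUALS = access_residuals(LINKS, VISITS)
```

A large positive residual marks a page visited more often than its position in the link structure suggests, and a large negative residual marks the opposite; both are candidates for a closer look at the site structure.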

Method of Criteria Selection and Weights Calculation in the Process of Web Projects Evaluation

The article outlines the issues of website quality assessment, the reduction of website assessment criteria, and the factors that users employ when assessing websites. The research presents a procedure for selecting significant choice criteria and revealing undisclosed user preferences based on website quality assessment models. The formulated procedure utilizes feature selection methods derived from machine learning. Results concerning undisclosed preferences were verified through a comparison with those declared by website users.

Paweł Ziemba, Mateusz Piwowarski, Jarosław Jankowski, Jarosław Wątróbski

Latent Semantic Indexing for Web Service Retrieval

This paper presents a novel approach to Web Service Retrieval that uses the Latent Semantic Indexing method to index both SOAP and RESTful Web Services. The presented approach uses a modified term-document matrix that allows scores for different service components to be stored separately. Service data is collected and extracted using web crawlers. The cosine measure is used to determine similarities between a user query and services. The presented research results are compared with the standard Latent Semantic Indexing method. We also introduce our Web Service test collection, which can be used for many benchmarks and makes research results comparable.

Adam Czyszczoń, Aleksander Zgrzywa
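The cosine measure mentioned in the abstract above can be illustrated with a small sketch that ranks service descriptions against a query. It works on raw term-frequency vectors and deliberately leaves out the dimensionality reduction that Latent Semantic Indexing adds, as well as the authors' per-component scoring; the function names and the two service descriptions are hypothetical.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words term-frequency vector for a piece of text."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine of the angle between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_services(query, services):
    """Return service names ordered by decreasing similarity to the query."""
    q = term_vector(query)
    scored = [(cosine(q, term_vector(desc)), name) for name, desc in services.items()]
    return [name for _, name in sorted(scored, key=lambda s: (-s[0], s[1]))]


# Hypothetical service descriptions, as a crawler might extract them.
SERVICES = {
    "weather": "get current weather forecast by city",
    "currency": "convert currency exchange rate",
}
RANKED = rank_services("weather forecast", SERVICES)
```

In a full LSI pipeline the same cosine comparison would be applied after projecting both the query and the documents into a reduced latent space, which lets related terms match even when they do not co-occur literally.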

