
2017 | Book

Intelligent Information and Database Systems

9th Asian Conference, ACIIDS 2017, Kanazawa, Japan, April 3-5, 2017, Proceedings, Part I

Editors: Ngoc Thanh Nguyen, Satoshi Tojo, Le Minh Nguyen, Bogdan Trawiński

Publisher: Springer International Publishing

Book Series: Lecture Notes in Computer Science


About this book

The two-volume set LNAI 10191 and 10192 constitutes the refereed proceedings of the 9th Asian Conference on Intelligent Information and Database Systems, ACIIDS 2017, held in Kanazawa, Japan, in April 2017. A total of 152 full papers accepted for publication in these proceedings were carefully reviewed and selected from 420 submissions.

They were organized in topical sections named: Knowledge Engineering and Semantic Web; Social Networks and Recommender Systems; Text Processing and Information Retrieval; Intelligent Database Systems; Intelligent Information Systems; Decision Support and Control Systems; Machine Learning and Data Mining; Computer Vision Techniques; Advanced Data Mining Techniques and Applications; Intelligent and Context Systems; Multiple Model Approach to Machine Learning; Applications of Data Science; Artificial Intelligence Applications for E-services; Automated Reasoning and Proving Techniques with Applications in Intelligent Systems; Collective Intelligence for Service Innovation, Technology Opportunity, E-Learning and Fuzzy Intelligent Systems; Intelligent Computer Vision Systems and Applications; Intelligent Data Analysis, Applications and Technologies for Internet of Things; Intelligent Algorithms and Brain Functions; Intelligent Systems and Algorithms in Information Sciences; IT in Biomedicine; Intelligent Technologies in the Smart Cities in the 21st Century; Analysis of Image, Video and Motion Data in Life Sciences; Modern Applications of Machine Learning for Actionable Knowledge Extraction; Mathematics of Decision Sciences and Information Science; Scalable Data Analysis in Bioinformatics and Biomedical Informatics; and Technological Perspective of Agile Transformation in IT Organizations.

Table of Contents

Frontmatter

Knowledge Engineering and Semantic Web

Frontmatter
The Knowledge Increase Estimation Framework for Ontology Integration on the Instance Level

The integration of large collections of data is a time- and cost-consuming process. It can be beneficial when knowledge is distributed across different sources, or not, when different sources contain redundant or, what is worse, inconsistent information. The authors propose a formula to estimate the knowledge increase after the process of ontology integration on the instance level. The validity of the formula was checked by a questionnaire and confirmed by a statistical study. The formula allows the knowledge increase to be estimated in an objective manner.

Adrianna Kozierkiewicz-Hetmańska, Marcin Pietranik, Bogumiła Hnatkowska
Online Integration of Fragmented XML Documents

Online data integration of large XML documents provides the most up-to-date results from the processing of user requests issued at a central site of a heterogeneous multi-database system. The fragments of large XML documents received from the remote sites are continuously combined with the most current state of the integrated documents. Online integration of fragmented XML documents has a positive impact on the performance of the entire online data integration system. This paper presents the online integration procedures for the fragments of large XML documents. We propose a new data model for fragmented XML documents and define a set of operations to manipulate the fragments. A new optimisation procedure presented in the paper finds the smallest core of each new fragment that can be integrated with the documents available at a central site. We show that processing the smallest cores of XML fragments significantly reduces overall processing time.

Handoko, Janusz R. Getta
Merging Possibilistic Belief Bases by Argumentation

Belief merging is an active research field with a large range of applications in Artificial Intelligence. Most of the work in this field takes a centralized approach, which, however, is difficult to apply to interactive systems such as multi-agent systems. In this paper, we introduce a new argumentation framework for belief merging. To this end, a constructive model to merge possibilistic belief bases, built on the well-known general argumentation framework, is proposed. An axiomatic model, including a set of rational and intuitive postulates to characterize the merging result, is introduced, and several logical properties are mentioned and discussed.

Thi Hong Khanh Nguyen, Trong Hieu Tran, Tran Van Nguyen, Thi Thanh Luu Le
Towards Common Vocabulary for IoT Ecosystems—Preliminary Considerations

The INTER-IoT project aims at delivering a comprehensive solution to the problem of interoperability of Internet of Things platforms. Hence, semantic interoperability also has to be addressed. This should involve a hierarchy of ontologies, starting from an upper ontology, through core and domain ontologies. As a starting point, we have analyzed ontological models of the concepts of thing, device, observation and deployment, as they occur in the IoT domain. We have chosen five popular ontologies: SSN, SAREF, oneM2M Base Ontology, IoT-Lite, and OpenIoT, as candidates for a central INTER-IoT ontology.

Maria Ganzha, Marcin Paprzycki, Wiesław Pawłowski, Paweł Szmeja, Katarzyna Wasielewska
Graphical Interface for Ontology Mapping with Application to Access Control

The proliferation of smart, connected devices brings new challenges to data access and privacy control. Fine-grained access control policies are typically complex, hard to maintain, and tightly bound to the internal structure of the processed information. We thus discuss how semantic inference can be used together with an intuitive ontology management tool to ease the management of Attribute Based Access Control policies, even by users not experienced with semantic technologies.

Michał Drozdowicz, Motasem Alwazir, Maria Ganzha, Marcin Paprzycki
Influence of Group Characteristics on Agent Voting

A collective of identical agents in a multi-agent system often works together towards a common goal. In situations where no supervisor agents are present to make decisions for the group, these agents must reach consensus via negotiations and other types of communication. We have previously shown that the structure of the group and the priority of communication have a high influence on the group decision if consensus theory methods are used. In this paper, we explore the influence of preferential communication channels in asynchronous group communication in situations where majority vote and dominant value are used. We also show how this relates to the consensus approach in such groups and how to use a combination of both approaches to improve the performance of real-life multi-agent systems.

Marcin Maleszka
Collective Knowledge: An Enhanced Analysis of the Impact of Collective Cardinality

In this paper, we present an enhanced analysis of the impact of collective cardinality on the quality of collective knowledge. Collective knowledge is a knowledge state determined on the basis of collective members' knowledge states on the same subject (matter) in the real world. The collective members (which are often autonomous units) have their own knowledge bases, so their knowledge states can differ from each other. The quality is based on the difference between the collective knowledge and the real state of the subject. For this aim, we introduce a new factor named diam, representing the maximal difference between a knowledge state in the collective and the real state. The simulation experiments reveal that the quality depends not only on the collective cardinality but also on the diam value. Additionally, the number of collective members needed to achieve a given difference level between the collective knowledge and the real state is also investigated.

Van Du Nguyen
A New Ontology-Based Approach for Construction of Domain Model

The domain model is one of the most important artefacts in software engineering. It can be built with the use of domain ontologies. The objective of the authors' research is to elaborate an effective approach to domain model construction based on knowledge extraction from existing ontologies. A significant element of the approach is the knowledge extraction algorithm. In this paper, a modified, more flexible version of the extraction algorithm is presented. A comparison of the new algorithm with the old one is conducted based on a case study. Both algorithms produce similar results regarding quality measures. In contrast to the old algorithm, the new one is parameterized and can therefore be applied in an incremental way, which is a valuable feature.

Bogumiła Hnatkowska, Zbigniew Huzar, Lech Tuzinkiewicz, Iwona Dubielewicz

Social Networks and Recommender Systems

Frontmatter
A Power-Graph Analysis of Non-fast Information Transmission

Specific types of information (e.g. knowledge, intellectual capital, conversation) require a different model in the analysis of transmission, called the non-fast transmission model. This paper introduces graphs with a logical structure and proposes a method for evaluating the components of such graphs of information transmission. The method may allow for optimization of transmission. This involves the maximization of the dissemination of non-fast information, and analysis of the impact of the information, or of its speed, in reaching the various participants of the transmission process.

Jacek Mercik
Group Recommendation Based on the Analysis of Group Influence and Review Content

With the development of the Internet, users not only receive information passively but also share their own opinions on social networking websites. Accordingly, users' preferences for items may be affected by others through opinion sharing and social interactions. Moreover, users with similar preferences usually form a group to share related information with each other, so users' preferences may be affected by group members. Existing research often focuses on analyzing personal preferences and group recommendation approaches without user influence. In this work, we propose a novel group recommendation approach which combines group influence, a rating-based score, and profile similarity to predict group preference. The group influence is composed of group member influence, review influence, and recommendation influence. The profile similarity is derived from the analysis of item descriptions and review content. The experimental results show that considering group influence and content information in a group recommendation approach can effectively improve recommendation performance.

Chin-Hui Lai, Pei-Ru Hong
On Rumor Source Detection and Its Experimental Verification on Twitter

This paper analyzes rumor source detection on three Twitter networks of different sizes: 1K, 10K and 100K tweets. In the first step, an algorithm was designed that selects from all users a set of potential rumormongers who initiated the fake content tweet. The next step was based on tracking propagation trails by (1) randomly distributed, (2) maximum, (3) minimum, and (4) median weight of a node in the retweet trees. Given these postulates, the study describes an empirical investigation of finding the position of the rumor-teller, calculating the length of the propagation path, and using statistical methods to interpret and report basic results. The results showed that we are not able to separate the initial rumor users from the most influential spreaders in the small networks. However, in the big network (100K), those classifications are expected to bring a satisfactory result.

Dariusz Król, Karolina Wiśniewska
Topic Preference-based Random Walk Approach for Link Prediction in Social Networks

Link prediction is a challenging problem in complex graphs, but it has many applications in various fields, such as detecting missed linkages in general graphs, collaborative filtering in co-authorship networks, and predicting protein-protein interactions in bioinformatics. In this paper, we present a new link prediction method applied to friend suggestion in social networks. The benefits not only help social players easily find new friends but also enhance their loyalty to the social sites. Unlike existing methods that commonly employ statistical attributes of vertices (e.g., in- and out-degree) and topological structures (e.g., distance and path), we exploit topical information extracted from users' posted messages. We also introduce a new similarity measure that takes into account users' topic preferences and popularity in topical domains for effectively ranking associated friends. Experimental results conducted on real Twitter data show that the proposed approach outperforms three other state-of-the-art methods in all cases.

Thiamthep Khamket, Arnon Rungsawang, Bundit Manaskasemsak
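
The ranking idea in the abstract above can be grounded with a minimal sketch of random walk with restart, a standard baseline for link prediction. The sketch below, in Python with NumPy, scores candidate friends from a query user on a toy graph; the paper's topic-preference edge weighting is not reproduced, and the graph and parameter values are hypothetical.

```python
import numpy as np

def random_walk_with_restart(A, seed, restart=0.15, iters=100):
    """Visiting probabilities of a walk that restarts at `seed`.

    High-scoring nodes not yet adjacent to `seed` are candidate
    friend suggestions.
    """
    n = A.shape[0]
    col_sums = A.sum(axis=0)
    P = A / np.where(col_sums == 0, 1, col_sums)  # column-stochastic
    e = np.zeros(n)
    e[seed] = 1.0
    p = e.copy()
    for _ in range(iters):
        p = (1 - restart) * P @ p + restart * e
    return p

# Toy 4-user path graph: 0-1, 1-2, 2-3.
A = np.array([[0., 1., 0., 0.],
              [1., 0., 1., 0.],
              [0., 1., 0., 1.],
              [0., 0., 1., 0.]])
print(random_walk_with_restart(A, seed=0))  # scores decay with distance from user 0
```
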
Level of Education and Previous Experience in Acquiring ICT/Smart Technologies by the Elderly People

The article introduces the results of research conducted among the elderly people of the Municipality of Hradec Kralove, Czech Republic. The main research objective is to discover whether (1) the level of education and (2) the exploitation of ICT/smart technologies in work positions at a productive age correlate with education and training in this field at a post-productive age. A questionnaire was applied to a research sample of 432 respondents to provide answers to five questions and to verify two hypotheses. The collected data were processed with the IBM SPSS Statistics software. The findings, consistent with similar studies conducted in other countries, proved the necessity of developing methodologies for the acquisition of new technologies by elderly people in the future.

Ivana Simonova, Petra Poulova
The Effect of Presentation in Online Advertising on Perceived Intrusiveness and Annoyance in Different Emotional States

Online advertising is a rapidly growing area with high commercial relevance. This paper investigates the effect of different types of ad presentation, varying in frame size, position and animation level, on visual intrusiveness and annoyance as perceived by users. Furthermore, we investigate the influence of users' emotional states on perceived intrusiveness and annoyance. This research has been carried out through a survey study. The analysis of the data shows a linear correlation between the visual attention drawn by the ads and their features. Also, a positive influence of emotion has been found on various types of ad presentation. In addition, participants with emotions of positive valence and low arousal showed more tolerance to the same ad than users with a different emotional state. This research proposes a new aspect of computational advertising: adapting recommendations based on the user's emotional state and the parameters of the online advertisements.

Kaveh Bakhtiyari, Jürgen Ziegler, Hafizah Husain
Runtime Verification and Quality Assessment for Checking Agent Integrity in Social Commerce System

In social commerce systems, integrity is an important quality factor to ensure trust and reputation. Buyer agents need to trust Seller agents before performing online purchases. Since the interactions and transactions are performed online, there are certain conditions, due to the dynamics of Seller agent activities and preferences, that affect integrity. Hence, in this research, a solution is proposed to detect Seller agent integrity violations at runtime. Identifiability, interaction availability, trustability and information accessibility are some of the identified requirements that contribute to agent integrity. The proposed solution includes the definition of integrity as Seller agent quality properties and the implementation of a Runtime Verification and Quality Assessment process. The effectiveness of the proposed solution is evaluated by implementing the checking and assessment process within a social commerce system model.

Najwa Abu Bakar, Mohd Hafiz Selamat, Ali Selamat
Evaluation of Tensor-Based Algorithms for Real-Time Bidding Optimization

In this paper we evaluate tensor-based approaches to the Real-Time Bidding (RTB) Click-Through Rate (CTR) estimation problem. We propose two new tensor-based CTR prediction algorithms. We analyze the evaluation results collected from several papers, obtained with the use of the iPinYou contest dataset and the Area Under the ROC Curve measure. We accompany these results with analogous results of our own experiments, conducted with the use of our implementations of tensor-based algorithms and of approaches based on logistic regression. In contrast to the results of other authors, we show that biases, in particular those being low-order expectation value estimates, are at least as useful as the outcomes of high-order components' processing. Moreover, on the basis of Average Precision results, we postulate that the ROC curve should not be the only characteristic used to evaluate RTB CTR estimation performance.

Andrzej Szwabe, Paweł Misiorek, Michał Ciesielczyk
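
The paper's closing point, that the ROC curve should not be the only evaluation characteristic, is easy to demonstrate: on heavily imbalanced click data, Average Precision reacts to ranking quality among the rare positives far more sharply than AUC. A small sketch using scikit-learn's metric functions on synthetic, made-up scores:

```python
import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score

rng = np.random.default_rng(0)
# Simulated CTR-like labels: roughly 1% positives, as in RTB click logs.
y_true = (rng.random(100_000) < 0.01).astype(int)
# Hypothetical predictor scores: informative but noisy.
y_score = 0.3 * y_true + rng.random(100_000)

print("AUC:              ", roc_auc_score(y_true, y_score))
print("Average Precision:", average_precision_score(y_true, y_score))
# On rare-positive data AP sits far below AUC, exposing ranking
# differences among the positives that the ROC curve largely hides.
```
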
A Consensus-Based Method to Enhance a Recommendation System for Research Collaboration

With the development of scientific societies, research problems are increasingly complex, requiring scientists to collaborate to solve them. The quality of collaboration between researchers is a major factor in determining their achievements. This study proposes a collaboration recommendation method that takes into account previous research collaboration and research similarities. Research collaboration is measured by combining the collaboration time and the number of co-authors who have already collaborated with an author. Research similarity is based on authors' previous publications and the academic events they attended. In addition, a consensus-based algorithm is proposed to integrate bibliography data from different sources, such as the DBLP Computer Science Bibliography, ResearchGate, CiteSeer, and Google Scholar. The experimental results show that this proposal improves the accuracy of recommendation systems in comparison with other methods.

Dinh Tuyen Hoang, Van Cuong Tran, Tuong Tri Nguyen, Ngoc Thanh Nguyen, Dosam Hwang
International Business Matching Using Word Embedding

Recommender systems, which help users discover information or knowledge they might need without requiring specific prior knowledge, are gaining popularity in our age of information overload. In addition, natural language processing techniques like word embedding offer new possibilities for extracting information from massive amounts of text data. This work explores the possibility of applying word embedding as the foundation of a recommender system that helps international businesses identify appropriate counterparts for their activities. In this paper, we describe our system and report preliminary experiments using Wikipedia as a corpus. Our experiments attempt to provide answers that support business decision makers when they are considering entering a relatively unknown market and are seeking better understanding or appropriate partners. Our experiments show promising results that pave the way for future research.

Didier Gohourou, Daiki Kurita, Kazuhiro Kuwabara, Hung-Hsuan Huang
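
To make the word-embedding foundation concrete, here is a minimal sketch, assuming the gensim library, that trains word2vec on a toy stand-in for the Wikipedia corpus and ranks terms by embedding similarity. The sentences and the query term are hypothetical; the paper's actual matching pipeline is not reproduced.

```python
from gensim.models import Word2Vec

# Hypothetical stand-in for a Wikipedia-derived, business-oriented corpus.
corpus = [
    ["automotive", "supplier", "exports", "components", "to", "germany"],
    ["electronics", "manufacturer", "seeks", "distributor", "in", "japan"],
    ["automotive", "manufacturer", "partners", "with", "components", "supplier"],
] * 100  # repeated so the toy vocabulary gets enough training examples

model = Word2Vec(corpus, vector_size=50, window=3, min_count=1,
                 epochs=10, seed=1)

# Rank candidate terms by embedding similarity to a query term; a matching
# system could do the same with company or market descriptors.
print(model.wv.most_similar("supplier", topn=3))
```
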
Mixture Seeding for Sustainable Information Spreading in Complex Networks

A high intensity of online advertising often elicits a negative response from web users. Marketing companies are looking for more sustainable solutions, especially in the area of visual advertising. However, the research efforts related to information spreading processes and viral marketing are focused mainly on the maximization of coverage. This paper presents a sustainable seed selection solution based on mixtures of seeds with different characteristics. The proposed solution makes it possible for the information spreading processes to reach more diverse audiences. Mixture seeding avoids overrepresentation of nodes with similar characteristics, and thus decreases campaign intensity while maintaining acceptable coverage.

Jarosław Jankowski
Complex Networks in the Epidemic Modelling

The spread of infectious diseases through social networks is analyzed in the paper. The study of this phenomenon is very important for the safety of all of us. After analyzing the properties of networks generated using known algorithms, a novel method for generating scale-free complex networks is proposed. Features of the epidemic course were studied in computer experiments. The proposed method was compared with three known strategies in terms of the influence of vaccination strategies on the tempo and mode of epidemic spread and the number of infected individuals.

Tomasz Biegus, Halina Kwasnicka

Text Processing and Information Retrieval

Frontmatter
Identification of Biomedical Articles with Highly Related Core Contents

Given a biomedical article a, identification of those articles with core contents (including research goals, backgrounds, and conclusions) similar to a is essential for the survey and cross-validation of the highly related biomedical evidence presented in a. We thus present a technique, CCSE (Core Content Similarity Estimation), that retrieves these highly related articles by estimating and integrating three kinds of inter-article similarity: goal similarity, background similarity, and conclusion similarity. CCSE works on the titles and abstracts of biomedical articles, which are publicly available. Experimental results show that CCSE performs better than PubMed (a popular biomedical search engine) and typical techniques in identifying those scholarly articles that are judged (by biomedical experts) to be the ones whose core contents focus on the same gene-disease associations. The contribution is essential for the retrieval, clustering, mining, and validation of biomedical evidence in the literature.

Rey-Long Liu
An Efficient Hybrid Model for Vietnamese Sentiment Analysis

Sentiment analysis of text is an exciting and challenging task that can be useful in many applications exploiting people's interests to improve the quality of services. In particular, text collected from social networks, websites or forums is usually expressed in spoken language, which is unstructured and difficult to handle. In this paper, we present a novel hybrid model that is based on the Hierarchical Dirichlet Process (HDP) and adopts a combination of lexicon-based and Support Vector Machine (SVM) methods for the task of topic-based sentiment classification of Vietnamese text. The proposed model has been evaluated on five different topic datasets, and the experimental results show its efficiency, with an average accuracy of nearly 87%. Although the model was initially designed for the Vietnamese language, it is applicable and adaptable to other languages.

Thanh Hung Vo, Thien Tin Nguyen, Hoang Anh Pham, Thanh Van Le
Simple and Accurate Method for Parallel Web Pages Detection

This paper presents a language-independent method for measuring structural similarity between web pages from bilingual websites. First, we extract a new feature from those used by the STRAND architecture and combine it with an existing one. Next, we analyze the properties of this feature and develop an iterative algorithm to infer the parameters of our model. Finally, we propose an unsupervised algorithm for detecting parallel pairs of web pages based on these features and parameters. Our approach appears to benefit the structural similarity measure: in the task of distinguishing parallel web pages from five different bilingual websites, the proposed method is competitive with other unsupervised methods.

Alibi Jangeldin, Zhenisbek Assylbekov
Combining Latent Dirichlet Allocation and K-Means for Documents Clustering: Effect of Probabilistic Based Distance Measures

This paper empirically evaluates eight different distance measures used in the LDA + K-means model. We performed our analysis on two miscellaneous datasets that are commonly used. Our experimental results indicate that the probabilistic-based distance measures are better than the vector-based distance measures, including Euclidean, when it comes to clustering a set of documents in the topic space. Moreover, we investigate the implication of the number of topics and show that K-means combined with the results of the Latent Dirichlet Allocation model gives better results than LDA + Naive and the Vector Space Model.

Quang Vu Bui, Karim Sayadi, Soufian Ben Amor, Marc Bui
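
The evaluated setup can be sketched as follows: fit LDA, then cluster the per-document topic mixtures with a K-means-style loop whose assignment step uses the Jensen-Shannon distance, one representative of the probabilistic measures compared in the paper. The dataset (downloaded on first use), topic count, and cluster count below are illustrative choices, not the paper's.

```python
import numpy as np
from scipy.spatial.distance import jensenshannon
from sklearn.datasets import fetch_20newsgroups
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = fetch_20newsgroups(remove=("headers", "footers", "quotes")).data[:500]
X = CountVectorizer(max_features=2000, stop_words="english").fit_transform(docs)
theta = LatentDirichletAllocation(n_components=10, random_state=0).fit_transform(X)

def kmeans_js(theta, k=5, iters=20, seed=0):
    """K-means-style clustering of topic mixtures under JS distance."""
    rng = np.random.default_rng(seed)
    centers = theta[rng.choice(len(theta), k, replace=False)]
    for _ in range(iters):
        # Assignment step: nearest center by Jensen-Shannon distance.
        d = np.array([[jensenshannon(t, c) for c in centers] for t in theta])
        labels = d.argmin(axis=1)
        # Update step: centers are mean topic mixtures of their clusters.
        for j in range(k):
            if (labels == j).any():
                centers[j] = theta[labels == j].mean(axis=0)
    return labels

print(np.bincount(kmeans_js(theta)))  # cluster sizes
```
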
A Hybrid Method for Named Entity Recognition on Tweet Streams

Information extraction from microblogs has recently attracted researchers in the fields of knowledge discovery and data mining owing to their short nature. Annotating data is one of the significant issues in applying machine learning approaches to these sources. Active learning (AL) and semi-supervised learning (SSL) are two distinct approaches to reducing annotation costs. The SSL approach exploits high-confidence samples and AL queries the most informative samples, so they can produce better results when jointly applied. This paper proposes a combination of AL and SSL to reduce the labeling effort for named entity recognition (NER) from tweet streams by using both machine-labeled and manually labeled data. The AL query algorithms select the most informative samples to be labeled by a human annotator. In addition, a Conditional Random Field (CRF) is chosen as the underlying model to select high-confidence samples. The experimental results on a tweet dataset demonstrate that the proposed method achieves promising results in reducing the human labeling effort and that it can significantly improve the performance of NER systems.

Van Cuong Tran, Dinh Tuyen Hoang, Ngoc Thanh Nguyen, Dosam Hwang
A Method for User Profile Learning in Document Retrieval System Using Bayesian Network

User modeling methods are developed by many researchers in the area of document retrieval systems. The main reason is that the system cannot present the same results to every user: each user can have different information needs even if he uses the same terms to formulate his query. In this paper we present a solution to this problem. We propose a method for user profile building and updating based on Bayesian networks, which allows dependencies between terms to be discovered. Additionally, we use a domain ontology of terms to simplify the calculations. The experiments performed have shown that the quality of the presented methods is promising.

Bernadetta Maleszka

Intelligent Database Systems

Frontmatter
Answering Temporal Analytic Queries over Big Data Based on Precomputing Architecture

The big data explosion brings revolutionary changes to many aspects of our lives. The huge volume of data, along with its complexity, poses big challenges to data analytic applications. Techniques proposed in data warehousing and online analytical processing (OLAP), such as precomputed multidimensional cubes, dramatically improve the response time of analytic queries over relational databases. Some recent works extend similar concepts to NoSQL, such as constructing cubes from NoSQL stores and converting existing cubes into NoSQL stores. However, only a few works study the precomputing structure deliberately within NoSQL databases. In this paper, we present an architecture for answering temporal analytic queries over big data by precomputing the results of granulated chunks of collections which are decomposed from the original large collection. By using the precomputing structure, we are able to answer drill-down and roll-up temporal queries over large amounts of data within reasonable response time.

Nigel Franciscus, Xuguang Ren, Bela Stantic
Functional Querying in Graph Databases

The paper focuses on functional querying in graph databases. Attention is devoted to the functional modelling of graph databases at both the conceptual and data levels. The notions of graph conceptual schema and graph database schema are considered. The notion of a typed attribute is used as a basic structure at both the conceptual and database levels. As a formal approach to declarative graph database querying, a version of a typed lambda calculus is used.

Jaroslav Pokorný
Online Transaction Processing (OLTP) Performance Improvement Using File-Systems Layer Transparent Compression

In this research, we use three Swingbench OLTP benchmark scenarios to examine whether three compression algorithms in the ZFS file system, namely LZ4, LZJB and ZLE, can improve OLTP database performance. Besides database performance, we also compare how much storage can be saved, the impact on maximum response time, and the increase in CPU utilization for the three compression algorithms. The acquired data were then analyzed using the Analytic Hierarchy Process to find the highest-ranking compression in terms of benefits and benefit-to-cost ratio. The results indicate that LZJB achieved the highest performance improvement, LZ4 achieved the highest storage saving, and ZLE achieved the smallest CPU utilization overhead. The safest algorithm, which did not experience any reduction in database performance or increase in maximum response time in this research, is LZJB.

Suharjito, Adrianus B. Kurnadi
A Multi-database Access System with Instance Matching

Organizations that use several separately developed information systems face a common problem: the data used by the different systems follow no common standard. Different databases that keep information about the same entity instances use different representations. Attribute names are different. Attribute values are different. Even the unique identifiers used to identify object instances are different. Yet the data need to be referred to and used by mission-critical applications. This paper presents a multi-database instance matching system developed to bring together data from separate sources that refer to different unique identifiers and attribute details. Entity resolution techniques are employed to match the database instances. After matched entity instances are identified, an ontology is used to keep the matched identifiers. Queries from the users then refer to the ontology and are rewritten to refer to the correct instances of the original databases.

Thanapol Phungtua-Eng, Suphamit Chittayasothorn

Intelligent Information Systems

Frontmatter
Enhancing Product Innovation Through Smart Innovation Engineering System

This paper illustrates the idea of the Smart Innovation Engineering (SIE) System, which helps in carrying out the process of product innovation. The SIE system collects experiential knowledge from formal decisional events. This experiential knowledge is collected from a set of similar products having some common functions and features. Because the SIE system collects, captures and reuses the experiential knowledge of all the similar products, apart from knowledge about new technological advancements, it behaves like a group of experts in its domain. Through this system, the innovation process for manufactured products can be greatly enhanced. Moreover, entrepreneurs and manufacturing organizations will be able to make proper, enhanced decisions and, most importantly, at the appropriate time. The expertise of the SIE System is ever increasing, as every decision taken is stored in the form of a set of experience that can be used in the future for similar queries.

Mohammad Maqbool Waris, Cesar Sanin, Edward Szczerbicki
Analyses of Aspects of Knowledge Diffusion Based on the Example of the Green Supply Chain

The diffusion of knowledge in the area of the green supply chain (GSC) is connected with the outcomes obtained as a result. This article concentrates on the issues of knowledge diffusion and is based on the authors' own research on manufacturing companies, using the example of the GSC. As a result of the research, it was established that knowledge exchange between a supplier and a manufacturer occurs at a higher level than between a manufacturer and a receiver, while, generally speaking, a low level of knowledge exchange is observed within the area of the GSC. Companies need wider access to sources of knowledge in the field of the GSC and need to develop a culture of measurement of the executed undertakings.

Anna Maryniak, Łukasz Strąk
Comparative Evaluation of Bluetooth and Wi-Fi Direct for Tablet-Oriented Educational Applications

This study conducted a survey with a view to implementing educational applications that can share information even in environments where access points cannot be used. In particular, we investigated whether Bluetooth (widely used for many years) or Wi-Fi Direct (developed recently) is more suitable for creating educational applications using an ad hoc network. To survey the influence of hand movements on delay time while operating tablets, we created a paint application that shares a drawing screen across two tablets and conducted an experiment. In addition, to survey the influence of human presence on delay time, we conducted an experiment in which we changed the number of students seated between the two tablets in the classroom. From the results of these experiments, we conclude that Bluetooth is less influenced by hand movements and human presence than Wi-Fi Direct.

Keiichi Endo, Ayame Onoyama, Dai Okano, Yoshinobu Higami, Shinya Kobayashi
Comparing TPC-W and RUBiS via PCA

We aim to understand the fundamental design correspondences between TPC-W and RUBiS, two benchmark applications modeled after the well-known e-commerce solutions Amazon and eBay, respectively. Furthermore, we investigate how these benchmarks reflect the design principles of real-world applications by comparing them against Qualitas Corpus, which offers an effective domain context of curated Java software systems. To perform this study, we employ Principal Component Analysis (PCA) to distill the important information (i.e., the principal components) from a set of observations of possibly correlated variables (i.e., software metrics). The results of our analysis reveal that TPC-W and RUBiS comprise surprisingly dissimilar features, clearly showing that the two benchmarks do not share many design commonalities. Moreover, we demonstrate that PCA is a powerful tool for uncovering key software quality attributes.

Markus Lumpe, Quoc Bao Vo
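
As a sketch of this PCA methodology (with fabricated numbers standing in for the real metric suites), one can project per-class software metrics of both systems into principal-component space and compare where they land; well-separated clusters would indicate dissimilar designs, matching the paper's conclusion. The metric names and values below are hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical per-class metrics (columns: LOC, WMC, CBO, DIT, NOC, LCOM).
rng = np.random.default_rng(0)
tpcw  = rng.normal(loc=[200, 10, 5, 2, 1, 30], scale=5, size=(40, 6))
rubis = rng.normal(loc=[120, 25, 9, 3, 2, 60], scale=5, size=(40, 6))

X = StandardScaler().fit_transform(np.vstack([tpcw, rubis]))
pca = PCA(n_components=2).fit(X)
Z = pca.transform(X)

print("explained variance ratio:", pca.explained_variance_ratio_)
print("TPC-W centroid:", Z[:40].mean(axis=0))
print("RUBiS centroid:", Z[40:].mean(axis=0))
```
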
Analysis and Solution Model of Distributed Computing in Scientific Calculations

Processing huge amounts of data is currently of concern in various fields of science and commercial data processing, such as pharmaceutical drug development, astronomical probe data processing, and security analysis of large amounts of communication data. Generally, centrally administered methods are used, but their deployment and operation are very expensive. The aim of this paper is to present a model of high-capacity data processing that is based on the Apache Hadoop technology, with emphasis on the use of volunteer host devices with the service distributed via the Internet.

Josef Horalek, Vladimír Soběslav

Decision Support and Control Systems

Frontmatter
Failures in Discrete Event Systems and Dealing with Them by Means of Petri Nets

An approach based on Petri nets, pointing out how to deal with failures in discrete-event systems, is presented. It uses the reachability tree and/or graph of the Petri net-based model of the real system, as well as the synthesis of a supervisor to remove the possible deadlock(s).

František Čapkovič
Defining Deviation Sub-spaces for the A*W Robust Planning Algorithm

The paper presents further results from the development of the A*W hybrid planning algorithm aimed at determining robust plans for multiple entities co-existing in a common environment under uncertain conditions. The main focus is on strategies to determine deviation sub-spaces, i.e. the areas for which multi-variant plans are generated, as that selection determines the balance between computational efficiency and robustness. A general strategy is presented, followed by examples used to discuss the influence of the parameters on the behaviour of the algorithm. Guidelines for sub-space identification are provided, and further directions for research are outlined.

Igor Wojnicki, Sebastian Ernst
Creative Expert System: Comparison of Proof Searching Strategies

This paper presents a comparison of the time costs of three proof-searching strategies in a creative expert system. First, a model of the creative expert system and an inference algorithm are proposed. The algorithm searches for a proof up to a given maximal depth, using one of the following strategies: finding all possible proofs, finding the first proof depth-first, and finding the first proof breadth-first. Calculation time is measured in inference scenarios from the casting domain. The creativity of the expert system is achieved thanks to the integration of inference and machine learning. The learning algorithm can be executed automatically during the inference process, because its execution is formalized as a complex inference rule. Such a rule can be fired during the inference process. During execution, training data are prepared from facts already stored in the knowledge base and new implications are learned from them. These implications can be used in the inference process. Therefore, it is possible to infer decisions in cases not covered by the knowledge base explicitly.

Bartlomiej Sniezynski, Grzegorz Legien, Dorota Wilk-Kołodziejczyk, Stanislawa Kluska-Nawarecka, Edward Nawarecki, Krzysztof Jaśkowiec
Spatial Planning as a Hexomino Puzzle

The exact cover problem is a well-known NP-complete decision problem: determining whether an exact cover exists. In this paper, we show how to solve a modified version of the famous Hexomino puzzle (a noteworthy example of an exact cover problem) using a Dancing Links-based algorithm. In this modified problem, a limited number of gaps in the rectangular box may be left uncovered (a common scenario in a variety of spatial planning problems). Additionally, we present a benchmark generator which allows for elaborating very demanding yet solvable problem instances. These instances were used during the qualifying round of Deadline24, an international 24-hour programming marathon. Finally, we confront our baseline solutions, elaborated using our two solvers, with those submitted by the contestants.

Marcin Cwiek, Jakub Nalepa
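
The search that Dancing Links implements efficiently is Knuth's Algorithm X. A compact rendition over dictionaries of sets (following the well-known dict-of-sets formulation) is sketched below on a toy exact cover instance; the paper's relaxation that allows a bounded number of uncovered gaps is omitted.

```python
# X maps each constraint (column) to the set of rows covering it;
# Y maps each row to the list of constraints it satisfies.
def solve(X, Y, solution):
    if not X:
        yield list(solution)
        return
    col = min(X, key=lambda c: len(X[c]))  # most-constrained column first
    for row in list(X[col]):
        solution.append(row)
        removed = select(X, Y, row)
        yield from solve(X, Y, solution)
        deselect(X, Y, row, removed)
        solution.pop()

def select(X, Y, row):
    removed = []
    for c in Y[row]:
        for r in X[c]:
            for c2 in Y[r]:
                if c2 != c:
                    X[c2].remove(r)
        removed.append(X.pop(c))
    return removed

def deselect(X, Y, row, removed):
    for c in reversed(Y[row]):
        X[c] = removed.pop()
        for r in X[c]:
            for c2 in Y[r]:
                if c2 != c:
                    X[c2].add(r)

# Toy instance: pick rows so columns 1..4 are each covered exactly once.
Y = {"A": [1, 2], "B": [3, 4], "C": [1, 3], "D": [2, 4], "E": [1, 2, 3, 4]}
X = {c: {r for r in Y if c in Y[r]} for c in range(1, 5)}
print(list(solve(X, Y, [])))  # three covers: {A,B}, {C,D}, {E} (order may vary)
```
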
Ramp Loss Support Vector Data Description

Data description is an important problem with many applications. Despite its great success, the popular support vector data description (SVDD) has problems with generalization and scalability when the training data contain a significant number of outliers. In this paper we propose the so-called ramp loss SVDD and then prove its scalability and robustness. To solve the proposed problem, we develop an efficient algorithm based on DC (Difference of Convex functions) programming and DCA (DC Algorithm). Preliminary experiments on both synthetic and real data show the efficiency of our approach.

Vo Xuanthanh, Tran Bach, Hoai An Le Thi, Tao Pham Dinh
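
As a sketch of the underlying idea (the paper's exact formulation may differ in details): the standard SVDD hinge penalty is unbounded, so a few distant outliers can dominate the objective; capping it with a ramp loss bounds each point's influence and produces exactly the difference-of-convex structure that DC programming and DCA handle.

```latex
% Standard SVDD (center a, radius R) penalizes each point's excess
% squared distance with an unbounded hinge, so outliers can dominate:
\min_{R,\,a}\; R^2 + C \sum_{i=1}^{n} \max\bigl(0,\; \|x_i - a\|^2 - R^2\bigr)
% A ramp loss caps the per-point penalty at a level s > 0, bounding the
% influence of any single outlier:
\operatorname{ramp}_s(u) \;=\; \min\bigl(s,\, \max(0, u)\bigr)
                         \;=\; \max(0, u) \;-\; \max(0,\, u - s)
% The right-hand side is a difference of two convex functions, which is
% exactly the structure that DC programming and DCA exploit.
```
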
The Temporal Supplier Evaluation Model Based on Multicriteria Decision Analysis Methods

An important element of the supply chain management process is the rating of suppliers. The paper presents a new framework to identify a temporal supplier evaluation model by using Multi-Criteria Decision Analysis (MCDA) methods. The proposed approach extends the classical MCDA paradigm with aspects of temporal evaluation and dedicated aggregation strategies. Afterwards, the proposed framework is used in the identification process for an illustrative example. Finally, the accuracy of the obtained results is compared and discussed.

Jarosław Wątróbski, Wojciech Sałabun, Grzegorz Ladorucki

Machine Learning and Data Mining

Frontmatter
A Novel Entropy-Based Approach to Feature Selection

The number of features in datasets has increased significantly in the age of big data. Processing such datasets requires an enormous amount of computing power, which exceeds the capability of traditional machines. Based on mutual information and selection gain, a novel feature selection approach is proposed. With the Mackey-Glass, S&P 500, and TAIEX time series datasets, we investigated how well the proposed approach performs feature selection for a compact, optimal or near-optimal subset of feature variables, by comparing its results to those of the brute force method. With these results, we determine that the proposed approach can establish an optimal or near-optimal subset solution to the feature selection problem with very fast calculation.

Chia-Hao Tu, Chunshien Li
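
A minimal illustration of the mutual-information side of such an approach, assuming scikit-learn: score lagged copies of a time series against the next value and rank them. The surrogate series and lag range are hypothetical, and the paper's selection-gain criterion is not reproduced.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

# Hypothetical stand-in for a series such as Mackey-Glass or TAIEX.
rng = np.random.default_rng(0)
s = np.sin(np.linspace(0, 60, 1200)) + 0.2 * rng.standard_normal(1200)

lags, n = 10, len(s)
y = s[lags:]                                  # target: the next value
X = np.column_stack([s[lags - k: n - k]       # feature k: value k steps back
                     for k in range(1, lags + 1)])

mi = mutual_info_regression(X, y, random_state=0)
print("lags ranked by mutual information:", np.argsort(mi)[::-1] + 1)
```
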
An Artificial Player for a Turn-Based Strategy Game

This paper describes the design of an artificially intelligent opponent for the Empire Wars turn-based strategy computer game. Several approaches to building an opponent for the game, which has complex rules and a huge state space, are tested. In the first phase, common methods such as heuristics, influence maps, and decision trees are used. While they have many advantages (speed, simplicity and the ability to find a solution in a reasonable time), they provide rather average results. In the second phase, the player is enhanced by an evolutionary algorithm. The algorithm adjusts several parameters of the player that were originally determined empirically. In the third phase, a learning process based on moves recorded from previously played games is used. The results show that incorporating evolutionary algorithms can significantly improve the efficiency of the artificial player without necessarily increasing the processing time.

Filip Maly, Pavel Kriz, Adam Mrazek
Recognizing the Pattern of Binary Hermitian Matrices by a Quantum Circuit

The chapter contains a description of a quantum circuit for pattern recognition. The task is to distinguish Hermitian and non-Hermitian matrices. The quantum circuit is constructed to accumulate the elements of a learning set and, with the use of the Hamming distance and a Hamiltonian designed for the so-called quantum summing operation, it is able to determine whether a tested element fits the pattern from the learning set. The efficiency of this solution is shown in a computational experiment.

Joanna Wiśniewska, Marek Sawerwain
Fuzzy Maximal Frequent Itemset Mining Over Quantitative Databases

Fuzzy frequent itemset mining is an important problem in quantitative data mining. In this paper, we define the problem of fuzzy maximal frequent itemset mining, which, to the best of our knowledge, has never been addressed before. A simple tree-based data structure called the FuzzyTree is constructed, in which the fuzzy itemsets are sorted dynamically based on their supports. Then, we propose an algorithm named FMFIMiner to build the FuzzyTree. In FMFIMiner, we can skip processing the other child nodes once the supports of the parent node and one child node are equal; moreover, we prune certain support computations by checking whether an itemset is already in the final results. Theoretical analysis and experimental studies over 4 datasets demonstrate that our proposed algorithm can efficiently decrease the runtime and memory cost, and significantly outperforms the baseline algorithm MaxFFI-Miner.

Haifeng Li, Yue Wang, Ning Zhang, Yuejin Zhang
A Method for Early Pruning a Branch of Candidates in the Process of Mining Sequential Patterns

Mining patterns in sequence databases is one of the fields that many researchers study due to its high applicability in many areas, such as trademarks, medicine, education, and prediction. However, the main challenges are runtime and memory usage. Many approaches have been proposed to improve the efficiency of mining algorithms, but the issue needs to be studied further. In this paper, we propose a method for effectively solving the problem of sequential pattern mining, based on the event information represented in the vertical data format of the sequence dataset. With a proposed state table, a map table of co-occurring events, and an early search-space pruning technique, the paper proposes an efficient mining algorithm called STATE-SPADE. The experimental results have shown the advantages of the proposed method in terms of execution time and memory usage.

Bac Le, Minh-Thai Tran, Duy Tran
Analyzing Performance of High Frequency Currency Rates Prediction Model Using Linear Kernel SVR on Historical Data

We analyze the performance of various models constructed using linear kernel SVR and trained on historical bid data for high-frequency currency trading. The bid tick data are converted into equally spaced (1 min) data. Different values for the number of training samples, the number of features, and the length of the timeframes are used when conducting the experiments. These models are then used to conduct simulated currency trading in the following year. We record the profits, hit ratios and number of trades executed using these models. Our results indicate it is possible to obtain a profit as well as a good hit ratio from a linear model trained only on historical data under certain pre-defined conditions. On examining the parameters of the generated linear models, we observe that a large number of models have all coefficient values negative while giving a profit and a good hit ratio, suggesting a simple yet effective trading strategy.

Chanakya Serjam, Akito Sakurai
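
A bare-bones sketch of this experimental setup, assuming scikit-learn: a window of past 1-minute prices as features, a linear SVR, and a directional hit ratio. The random-walk surrogate series, window length, and parameters are illustrative only (on such a surrogate the hit ratio should hover near chance).

```python
import numpy as np
from sklearn.svm import LinearSVR

# Hypothetical 1-minute bid series (the paper resamples tick data to 1 min).
rng = np.random.default_rng(1)
price = 1.10 + np.cumsum(0.0001 * rng.standard_normal(5000))

def lagged(series, width):
    """Rows of `width` consecutive values; the target is the next value."""
    X = np.column_stack([series[k: len(series) - width + k]
                         for k in range(width)])
    return X, series[width:]

X, y = lagged(price, width=30)
split = 4000
model = LinearSVR(C=1.0, epsilon=1e-4, max_iter=10_000).fit(X[:split], y[:split])
pred = model.predict(X[split:])

last = X[split:, -1]  # most recent known price in each window
hit_ratio = np.mean(np.sign(pred - last) == np.sign(y[split:] - last))
print("directional hit ratio:", hit_ratio)
```
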
Unsupervised Language Model Adaptation by Data Selection for Speech Recognition

In this paper, we present a language model (LM) adaptation framework based on data selection to improve the recognition accuracy of automatic speech recognition systems. Previous approaches to LM adaptation usually require additional data to adapt the existing background LM. In this work, we propose a novel two-pass decoding approach that uses no additional data but instead selects relevant data from the existing background corpus that is used to train the background LM. The motivation is that the background corpus consists of data from different domains, and as such, the LM trained from it is generic and not discriminative. To make the LM more discriminative, we select sentences from the background corpus that are similar in some linguistic characteristics to the utterances recognized in the first pass and use them to train a new LM, which is employed during the second-pass decoding. In this work, we examine the use of n-gram and bag-of-words features as the linguistic characteristics of the selection criteria. Evaluated on the 11 talks in the test set of the TED-LIUM corpus, the proposed adaptation framework produced a LM that reduced the word error rate by up to 10% relative and the perplexity by up to 47% relative. When the LM was adapted for each talk individually, further word error rate reduction was achieved.

Yerbolat Khassanov, Tze Yuang Chong, Benjamin Bigot, Eng Siong Chng
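
The selection step might look like the sketch below: score background sentences against the first-pass hypotheses by cosine similarity of (here tf-idf-weighted) bag-of-words vectors and keep the top-k for retraining the second-pass LM. The sentences are made up, and the paper's exact features and thresholds are not reproduced.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical first-pass hypotheses and background-corpus sentences.
first_pass = ["the neural network learns word representations"]
background = [
    "stock prices fell sharply on monday",
    "recurrent networks model word sequences",
    "the recipe calls for two cups of flour",
    "word embeddings capture semantic similarity",
]

vec = TfidfVectorizer().fit(background + first_pass)
# For each background sentence, its best similarity to any hypothesis.
sims = cosine_similarity(vec.transform(first_pass),
                         vec.transform(background)).max(axis=0)

k = 2  # keep the k most similar sentences for the adapted LM
selected = [background[i] for i in np.argsort(sims)[::-1][:k]]
print(selected)  # the two topically related sentences
```
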
Metaheuristic Optimization on Conventional Freeman Chain Code Extraction Algorithm for Handwritten Character Recognition

In Handwritten Character Recognition (HCR), interest in feature extraction has been on the increase, with an abundance of algorithms derived to increase the accuracy of classification. In this paper, a metaheuristic approach to feature extraction in HCR based on the Harmony Search Algorithm (HSA) is proposed. The Freeman Chain Code (FCC) is used as the data representation. However, the FCC representation is dependent on the route length and the branch of the character node. To solve this problem, the metaheuristic approach via HSA is proposed to find the shortest route length and minimum computational time for HCR. Finally, the result is compared with other metaheuristic approaches, namely Differential Evolution (DE), Particle Swarm Optimization (PSO), the Genetic Algorithm (GA) and Ant Colony Optimization (ACO).

Muhammad A. Mohamad, Habibollah Haron, Haswadi Hasan
A Novel Learning Vector Quantization Inference Classifier

One of the popular tools in pattern recognition is the neuro-fuzzy system. Most neuro-fuzzy systems are based on multi-layer perceptrons. In this paper, we incorporate learning vector quantization into a neuro-fuzzy system. The prototype update equation is based on learning vector quantization, while the gradient descent technique is used in the weight update equation. Since the weights contain informative information, they are exploited to select a good feature set. Eight data sets are used in the experiments, i.e., Iris Plants, Wisconsin Breast Cancer (WBC), Pima Indians Diabetes, Wine, Ionosphere, Colon Tumor, Diffuse Large B-Cell Lymphoma (DLBCL), and Glioma Tumor (GLI_85). The results show that our algorithm provides good classification rates on all data sets. It is able to select a good feature set with a small number of features. We also compare our results indirectly with existing algorithms. The comparison shows that our algorithm performs better than the existing ones.

Chakkraphop Maisen, Sansanee Auephanwiriyakul, Nipon Theera-Umpon
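
For reference, a standard LVQ1-style prototype update of the kind the abstract builds on (the paper embeds it in a neuro-fuzzy system, so its exact equation may differ): the winning prototype w_c moves toward the input x when their class labels agree and away from it otherwise,

```latex
w_c(t+1) =
\begin{cases}
  w_c(t) + \alpha(t)\,\bigl(x - w_c(t)\bigr), & \text{class}(x) = \text{class}(w_c),\\
  w_c(t) - \alpha(t)\,\bigl(x - w_c(t)\bigr), & \text{otherwise},
\end{cases}
```

where alpha(t) is a decreasing learning rate; the gradient-descent weight update mentioned in the abstract runs alongside this rule.
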
Mining Periodic High Utility Sequential Patterns

The aim of mining High Utility Sequential Patterns (HUSPs) is to discover sequential patterns having a high utility (e.g. high profit) based on a user-specified minimum utility threshold. The existing algorithms for mining HUSPs are capable of discovering the complete set of all HUSPs. However, they usually generate a large number of patterns, which may be redundant in some cases. The periodic appearance of HUSPs, which is very common in real-life applications, can be regarded as an important criterion for considering the purchase behaviour of customers and measuring the interestingness of HUSPs. In this paper, we focus on periodic high utility sequential patterns (PHUSPs) that are periodically bought by customers and generate a high profit. We propose an algorithm named PHUSPM (Periodic High Utility Sequential Pattern Miner) to efficiently discover all PHUSPs. An experimental evaluation was performed on six large-scale datasets to assess the performance of PHUSPM in terms of execution time, memory usage and scalability. The experimental results show that the PHUSPM algorithm is very efficient, reducing the search space and discarding a considerable number of non-PHUSPs.

Tai Dinh, Van-Nam Huynh, Bac Le
Mining Class Association Rules with Synthesis Constraints

Constraint-based methods for mining class association rules (CARs) have been developed in recent years. Currently, there are two kinds of constraints: itemset constraints and class constraints. In this paper, we solve the problem of combining class constraints and itemset constraints, called synthesis constraints. This can be done by applying class constraints and afterwards removing rules that do not satisfy the itemset constraints, but this process consumes more time when the number of rules is large. Therefore, we propose a method to mine all rules satisfying these two constraints in one step, i.e., we put both constraints into the process of mining CARs. A lattice is also used to quickly generate CARs. Experimental results show that our approach is more efficient than mining CARs in two steps.

Loan T. T. Nguyen, Bay Vo, Hung Son Nguyen, Sinh Hoa Nguyen
Towards Auto-structuring Harmony Transcription

In recent years one can observe significant progress in transcribing harmony from songs. New methods and applications have made it easy to automatically retrieve harmony, i.e. the chord progression, for any song one can find. The shortcoming of these applications, however, is the presentation of the transcription. Even though in most cases verses, choruses or other parts of a song share the same harmony, they're never presented in a compact form. Even when two chords are repeated throughout the entire song, one has to look at the entire transcription, from the first to the last occurrence of a chord, to realize that. This paper researches approaches to structuring the transcription, like using repetition notation (e.g. "x2") or finding the shortest commonly repeated chord progression, which may be a riff.

Marek Kopel
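
The "shortest commonly repeated chord progression" idea can be illustrated by finding the smallest period of a chord sequence. The sketch below handles only exact whole-song repetition; real transcriptions need fuzzier matching (verses vs. choruses, partial repeats), which is precisely what the paper investigates.

```python
def shortest_repeating_progression(chords):
    """Smallest prefix whose repetition reproduces the whole sequence.

    Assumes a non-empty list; a repeat count of 1 means the sequence
    has no exact whole-sequence repetition.
    """
    n = len(chords)
    for p in range(1, n + 1):
        if n % p == 0 and chords == chords[:p] * (n // p):
            return chords[:p], n // p

# Hypothetical transcription of a four-times-repeated progression.
song = ["C", "G", "Am", "F"] * 4
riff, times = shortest_repeating_progression(song)
print(f"{'-'.join(riff)} x{times}")  # C-G-Am-F x4
```
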

Computer Vision Techniques

Frontmatter
Boosting Detection Results of HOG-Based Algorithms Through Non-linear Metrics and ROI Fusion

Practical application of object detection systems, in research or industry, favors highly optimized black-box solutions. We show how such a highly optimized system can be further augmented in terms of its reliability with only a minimal increase in computation time, i.e. preserving real-time boundaries. Our solution leaves the initial (HOG-based) detector unchanged and introduces the novel concepts of non-linear metrics and fusion of ROIs. In this context we also introduce a novel way of combining feature vectors for mean-shift grouping. We evaluate our approach with a HOG detector on a standardized image database, which is representative of practical applications. Our results show that the number of false-positive detections can be reduced by a factor of 4 with a negligible complexity increase. Although introduced and applied to a HOG-based system, our approach can easily be adapted to different detectors.

Darius Malysiak, Anna-Katharina Römhild, Christoph Nieß, Uwe Handmann
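
For context, the kind of off-the-shelf detector such a system starts from can be run in a few lines with OpenCV's stock HOG + linear SVM pedestrian detector; the image path and threshold below are hypothetical, and the paper's non-linear metrics and ROI fusion are of course not reproduced here.

```python
import cv2

# OpenCV's stock HOG descriptor with its pretrained linear SVM for people.
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

img = cv2.imread("street.jpg")  # hypothetical test image
rois, weights = hog.detectMultiScale(img, winStride=(8, 8), scale=1.05)

# Crude stand-in for post-processing: keep ROIs whose SVM score clears
# a threshold (a linear metric; the paper instead applies non-linear
# metrics and fuses the surviving ROIs).
kept = [r for r, w in zip(rois, weights) if float(w) > 0.5]
print(f"{len(rois)} raw detections, {len(kept)} kept")
```
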
Automatic Interactive Video Authoring Method via Object Recognition

Interactive video is a type of video which provides interactions for obtaining video-related information or participating in video content. However, authors of interactive video need to spend much time creating interactive video content. Many researchers have presented methods and features to solve this time-consuming problem. However, the methods are still too complicated to use and need to be automated. In this paper, we suggest an automatic interactive video authoring method via object recognition. Our proposed method uses deep learning-based object recognition and an NLP-based keyword extraction method to annotate objects. To evaluate the method, we manually annotated the objects in selected video clips and compared the proposed method with the manual method. The method achieved an accuracy rate of 43.16% for the whole process. This method allows authors to create interactive videos easily.

Ui-Nyoung Yoon, Myung-Duk Hong, Geun-Sik Jo
Target Object Tracking-Based 3D Object Reconstruction in a Multiple Camera Environment in Real Time

The visualization of a three-dimensional target object reconstruction from multiple cameras is an important issue in high-dimensional data representation, with applications in medicine, sports scene analysis, and event creation for film. In this paper, we propose an efficient 3D reconstruction methodology to voxelize and carve the 3D scene, focusing on 3D tracking of the object in a large environment. We apply sparse representation-based target object tracking to efficiently trace the movement of the target object in background clutter and reconstruct the object based on the estimated 3D position captured from multiple images. The voxelized area is optimized to the target by tracking the 3D position, effectively reducing the processing time while keeping the details of the target. We demonstrate the experiments by carving the voxels within the 3D tracked area of the target object.

Jinjoo Song, Heeryon Cho, Sang Min Yoon
Boosting Discriminative Models for Activity Detection Using Local Feature Descriptors

This paper presents a method for daily living activity prediction based on boosting discriminative models. The system consists of several steps. First, local feature descriptors are extracted from multiple scales of the sequential images; in this experiment, basic feature descriptors based on HOG, HOF and MBH are considered. Second, BoW descriptors based on the local features are used to construct feature vectors, which are then fed to the classification machine. The BoW feature extraction is a pre-processing step, utilized to avoid strongly correlated data and to distinguish feature properties, providing uniform data for the classification machine. Third, a discriminative model is constructed using the BoW features of each individual local descriptor. Finally, the decision on action classes is made by the classifier using the boosted discriminative models. Unlike previous contributions, overlapping sequences of frames are used to infer action classes instead of an individual set of frames. An advantage of boosting is that it supports the construction of a strong classifier from a set of weak classifiers with appropriate weights, yielding high performance. The method was successfully tested on some standard databases.

Van-Huy Pham, My-Ha Le, Van-Dung Hoang
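
A hedged sketch of the BoW-plus-boosting stages using scikit-learn in place of the paper's exact components; the vocabulary size, weak learner and synthetic descriptors are stand-ins:

import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
# stand-ins for HOG/HOF/MBH local descriptors of 40 clips, 2 action classes
clips = [rng.normal(size=(100, 32)) + y for y in (0, 1) for _ in range(20)]
labels = np.array([0] * 20 + [1] * 20)

codebook = KMeans(n_clusters=16, n_init=3, random_state=0).fit(np.vstack(clips))

def bow(descs):
    # quantize local descriptors into an L1-normalized visual-word histogram
    h = np.bincount(codebook.predict(descs), minlength=16).astype(float)
    return h / h.sum()

X = np.vstack([bow(c) for c in clips])
clf = AdaBoostClassifier(DecisionTreeClassifier(max_depth=2), n_estimators=50)
clf.fit(X, labels)   # boosting combines weak per-histogram classifiers
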
Highlights Extraction in Sports Videos Based on Automatic Posture and Gesture Recognition

Content-based indexing of sports videos is usually based on the automatic detection of video highlights. Highlights can be detected on the basis of players' or referees' gestures and postures, some of which are very typical of special sports events. These special gestures and postures can be recognized mainly in close-up and medium close-view shots, so an effective view classification method should be applied first. In this paper, sports video shots favorable for detecting players' gestures and postures are characterized, and experimental results of tests on video shot categorization based on gesture recognition are presented. Important and interesting moments in soccer games are then detected when referees hold the penalty card above the head and look towards the player who has committed a serious offense. The recognition process is based solely on the visual information of sports videos and does not use any sensors.

Kazimierz Choroś

Advanced Data Mining Techniques and Applications

Frontmatter
A High-Performance Algorithm for Mining Repeating Patterns

A repeating pattern is a sequence composed of identical elements repeating in a regular manner. In real life, many applications, such as musical and medical sequences, contain valuable repeating patterns. Because the repeating patterns hidden in sequences might contain implicit knowledge, retrieving them effectively and efficiently has been a challenging issue in recent years. Although a number of past studies have addressed this issue, their performance still falls short of users' expectations, especially on large datasets. To address this issue, we propose in this paper an efficient algorithm named Fast Mining of Repeating Patterns (FMRP), which achieves high performance in finding repeating patterns through a novel index called the Quick-Pattern-Index (QPI). This index effectively supports the proposed FMRP algorithm because it stores pattern-position information. Instead of scanning a given sequence iteratively, the repeating patterns can be discovered in a single scan of the sequence. The experimental results reveal that our proposed algorithm outperforms the compared methods in terms of execution time.

Ja-Hwung Su, Tzung-Pei Hong, Chu-Yu Chin, Zhi-Feng Liao, Shyr-Yuan Cheng
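
The QPI itself is not reproduced here, but a position index built in one scan conveys the idea; the following illustrative sketch maps every short pattern to its start positions and keeps the repeating ones:

from collections import defaultdict

def repeating_patterns(seq, max_len=4, min_count=2):
    """One-scan position index in the spirit of a Quick-Pattern-Index:
    map every pattern up to max_len to its start positions, then keep
    the patterns that repeat. (Illustrative, not the authors' QPI.)"""
    index = defaultdict(list)
    for i in range(len(seq)):
        for k in range(1, max_len + 1):
            if i + k <= len(seq):
                index[tuple(seq[i:i + k])].append(i)
    return {p: pos for p, pos in index.items() if len(pos) >= min_count}

print(repeating_patterns("abcabcab"))
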
Recognition of Empathy Seeking Questions in One of the Largest Woman CQA in Japan

Many questions are posted on community websites around the world. Some of these questions are actually asked in order to receive empathy for the feelings of the questioners, rather than to get specific answers. However, it is harder for such questions to receive answers than for questions seeking responses other than empathy. If questions asked for the purpose of receiving empathy can get responses, this serves as an important factor in increasing user satisfaction. This paper reports on our attempt to improve the response rate to such questions by using machine learning to classify questions into empathy-seeking and other, and by showing the questions classified as empathy-seeking to prospective respondents who have answered such questions at a higher rate.

Tatsuro Shimada, Akito Sakurai
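
A minimal stand-in for the classification step (TF-IDF with logistic regression; the paper's actual features and corpus are not reproduced):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# toy stand-ins for labeled community questions
questions = ["I just need someone to understand how I feel",
             "What is the cheapest train to Osaka?"]
labels = [1, 0]   # 1 = empathy-seeking, 0 = other

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(questions, labels)
print(clf.predict(["Nobody understands what I am going through"]))
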
A Content-Based Image Retrieval Method Based on the Google Cloud Vision API and WordNet

A Content-Based Image Retrieval (CBIR) method analyzes the content of an image and extracts features to describe it, also called image annotations (or image labels). A machine learning (ML) algorithm is commonly used to obtain the annotations, but this is a time-consuming process. In addition, the semantic gap is another problem in image labeling. To overcome the first difficulty, the Google Cloud Vision API is a solution because it saves considerable computational time. To resolve the second problem, a transformation method is defined that maps undefined terms using WordNet. In the experiments, a well-known dataset, Pascal VOC 2007, with 4,952 test images is used, and the Cloud Vision API image labeling is invoked through an implementation in the R language. At most ten labels are kept per image, provided their scores exceed 50. Moreover, we compare the Cloud Vision API with well-known ML algorithms. This work found that the API yields 42.4% mean average precision (mAP) on the 4,952 images, and our proposed approach outperforms three well-known ML algorithms. Hence, this work could be extended to other image datasets and used as a benchmark when evaluating performance.

Shih-Hsin Chen, Yi-Hui Chen
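
A sketch of the WordNet transformation step, assuming the undefined API label is mapped to the defined term with the highest path similarity (the paper's exact mapping rule may differ); requires the NLTK WordNet data:

from nltk.corpus import wordnet as wn   # needs nltk.download('wordnet')

def map_label(api_label, defined_terms):
    """Map an undefined Cloud Vision label onto the closest defined term
    using WordNet path similarity -- one way to bridge the semantic gap."""
    best, best_sim = None, 0.0
    for term in defined_terms:
        for s1 in wn.synsets(api_label):
            for s2 in wn.synsets(term):
                sim = s1.path_similarity(s2) or 0.0
                if sim > best_sim:
                    best, best_sim = term, sim
    return best

print(map_label("puppy", ["dog", "cat", "car"]))   # -> "dog"
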
A Personalized Recommendation Method Considering Local and Global Influences

Social media is one of the largest stores of media data on the web. Many researchers use it to study user interest and recommender systems; this data is like a treasure vault waiting to be exploited for developing recommendation systems. Social common interest is one methodology for implementing a recommendation system among users. It performs well in communities with similar interests, but its drawback is that it ignores outside influence from other communities. In this paper, a methodology is proposed to calculate the global influence from outside communities and to implement the recommendation system accordingly. The results can be used to build recommendation systems that consider not only local communities but also the outside influence of items in social media.

Hendry, Rung-Ching Chen, Lijuan Liu
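
One plausible reading of combining local and global influence is a weighted blend of within-community and cross-community interest scores; the formula below is illustrative, not the paper's exact model:

def recommend_score(item, user_community, all_communities, alpha=0.7):
    """Blend local (within-community) and global (cross-community)
    interest in an item; alpha weights the local component."""
    local = user_community.get(item, 0.0)
    global_ = sum(c.get(item, 0.0) for c in all_communities) / len(all_communities)
    return alpha * local + (1 - alpha) * global_

home = {"camera": 0.9, "lens": 0.4}
others = [{"camera": 0.2, "drone": 0.8}, {"lens": 0.6}]
print(recommend_score("drone", home, others))   # picked up from outside influence
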
Virtual Balancing of Decision Classes

It has been observed in the literature and in practice that the quality of classification based on the confidences of decision rules is poor when a decision table consists of decision classes that differ significantly in the number of objects. A typical approach to overcoming the negative consequences of this phenomenon is to apply oversampling of minority decision classes and/or undersampling of majority decision classes. In this paper, we introduce the notion of virtual balancing of decision classes, which does not require any replication of data but produces the same results as physical balancing of decision classes. We also derive a number of properties of selected evaluation measures of decision rules (coverage, confidence, lift and growth), and relations among them, with respect to virtually (and thereby physically) balanced decision classes. In particular, we show how to determine threshold values for confidence, lift, growth and coverage so that the resulting sets of decision rules are identical.

Marzena Kryszkiewicz
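
The core idea can be sketched as rescaling rule supports by class sizes, which yields the confidence a rule would have after physical balancing without replicating any data (an illustration, not the paper's full derivation):

def balanced_confidence(support_by_class, class_sizes, target):
    """Confidence of a rule X -> target as if all decision classes had
    equal cardinality -- no physical over/undersampling needed.

    support_by_class[c] = objects of class c matching the rule body X
    class_sizes[c]      = objects of class c in the decision table"""
    rates = {c: support_by_class[c] / class_sizes[c] for c in class_sizes}
    return rates[target] / sum(rates.values())

# 95 matches in a 10000-object majority class vs 40 in a 100-object minority
print(balanced_confidence({"maj": 95, "min": 40}, {"maj": 10000, "min": 100}, "min"))
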
Evaluation of Speech Perturbation Features for Measuring Authenticity in Stress Expressions

Expressions can vary in their authenticity level, i.e. the real amount of emotion present within the person expressing them. They are often sincere, and thus authentic and natural: the person expresses what he or she feels. But play-acted expressions are also present in our lives in the form of deception, movies, theater, etc. It has been shown in the literature that these two types of expressions are often hard to distinguish. While some studies concluded that play-acted expressions are more intense, exaggerated or stereotypical than natural ones, other authors failed to detect such behavior. The goal of our analysis is to investigate whether speech perturbation features, i.e. jitter, shimmer, variance and features of disturbances in laryngeal muscle coordination, can be used as a robust measure for analyzing the authenticity of stress expressions. Two subsets of the SUSAS database (Speech Under Simulated and Actual Stress), the Roller-coaster subset and the Talking Styles domain, are used for this purpose. It is shown that perturbation features in general exhibit statistically significant differences between realistic and acted expressions; only the jitter features generally failed to discriminate the two types of expressions. A rising trend of perturbation feature values is observed from acted- to real-stress expressions.

Branimir Dropuljić, Leo Mršić, Robert Kopal, Sandro Skansi, Andrijana Brkić
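
For reference, the local jitter and shimmer measures are commonly defined as the relative mean absolute difference of consecutive pitch periods and peak amplitudes; a sketch with stand-in tracks:

import numpy as np

def local_jitter(periods):
    """Relative mean absolute difference of consecutive pitch periods (%)."""
    periods = np.asarray(periods, float)
    return 100 * np.mean(np.abs(np.diff(periods))) / np.mean(periods)

def local_shimmer(amplitudes):
    """Relative mean absolute difference of consecutive peak amplitudes (%)."""
    amplitudes = np.asarray(amplitudes, float)
    return 100 * np.mean(np.abs(np.diff(amplitudes))) / np.mean(amplitudes)

periods = [7.9, 8.1, 8.0, 8.3, 7.8]      # ms, stand-in pitch-period track
amps = [0.62, 0.60, 0.66, 0.63, 0.59]    # stand-in peak amplitudes
print(local_jitter(periods), local_shimmer(amps))
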

Intelligent and Context Systems

Frontmatter
Context Injection as a Tool for Measuring Context Usage in Machine Learning

Machine learning (ML) methods used to train computational models are among the most valuable elements of modern artificial intelligence. Thus, preparing tools to evaluate the ability of ML training algorithms to find, within the training data, the information (the context) crucial for building successful models remains an important topic. In this text we introduce a new method for quantitatively estimating the effectiveness of context usage by ML training algorithms, based on the injection of a predefined context into the training data sets. The results indicate that the proposed solution can be used as a general method for analyzing differences in context processing between ML training methods.

Maciej Huk
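
A toy illustration of the injection idea, assuming the injected context is an extra feature correlated with the labels; the accuracy gain over a context-free baseline then estimates how well training exploits the context:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 5))
y = (rng.random(400) < 0.5).astype(int)           # labels independent of X

# inject a predefined context: one extra feature correlated with the label
context = y + rng.normal(scale=0.5, size=400)
X_ctx = np.column_stack([X, context])

base = cross_val_score(LogisticRegression(), X, y, cv=5).mean()
with_ctx = cross_val_score(LogisticRegression(), X_ctx, y, cv=5).mean()
print(with_ctx - base)   # accuracy gain ~ how well training uses the context
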
An Approach for Multi-Relational Data Context in Recommender Systems

The matrix factorization technique has been used successfully in recommender systems, and many variations of it have been developed, e.g., biased matrix factorization, non-negative matrix factorization, multi-relational matrix factorization, etc. In the context of multi-relational data, this paper proposes another multi-relational approach for recommender systems that includes all of the information from the latent factor matrices in the prediction functions, so that the models have more data to learn from. To validate the proposed approach, experiments are conducted on standard recommender-system datasets. Experimental results show that the proposed approach is promising.

Nguyen Thai-Nghe, Mai Nhut-Tu, Huu-Hoa Nguyen
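
For orientation, the single-relation baseline that such approaches extend: matrix factorization trained by SGD, with the prediction as a dot product of latent factors (the multi-relational extension itself is not reproduced here):

import numpy as np

def mf_sgd(ratings, n_users, n_items, k=8, lr=0.01, reg=0.05, epochs=30):
    """Plain matrix factorization by stochastic gradient descent."""
    rng = np.random.default_rng(0)
    P = rng.normal(scale=0.1, size=(n_users, k))   # user latent factors
    Q = rng.normal(scale=0.1, size=(n_items, k))   # item latent factors
    for _ in range(epochs):
        for u, i, r in ratings:
            e = r - P[u] @ Q[i]                    # prediction error
            P[u] += lr * (e * Q[i] - reg * P[u])
            Q[i] += lr * (e * P[u] - reg * Q[i])
    return P, Q

P, Q = mf_sgd([(0, 0, 5.0), (0, 1, 1.0), (1, 0, 4.5)], n_users=2, n_items=2)
print(P[1] @ Q[1])   # predicted rating for an unseen (user, item) pair
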
A Hybrid Feature Selection Method Based on Symmetrical Uncertainty and Support Vector Machine for High-Dimensional Data Classification

MicroRNA (miRNA) is a small, endogenous, non-coding RNA that plays a critical regulatory role in various biological processes. Recently, research based on miRNA expression profiles has shown a new aspect of multiclass cancer classification. Due to the high dimensionality, however, classification of miRNA expression data poses several computational challenges. In this paper, we propose a hybrid feature selection method for accurate classification of various cancer types based on miRNA expression data. Symmetrical uncertainty is employed as the filter part, and a support vector machine with best-first search is used as the wrapper part. To validate the efficiency of the proposed method, we conducted several experiments on a real bead-based miRNA expression dataset; the results show that our method can significantly improve classification accuracy and outperforms existing feature selection methods.

Yongjun Piao, Keun Ho Ryu
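
The filter criterion can be sketched directly from its standard definition, SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)), computed on discretized features:

import numpy as np
from collections import Counter

def entropy(xs):
    n = len(xs)
    return -sum(c / n * np.log2(c / n) for c in Counter(xs).values())

def symmetrical_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)) -- the filter score used to
    rank discretized features before a wrapper stage."""
    hx, hy = entropy(x), entropy(y)
    hxy = entropy(list(zip(x, y)))
    mutual_info = hx + hy - hxy
    return 2 * mutual_info / (hx + hy) if hx + hy else 0.0

x = [0, 0, 1, 1, 2, 2]          # a discretized expression feature (stand-in)
y = [0, 0, 1, 1, 1, 1]          # class labels (stand-in)
print(symmetrical_uncertainty(x, y))
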
Selecting Important Features Related to Efficacy of Mobile Advertisements

With the growing use of mobile devices, mobile advertising is playing an increasingly important role. It can reach potential customers at any time and place, based on an individual's real-time needs. The factors behind the success of mobile advertisements differ from those of similar media such as television or large-screen monitors. We investigated the important factors for enhancing the click-through rate (CTR) of a mobile ad; as CTR is directly related to revenue, it is used to measure an ad's success. To identify the important factors that determine CTR, we took two approaches: one directly asking subjects, and the other analyzing their selective attention. Subjects were asked to respond to a questionnaire, and from the responses important features were selected using the Least Absolute Shrinkage and Selection Operator (LASSO). For the other approach, selective attention was inferred from the subjects' eye-tracking data. The findings from the two approaches were similar. These features will be helpful for designing ads favored by users, which could also earn more revenue.

Goutam Chakraborty, L. C. Cheng, L. S. Chen, Cedric Bornand
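
A minimal sketch of LASSO-based selection with scikit-learn; the questionnaire items and the regularization strength are stand-ins:

import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 10))                 # questionnaire items (stand-ins)
y = 2 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(scale=0.1, size=60)  # CTR proxy

lasso = Lasso(alpha=0.1).fit(X, y)
selected = np.flatnonzero(lasso.coef_)        # L1 penalty zeroes out the rest
print(selected)                               # expected: features 0 and 3
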
A Comparative Study of Evolutionary Algorithms with a New Penalty Based Fitness Function for Feature Subset Selection

Feature subset selection is an important task in knowledge extraction from high-dimensional data: it reduces data dimensionality, accelerates processing, and improves classification accuracy. For mining knowledge from huge data, feature subset selection acts as extraction of the context for the classification process. It is basically an optimization problem in which a search technique, guided by a feature evaluation function, seeks the best possible feature subset among all subsets of a large feature set. Evolutionary techniques are well known for their efficiency as search algorithms and are used for the feature subset selection problem. In this work, a comparative study of well-known evolutionary algorithms (GA, PSO) and less-known ones (CS, GSA, Firefly and BAT) for feature subset selection is carried out, with the classification accuracy of an SVM classifier as the wrapper fitness function. A new fitness function with an added penalty term, based on the two objectives of improving classification accuracy while reducing dimensionality, is proposed, and its advantage over classification accuracy alone is examined in simulation experiments with benchmark data sets. The simulation results show that the new fitness function is more effective than a fitness function based on classification accuracy alone. It also produces better results in reducing dimensionality and improving classification accuracy than the popular multi-objective search algorithm NSGA-II used for feature selection.

Atsushi Kawamura, Basabi Chakraborty
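
One simple form of such a penalized fitness is accuracy minus a term proportional to the fraction of selected features; the coefficient below is illustrative, not the paper's exact setting:

def fitness(accuracy, n_selected, n_total, penalty=0.1):
    """Wrapper fitness rewarding accuracy while penalizing subset size --
    an illustrative penalty form combining the two objectives."""
    return accuracy - penalty * n_selected / n_total

print(fitness(0.92, 12, 100))   # 0.908
print(fitness(0.93, 60, 100))   # 0.870: more accurate but dispreferred
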
A Context-Aware Fitness Function Based on Feature Selection for Evolutionary Learning of Characteristic Graph Patterns

We propose a context-aware fitness function based on feature selection for evolutionary learning of characteristic graph patterns. The proposed fitness function estimates the fitness of a set of correlated individuals rather than the sum of the fitness values of the individuals, and specifies the fitness of an individual as its contribution degree in the context of the set. We apply the proposed fitness function to our evolutionary learning, based on Genetic Programming, for obtaining characteristic graph patterns from positive and negative graph data. We report experimental results on our evolutionary learning of characteristic graph patterns, using the context-aware fitness function and a previous fitness function that ignores context.

Fumiya Tokuhara, Tetsuhiro Miyahara, Tetsuji Kuboyama, Yusuke Suzuki, Tomoyuki Uchida
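
One plausible reading of "contribution degree" is the marginal change in set fitness when an individual is removed; the sketch below uses a toy set-fitness for illustration and is not the authors' definition:

def contribution(i, population, evaluate):
    """Fitness of individual i as its marginal contribution in the
    context of the whole set: F(S) - F(S without i)."""
    others = population[:i] + population[i + 1:]
    return evaluate(population) - evaluate(others)

# toy set-fitness: value of a pattern set is its number of distinct patterns
pop = ["p1", "p2", "p2"]
print(contribution(0, pop, lambda s: len(set(s))))   # 1: p1 adds new coverage
print(contribution(1, pop, lambda s: len(set(s))))   # 0: p2 is redundant
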

Multiple Model Approach to Machine Learning

Frontmatter
Authenticating ANN-NAR and ANN-NARMA Models Utilizing Bootstrap Techniques

Neural network techniques have an enormous reputation in the field of forecasting. However, there is not yet a reliable method to validate the final model in neural network time series modelling. Thus, this paper proposes an approach to validating such models using a time series block bootstrap. This simple technique differs from the traditional block bootstrap for time series in that it makes use of every data set in the data partitioning procedure of neural network modelling: the training set, the testing set and the validation set. Each data set is divided into two small blocks, called the odd and even blocks (non-overlapping blocks). Then, from each block, random sampling with replacement in an ascending structure is performed, and the replicated samples are called odd-even block bootstrap samples. Finally, the samples are used in neural network training for the final voted prediction output. The proposed strategy was applied to two artificial neural network time series models, the nonlinear autoregressive (NAR) model and the nonlinear autoregressive moving average (NARMA) model. In this study, three varying real industrial monthly data sets of Malaysian construction material price indices from January 1980 to December 2012 were used. It was found that the proposed bootstrapped neural network time series models outperform the original neural network time series models.

Nor Azura Md. Ghani, Saadi bin Ahmad Kamaruddin, Norazan Mohamed Ramli, Ali Selamat
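
Under one reading of the odd and even blocks (odd- and even-indexed halves, resampled with replacement and sorted ascending), the resampling step might look as follows; this is a hedged sketch, not the authors' code:

import numpy as np

def odd_even_bootstrap(series, n_samples=100, seed=0):
    """Split a series into two non-overlapping 'odd' and 'even' blocks,
    then resample each with replacement in ascending order."""
    rng = np.random.default_rng(seed)
    series = np.asarray(series, float)
    odd, even = series[0::2], series[1::2]
    samples = []
    for _ in range(n_samples):
        o = np.sort(rng.choice(odd, size=len(odd), replace=True))
        e = np.sort(rng.choice(even, size=len(even), replace=True))
        samples.append((o, e))
    return samples

prices = [101.2, 103.5, 102.8, 104.1, 105.0, 104.6]   # stand-in index values
print(odd_even_bootstrap(prices, n_samples=2))
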
Method for Aspect-Based Sentiment Annotation Using Rhetorical Analysis

This paper fills a gap in aspect-based sentiment analysis by presenting a new method for preparing and analysing opinion texts and generating user-friendly descriptive reports in natural language. We present a comprehensive set of techniques, derived from Rhetorical Structure Theory and sentiment analysis, to extract aspects from textual opinions and then build an abstractive summary of a set of opinions. Moreover, we propose aspect-aspect graphs to evaluate the importance of aspects and to filter unimportant ones out of the summary. Additionally, the paper presents a prototype data-flow solution with interesting and valuable results. The proposed method proved highly accurate in aspect detection when applied to the gold-standard dataset.

Łukasz Augustyniak, Krzysztof Rajda, Tomasz Kajdanowicz
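
A small sketch of filtering aspects through an aspect-aspect graph, using degree centrality as one possible importance measure (the paper's exact scoring is not reproduced):

import networkx as nx

# aspect co-occurrence graph built from opinions (edges = mentioned together)
g = nx.Graph()
g.add_edges_from([("battery", "screen"), ("battery", "price"),
                  ("screen", "price"), ("price", "warranty")])

rank = nx.degree_centrality(g)
important = [a for a, c in rank.items() if c >= 0.5]
print(important)   # low-centrality aspects like 'warranty' are filtered out
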
On Quality Assessment in Wikipedia Articles Based on Markov Random Fields

This article investigates the possibility of accurately predicting the quality of community-generated resources based on crowd-generated content. We use data from Wikipedia, the prime example of a community-run site, as our object of study. We define quality as a distribution of user-assigned grades across a predefined range of possible scores, and present a measure of distribution similarity to quantify the accuracy of a prediction. The proposed method of quality prediction is based on a Markov Random Field and its Loopy Belief Propagation implementation. Based on our results, we highlight key problems in the approach as presented, as well as trade-offs caused by relying solely on network structure and characteristics and excluding metadata. The overall results of content quality prediction are promising in homophilic networks.

Rajmund Kleminski, Tomasz Kajdanowicz, Roman Bartusiak, Przemyslaw Kazienko
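
A generic sum-product Loopy Belief Propagation sketch over a pairwise MRF, with quality grades as node states and an assumed homophily matrix; illustrative only:

import numpy as np

def loopy_bp(unary, edges, compat, iters=20):
    """Sum-product loopy belief propagation on a pairwise MRF.
    unary : {node: prior grade distribution}
    edges : list of (u, v) links between articles
    compat: homophily matrix, compat[a, b] ~ affinity of linked grades"""
    k = len(compat)
    nbrs = {v: [] for v in unary}
    msg = {}
    for u, v in edges:
        nbrs[u].append(v); nbrs[v].append(u)
        msg[(u, v)] = np.ones(k) / k
        msg[(v, u)] = np.ones(k) / k
    for _ in range(iters):
        new = {}
        for (u, v) in msg:
            b = unary[u].copy()                 # unary belief at sender u
            for w in nbrs[u]:
                if w != v:
                    b = b * msg[(w, u)]         # messages from other neighbors
            m = compat.T @ b                    # marginalize over u's states
            new[(u, v)] = m / m.sum()
        msg = new
    beliefs = {}
    for v in unary:
        b = unary[v].copy()
        for w in nbrs[v]:
            b = b * msg[(w, v)]
        beliefs[v] = b / b.sum()                # posterior grade distribution
    return beliefs

compat = np.array([[2., 1., .5], [1., 2., 1.], [.5, 1., 2.]])
unary = {0: np.array([.6, .3, .1]), 1: np.ones(3) / 3, 2: np.array([.1, .3, .6])}
print(loopy_bp(unary, [(0, 1), (1, 2)], compat))
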
Is a Data-Driven Approach Still Better Than Random Choice with Naive Bayes Classifiers?

We study the performance of data-driven, a priori and random approaches to label space partitioning for multi-label classification with a Gaussian Naive Bayes classifier. Experiments were performed on 12 benchmark data sets and evaluated on 5 established measures of classification quality: micro- and macro-averaged F1 score, subset accuracy and Hamming loss. Data-driven methods are significantly better than an average run of the random baseline. For F1 scores and subset accuracy, data-driven approaches were more likely than not to outperform random approaches in the worst case. There always exists a method that performs better than the a priori methods in the worst case. The advantage of data-driven methods over a priori methods is smaller with a weak classifier than when tree classifiers are used.

Piotr Szymański, Tomasz Kajdanowicz
An Expert System to Assist with Early Detection of Schizophrenia

Schizophrenia is a neurobiological disorder whose symptoms appear at a young age. Psychiatrists use client-centered therapy to recognize five symptoms of schizophrenia: delusions, hallucinations, negative symptoms, grossly disorganized or abnormal motor behavior, and disorganized thinking and speech. Patients experiencing an acute psychotic episode must be isolated to prevent antisocial behavior, so early detection of schizophrenia symptoms is necessary to avoid such episodes. This paper presents an expert system to aid in diagnosing early schizophrenia symptoms. The system represents psychiatric knowledge in the form of rules, facts and events, and uses them to assess whether a patient suffers from schizophrenia. The knowledge base is presented to the patient in the form of questions, and the expert system uses forward chaining while gathering the patient's answers. All answers are transformed into facts and processed using Boolean reasoning to generate the diagnosis. The diagnosis states whether the patient has schizophrenia and, if so, also indicates the type of schizophrenia.

Sonya Rapinta Manalu, Bahtiar Saleh Abbas, Ford Lumban Gaol, Lukas, Bogdan Trawiński
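
A minimal forward-chaining sketch of the inference loop; the rules below are illustrative placeholders, not clinical criteria:

def forward_chain(facts, rules):
    """Minimal forward-chaining inference: fire every rule whose premises
    hold until no new facts appear."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if set(premises) <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

# illustrative rules only
rules = [(("delusions", "hallucinations"), "psychotic_symptoms"),
         (("psychotic_symptoms", "disorganized_speech"), "schizophrenia_suspected")]
answers = {"delusions", "hallucinations", "disorganized_speech"}
print(forward_chain(answers, rules))
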
Backmatter
Metadata
Title
Intelligent Information and Database Systems
Edited by
Ngoc Thanh Nguyen
Satoshi Tojo
Le Minh Nguyen
Bogdan Trawiński
Copyright Year
2017
Electronic ISBN
978-3-319-54472-4
Print ISBN
978-3-319-54471-7
DOI
https://doi.org/10.1007/978-3-319-54472-4
