
## About this book

The inspiration for this book came from the Industrial Session of the ISMIS 2017 Conference in Warsaw. It covers numerous applications of intelligent technologies in various branches of industry. Intelligent computational methods and big data foster innovation and enable industry to overcome technological limitations and explore new frontiers. It is therefore necessary for scientists and practitioners to cooperate, inspire each other, and use the latest research findings to create new designs and products. Accordingly, the contributions cover solutions to problems experienced by practitioners in the areas of artificial intelligence, complex systems, data mining, medical applications and bioinformatics, as well as multimedia and text processing. Further, the book shows new directions for cooperation between science and industry and facilitates efficient transfer of knowledge in the area of intelligent information systems.

## Table of Contents

### Nonlinear Forecasting of Energy Futures

Abstract
This paper proposes the use of the Brownian distance correlation for feature selection and for conducting a lead-lag analysis of energy time series. Brownian distance correlation determines relationships similar to those identified by the linear Granger causality test, and it also uncovers additional non-linear relationships among the log returns of oil, coal, and natural gas. When these linear and non-linear relationships are used to forecast the direction of energy futures log returns with a non-linear classification method such as a support vector machine, the forecast improves compared to a forecast based only on Granger causality.
Germán G. Creamer
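
As an illustration of the measure named in the abstract above, the following is a minimal sketch of the sample (Brownian) distance correlation for two one-dimensional series, written in plain Python. The function names and the toy data are assumptions made for this example and are not taken from the chapter; the chapter's own feature-selection and lead-lag procedure is not reproduced here.

```python
import math


def _centered_distances(values):
    """Pairwise absolute distances, double-centered (row, column and grand means removed)."""
    n = len(values)
    dist = [[abs(a - b) for b in values] for a in values]
    row_mean = [sum(row) / n for row in dist]  # equals the column means, since dist is symmetric
    grand_mean = sum(row_mean) / n
    return [
        [dist[i][j] - row_mean[i] - row_mean[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def distance_correlation(xs, ys):
    """Sample distance correlation of two equally long 1-D sequences (value in [0, 1])."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    a = _centered_distances(xs)
    b = _centered_distances(ys)

    def mean_product(p, q):
        return sum(p[i][j] * q[i][j] for i in range(n) for j in range(n)) / (n * n)

    dcov2 = mean_product(a, b)   # squared sample distance covariance
    dvar_x = mean_product(a, a)  # squared sample distance variance of xs
    dvar_y = mean_product(b, b)  # squared sample distance variance of ys
    denom = math.sqrt(dvar_x * dvar_y)
    if denom == 0.0:
        return 0.0
    return math.sqrt(max(dcov2, 0.0) / denom)


# Toy data: y = x^2 has zero Pearson correlation with x on a symmetric grid,
# but a clearly non-zero distance correlation.
xs = [-2, -1, 0, 1, 2]
ys = [x * x for x in xs]
print(distance_correlation(xs, ys))
```

The toy series shows why such a measure can pick up dependencies that a purely linear test would miss: the relationship is exact but non-linear.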

### Implementation of Generic Steering Algorithms for AI Agents in Computer Games

Abstract
This paper proposes a set of generic steering algorithms for autonomous AI agents along with the structure of the implementation of a movement layer designed to work with said algorithms. The algorithms are meant for further use in computer animation in computer games: they provide a smooth and realistic base for the animation of the agent’s movement and are designed to work with any graphic environment and physics engine, thus providing a solid, versatile layer of logic for computer game AI engines. Basic algorithms (called steering behaviors) based on dynamics are thoroughly described, as well as some methods of combining the behaviors into more complex ones. Applications of the algorithms are demonstrated along with possible problems in their usage and the solutions to said problems. The paper also presents results of studies of the behaviors within a closed, single-layered AI module consisting only of a movement layer, thus removing the bias introduced by pathfinding and decision making.
Mateusz Modrzejewski, Przemysław Rokita
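
To make the notion of a dynamics-based steering behavior mentioned in the abstract above more concrete, here is a minimal sketch of the classic "seek" behavior in 2D. It is a textbook formulation and not necessarily the one used in the chapter; the function name `seek` and the parameters `max_speed` and `max_force` are assumptions chosen for this example.

```python
import math


def seek(position, velocity, target, max_speed, max_force):
    """Return a 2D steering force that turns the current velocity towards the target.

    The desired velocity points straight at the target at max_speed; the steering
    force is the difference between desired and current velocity, clamped to max_force.
    """
    dx, dy = target[0] - position[0], target[1] - position[1]
    dist = math.hypot(dx, dy)
    if dist == 0.0:
        desired = (0.0, 0.0)  # already at the target: aim for a standstill
    else:
        desired = (dx / dist * max_speed, dy / dist * max_speed)

    sx, sy = desired[0] - velocity[0], desired[1] - velocity[1]
    magnitude = math.hypot(sx, sy)
    if magnitude > max_force:
        scale = max_force / magnitude
        sx, sy = sx * scale, sy * scale
    return (sx, sy)


# An agent at rest at the origin is pushed along +x towards a target at (10, 0).
print(seek((0.0, 0.0), (0.0, 0.0), (10.0, 0.0), max_speed=2.0, max_force=0.5))
```

A movement layer would typically integrate the returned force into the velocity each frame; more complex behaviors can then be built by blending several such forces.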

### The Dilemma of Innovation–Artificial Intelligence Trade-Off

Abstract
Dialectic that confronts pros and cons is a long-standing methodology pursued for a better understanding of old and new problems. It was already practiced in ancient Greece to gain insight into current and expected issues. In this paper we make use of this methodology to discuss some relationships binding innovation, technology and artificial intelligence, and culture. The main message of this paper is that even sophisticated technologies and advanced innovations, such as those equipped with artificial intelligence, are not a panacea for the increasing contradictions, problems and challenges that contemporary societies are facing. Often we have to deal with a trade-off dilemma that confronts the gains provided by innovations with the downsides they may cause. We claim that in order to resolve such dilemmas and to work out plausible solutions one has to refer to culture sensu largo.
Mieczyslaw Muraszkiewicz

### Can We Build Recommender System for Artwork Evaluation?

Abstract
The aim of this paper is to propose a strategy for building a recommender system that assigns a price tag to an artwork. The other goal is to verify a hypothesis about the existence of a correlation between certain attributes used to describe a painting and its price. The paper examines the possibility of using data mining methods in the field of art marketing. It also describes the main aspects of the system architecture and the data mining experiments performed, as well as the processes connected with data collection from the World Wide Web.
Cezary Pawlowski, Anna Gelich, Zbigniew W. Raś

### Modelling OpenStreetMap Data for Determination of the Fastest Route Under Varying Driving Conditions

Abstract
We propose a network graph for determining the fastest route under varying driving conditions based on OpenStreetMap data. The introduced solution solves the fastest point-to-point path problem. We present a method of transforming the OpenStreetMap data into a network graph and a few transformations for improving the graph obtained by almost directly mapping the source data into a destination model. To determine the fastest route, we use a modified version of Dijkstra’s algorithm and a time-dependent model of the network graph in which the flow speed of each edge depends on the time interval.
Grzegorz Protaziuk, Robert Piątkowski, Robert Bembenik
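
The idea of a time-dependent graph searched with a Dijkstra-style algorithm, as summarized in the abstract above, can be sketched as follows. This is only an illustrative simplification under stated assumptions: each edge carries a length in kilometres and one speed per hourly slot, the speed is fixed at the moment the edge is entered, and arriving earlier never hurts (FIFO property). The graph layout, the function names and the toy network are hypothetical and do not reproduce the chapter's OpenStreetMap model or its specific modification of Dijkstra's algorithm.

```python
import heapq


def travel_time(length_km, speeds_kmh, t_hours):
    """Hours needed to traverse an edge entered at time t_hours (speed of that hourly slot)."""
    slot = int(t_hours) % len(speeds_kmh)
    return length_km / speeds_kmh[slot]


def fastest_route(graph, source, target, departure_h):
    """Return (duration_in_hours, path) of the fastest route, or (None, []) if unreachable.

    graph maps a node to a list of (neighbour, length_km, speeds_kmh) tuples,
    where speeds_kmh holds one flow speed per hourly time interval.
    """
    best_arrival = {source: departure_h}
    previous = {}
    queue = [(departure_h, source)]
    while queue:
        t, node = heapq.heappop(queue)
        if node == target:
            break
        if t > best_arrival.get(node, float("inf")):
            continue  # stale queue entry
        for neighbour, length_km, speeds in graph.get(node, []):
            arrival = t + travel_time(length_km, speeds, t)
            if arrival < best_arrival.get(neighbour, float("inf")):
                best_arrival[neighbour] = arrival
                previous[neighbour] = node
                heapq.heappush(queue, (arrival, neighbour))
    if target not in best_arrival:
        return None, []
    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])
    path.reverse()
    return best_arrival[target] - departure_h, path


# Toy network: the direct road A->B is congested between 8:00 and 9:00.
rush_hour = [60.0] * 24
rush_hour[8] = 20.0
steady = [40.0] * 24
graph = {
    "A": [("B", 30.0, rush_hour), ("C", 20.0, steady)],
    "C": [("B", 20.0, steady)],
}
print(fastest_route(graph, "A", "B", 8.0))   # detour via C during rush hour
print(fastest_route(graph, "A", "B", 10.0))  # direct road otherwise
```

The same origin and destination yield different routes depending on the departure time, which is the effect a time-dependent edge model is meant to capture.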

### Evolution Algorithm for Community Detection in Social Networks Using Node Centrality

Abstract
Community structure identification has received considerable attention from computer scientists who focus on the properties of complex networks such as the internet, social networks, food networks, e-mail networks and biochemical networks. Automatic network clustering can uncover natural groups of nodes, called communities, in real networks, revealing their underlying structure and functions. In this paper, we use a multiobjective evolution community detection algorithm, which forms center-based communities in a network by exploiting node centrality. Node centrality is easy to use for obtaining better partitions and for speeding up the convergence of the evolution algorithm. The proposed algorithm reveals center-based natural communities of high quality. Experiments on real-world networks demonstrate the efficiency of the proposed approach.
Krista Rizman Žalik

### High Performance Computing by the Crowd

Abstract
Nunziato Cassavia, Sergio Flesca, Michele Ianni, Elio Masciari, Giuseppe Papuzzo, Chiara Pulice

### Zero-Overhead Monitoring of Remote Terminal Devices

Abstract
This paper presents a method of delivering diagnostic information from data acquisition terminals via a legacy low-throughput transmission system with no overhead. The solution was successfully implemented in an intrinsically safe RFID system for contactless identification of people and objects developed for coal mines at the end of the 1990s. First, the goals and main characteristics of the application system are described, with references to underlying technologies. Then the transmission system and the idea of the diagnostic solution are presented. For confidentiality reasons, some technical and business details have been omitted.
Jerzy Chrząszcz

### Asynchronous Specification of Production Cell Benchmark in Integrated Model of Distributed Systems

Abstract
There are many papers concerning the well-known Karlsruhe Production Cell benchmark. They focus either on specification of the controller (which leads to a synthesis of a working controller) or on verification of its operation. The controller is modeled using various methods: programming languages, algebras or automata. Verification is based on testing, bisimulation or temporal model checking. Most models are synchronous. Asynchronous specifications use one- or multi-element buffers to relax the dependence between component subcontrollers. We propose the application of the fully asynchronous IMDS (Integrated Model of Distributed Systems) formalism. In our model the subcontrollers do not use any common variables or intermediate states. We apply distributed negotiations between subcontrollers using a simple protocol. The verification is based on CTL (Computation Tree Logic) model checking integrated with IMDS.
Wiktor B. Daszczuk

### Implementing the Bus Protocol of a Microprocessor in a Software-Defined Computer

Abstract
The paper describes a concept of a software-defined computer implemented using a classic 8-bit microprocessor and a modern microcontroller with ARM Cortex-M core for didactic and experimental purposes. Crucial to this design is the timing analysis and implementation of microprocessor’s bus protocol using hardware and software resources of the microcontroller. The device described in the paper, SDC_One, is a proof-of-concept design, successfully demonstrating the software-defined computer idea and showing the possibility of implementing time-critical logic functions using a microcontroller. The project is also a complex exercise in real-time embedded system design, pushing the microcontroller to its operational limits by exploiting advanced capabilities of selected hardware peripherals and carefully crafted firmware. To achieve the required response times, the project uses advanced capabilities of microcontroller peripherals – timers and DMA controller. Event response times achieved with the microcontroller operating at 80 MHz clock frequency are below 200 ns and the interrupt frequency during the computer’s operation exceeds 500 kHz.
Julia Kosowska, Grzegorz Mazur

### ISMIS 2017 Data Mining Competition: Trading Based on Recommendations - XGBoost Approach with Feature Engineering

Abstract
This paper presents an approach to predicting trading decisions based on the recommendations of experts using an XGBoost model, created during the ISMIS 2017 Data Mining Competition: Trading Based on Recommendations. We present a method for manually engineering features from sequential data and for evaluating their relevance. We provide a summary of feature engineering, feature selection, and evaluation based on experts’ recommendations of stock return.
Katarzyna Baraniak

### Fast Discovery of Generalized Sequential Patterns

Abstract
Knowledge in the form of generalized sequential patterns finds many applications. In this paper, we focus on optimizing GSP, which is a well-known algorithm for discovering such patterns. Our optimization consists in more selective identification of nodes to be visited while traversing a hash tree with candidates for generalized sequential patterns. It is based on the fact that elements of candidate sequences are stored as ordered sets of items. In order to reduce the number of visited nodes in the hash tree, we also propose to use not only parameters windowSize and maxGap as in original GSP, but also parameter minGap. As a result of our optimization, the number of candidates that require final time-consuming verification may be considerably decreased. In the experiments we have carried out, our optimized variant of GSP was several times faster than standard GSP.
Marzena Kryszkiewicz, Łukasz Skonieczny

### Seismic Attributes Similarity in Facies Classification

Abstract
Seismic attributes are one of the components of reflection seismology. In the past, advances in computer technology led to an increase in the number of seismic attributes and thus to better geological interpretation. Nowadays, the overwhelming number and variety of seismic attributes make the interpretation less unequivocal and can lead to slow performance. Using correlation coefficients, similarities and hierarchical grouping, the analysis of seismic attributes was carried out on several real datasets. We try to identify key seismic attributes (including the weak ones) that help the most with machine learning seismic attribute analysis and test the selection with the Random Forest algorithm. The quantitative factors obtained help to give an overall view of the data. Initial tests have shown some regularities in the correlations between seismic attributes. Some attributes are unique and potentially very helpful for information retrieval, while others form non-diverse groups. These encouraging results have the potential to transfer the work to practical geological interpretation.
Marcin Lewandowski, Łukasz Słonka

### Efficient Discovery of Sequential Patterns from Event-Based Spatio-Temporal Data by Applying Microclustering Approach

Abstract
Discovering various types of frequent patterns in spatiotemporal data is attracting growing attention from researchers. We consider spatiotemporal data represented in the form of events, each associated with a location, type and occurrence time. The problem is to discover all significant sequential patterns denoting spatial and temporal relations between event types. In the paper, we adapt a microclustering approach and use it to effectively and efficiently discover sequential patterns and to reduce the size of the dataset of instances. An appropriate indexing structure is proposed and notions already defined in the literature are reformulated. We modify algorithms already defined in the literature and propose an algorithm called Micro-ST-Miner for discovering sequential patterns in event-based spatiotemporal data.
Piotr S. Macia̧g

### Unsupervised Machine Learning in Classification of Neurobiological Data

Abstract
In many cases of neurophysiological data analysis, the best results can be obtained using supervised machine learning approaches. Such very good results were obtained in the detection of neurophysiological recordings made within the Subthalamic Nucleus (STN) during deep brain stimulation (DBS) surgery for Parkinson’s disease. Supervised machine learning methods rely, however, on external knowledge provided by an expert. This becomes increasingly difficult if the subject domain is highly specialized, as is the case in neurosurgery. The proper computation of features to be used for classification can be difficult without good domain knowledge, and their proper construction heavily influences the quality of the final classification. In such a case one might wonder whether, and to what extent, unsupervised methods might be useful. A good result of an unsupervised approach would indicate the presence of a natural grouping within the recordings and would also further confirm that the features selected for classification and clustering provide a good basis for discriminating recordings made within the STN. For this test, a set of over 12 thousand brain neurophysiological recordings with precalculated attributes was used. This paper compares the results obtained from a supervised, random forest based method with those obtained from unsupervised approaches, namely K-Means and hierarchical clustering. It is also shown how the inclusion of certain types of attributes influences the clustering-based results.

### Incorporating Fuzzy Logic in Object-Relational Mapping Layer for Flexible Medical Screenings

Abstract
Introduction of fuzzy techniques in database querying allows for flexible retrieval of information and inclusion of imprecise expert knowledge in the retrieval process. This is especially beneficial when analyzing collections of patients’ biomedical data, in which similar results of laboratory tests may lead to the same conclusions, diagnoses, and treatment scenarios. Fuzzy techniques for data retrieval can be implemented in various layers of database client-server architecture. However, since the development of real-life database applications in the last decade has frequently been based on additional object-relational mapping (ORM) layers, inclusion of fuzzy logic in data analysis remains a challenge. In this paper, we show our extensions to the Doctrine ORM framework that give application developers the possibility of fuzzy querying against collections of crisp data stored in relational databases. Performance tests show that these extensions do not introduce a significant slowdown while querying data and can be successfully used in the development of applications that benefit from fuzzy information retrieval.
Bożena Małysiak-Mrozek, Hanna Mazurkiewicz, Dariusz Mrozek

### Multimodal Learning Determines Rules of Disease Development in Longitudinal Course with Parkinson’s Patients

Abstract
Parkinson’s disease (PD) is a neurodegenerative disease (ND) related to the loss of dopaminergic neurons that manifests first as motor and later also as non-motor (dementia, depression) disabilities. Currently, there is no cure for ND, as we are not able to revive dead cells. Our purpose was to find, with the help of data mining and machine learning (ML), rules that describe and predict disease progression in two groups of PD patients: 23 BMT patients who are taking only medication, and 24 DBS patients who are on medication and on DBS (deep brain stimulation) therapies. In the longitudinal course of PD there were three visits approximately every 6 months, with the first visit for DBS patients before electrode implantation. We estimated disease progression as UPDRS (unified Parkinson’s disease rating scale) changes on the basis of the patient’s disease duration, saccadic eye movement parameters, and neuropsychological tests: PDQ39 and Epworth tests. By means of ML and rough set theory we found rules on the basis of the first visit of BMT patients and used them to predict UPDRS changes in the next two visits (global accuracy was 70% for both visits). The same rules were used to predict UPDRS in the first visit of DBS patients (global accuracy 71%) and in the second (78%) and third (74%) visits of DBS patients during stimulation-ON. These rules could not predict UPDRS in DBS patients during stimulation-OFF visits. In summary, relationships between condition and decision attributes were changed as a result of the surgery but restored by electric brain stimulation.
Andrzej W. Przybyszewski, Stanislaw Szlufik, Piotr Habela, Dariusz M. Koziorowski

### Comparison of Methods for Real and Imaginary Motion Classification from EEG Signals

Abstract
A method for feature extraction and results of classification of EEG signals obtained from performed and imagined motion are presented. A set of 615 features was obtained to serve for the recognition of the type and laterality of motion using 8 different classification approaches. A comparison of the accuracy achieved by the classifiers is presented in the paper, followed by conclusions and discussion. Among the applied algorithms, the highest accuracy was achieved with the Rough Set, SVM and ANN methods.
Piotr Szczuko, Michał Lech, Andrzej Czyżewski

### Procedural Generation of Multilevel Dungeons for Application in Computer Games using Schematic Maps and L-system

Abstract
This paper presents a method for procedural generation of multilevel dungeons by processing a set of schematic input maps and using an L-system for shape generation. Existing solutions usually focus on generation of 2D systems or only consider creation of cave-like structures. If any 3D underground systems are considered, they tend to require a large amount of computation and usually do not allow the user any considerable level of control over the generation process. Because of that, most existing solutions are not suitable for applications such as computer games. We propose a solution to that problem, allowing generation of multilevel dungeon systems with complex layouts based on simplified maps. The user can define all key properties of the generated dungeon, including its layout, while results are represented as easily editable 3D meshes. Final objects generated by our algorithm can be used in computer games or similar applications.
Izabella Antoniuk, Przemysław Rokita
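
For readers unfamiliar with L-systems, which the abstract above names as the shape-generation mechanism, the following minimal sketch shows the general principle: a string is rewritten in parallel according to production rules and then interpreted as drawing commands, here as a corridor on a 2D grid. The rule set, the command alphabet (`F`, `+`, `-`) and the function names are assumptions for illustration only; the chapter's actual grammar, schematic maps and 3D mesh output are not reproduced.

```python
def expand(axiom, rules, iterations):
    """Apply the production rules to every symbol in parallel, `iterations` times."""
    current = axiom
    for _ in range(iterations):
        current = "".join(rules.get(symbol, symbol) for symbol in current)
    return current


def trace_corridor(commands, start=(0, 0), heading=(1, 0)):
    """Interpret an L-system string as grid moves and return the visited cells in order.

    'F' steps one cell forward, '+' turns 90 degrees clockwise,
    '-' turns 90 degrees counter-clockwise (y axis pointing up).
    """
    x, y = start
    dx, dy = heading
    cells = [(x, y)]
    for command in commands:
        if command == "F":
            x, y = x + dx, y + dy
            cells.append((x, y))
        elif command == "+":
            dx, dy = dy, -dx
        elif command == "-":
            dx, dy = -dy, dx
    return cells


# A single hypothetical rule that bends every corridor segment to the right.
program = expand("F", {"F": "F+F"}, 2)
print(program)
print(trace_corridor(program))
```

Richer rule sets produce branching layouts; in a game pipeline the visited cells would then be turned into geometry.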

### An HMM-Based Framework for Supporting Accurate Classification of Music Datasets

Abstract
In this paper, we use Hidden Markov Models (HMM) and Mel-Frequency Cepstral Coefficients (MFCC) to build statistical models of classical music composers directly from the music datasets. Several musical pieces are divided by instruments (String, Piano, Chorus, Orchestra), and, for each instrument, statistical models of the composers are computed. We selected 19 different composers spanning four centuries, using a total of 400 musical pieces. Each musical piece is classified as belonging to a composer if the corresponding HMM gives the highest likelihood for that piece. We show that the models developed in this way can be used to obtain useful information on the correlation between the composers. Moreover, by using the maximum likelihood approach, we also classified the instrumentation used by the same composer. Besides serving as an analysis tool, the described approach has been used as a classifier. Overall, this yields an HMM-based framework for supporting accurate classification of music datasets. On a dataset of String Quartet movements, we obtained an average composer classification accuracy of more than 96%. As regards instrumentation classification, we obtained an average classification accuracy of slightly less than 100% for Piano, Orchestra and String Quartet. In this paper, the most significant results coming from our experimental assessment and analysis are reported and discussed in detail.
Alfredo Cuzzocrea, Enzo Mumolo, Gianni Vercelli

### Classification of Music Genres by Means of Listening Tests and Decision Algorithms

Abstract
The paper compares the results of audio excerpt assignment to a music genre obtained in listening tests and classification by means of decision algorithms. A short review on music description employing music styles and genres is given. Then, assumptions of listening tests to be carried out along with an online survey for assigning audio samples to selected music genres are presented. A framework for music parametrization is created resulting in feature vectors, which are checked for data redundancy. Finally, the effectiveness of the automatic music genre classification employing two decision algorithms is presented. Conclusions contain the results of the comparative analysis of the results obtained in listening tests and automatic genre classification.
Aleksandra Dorochowicz, Piotr Hoffmann, Agata Majdańczuk, Bożena Kostek

### Handwritten Signature Verification System Employing Wireless Biometric Pen

Abstract
The handwritten signature verification system that is part of the developed multimodal biometric banking stand is presented. The hardware component of the solution is described with a focus on the signature acquisition and verification procedures. The signature is acquired using an accelerometer and a gyroscope built into the biometric pen, plus pressure sensors for assessing the proper pen grip, and then a signature verification method based on an adapted Dynamic Time Warping (DTW) method is applied. The FRR and FAR measures achieved so far for verification based exclusively on the biometric pen sensors are compared with those for verification based on parameters retrieved from the signature scanning pad.
Michał Lech, Andrzej Czyżewski
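
The abstract above refers to an adapted Dynamic Time Warping (DTW) method. The sketch below shows only the plain textbook DTW distance between two one-dimensional sequences, not the authors' adaptation; the function name and the sample values are assumptions for illustration. In a verification setting, such a distance between a reference signal and a newly acquired one would typically be compared against a threshold.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance between two numeric sequences.

    cost[i][j] is the minimal accumulated |a - b| cost of aligning the
    first i samples of `a` with the first j samples of `b`.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # repeat b[j-1] (a advances)
                cost[i][j - 1],      # repeat a[i-1] (b advances)
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# The same shape drawn at a different speed aligns with zero cost.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
print(dtw_distance([1, 2, 3], [2, 3, 4]))
```

Because DTW tolerates local stretching along the time axis, two signatures written at different speeds can still be matched sample by sample.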

### Towards Entity Timeline Analysis in Polish Political News

Abstract
Our work presents a simple method of analysing occurrences of entities in news articles. We demonstrate that the frequency of named entities in news articles reflects real-world events related to these entities. Occurrences and co-occurrences of entities between portals were compared. We visualised entity frequency on a timeline, which can be used to analyse the history of entity occurrences.
Katarzyna Baraniak, Marcin Sydow

### Automatic Legal Document Analysis: Improving the Results of Information Extraction Processes Using an Ontology

Abstract
Information Extraction (IE) is a pervasive task in industry that makes it possible to automatically obtain structured data from documents in natural language. Current software systems focused on this activity are able to extract a large percentage of the required information, but they do not usually focus on the quality of the extracted data. In this paper we present an approach focused on validating and improving the quality of the results of an IE system. Our proposal is based on the use of ontologies which store domain knowledge, and which we leverage to detect and resolve consistency errors in the extracted data. We have implemented our approach to run against the output of the AIS system, an IE system specialized in analyzing legal documents, and we have tested it using a real dataset. Preliminary results confirm the interest of our approach.
María G. Buey, Cristian Roman, Angel Luis Garrido, Carlos Bobed, Eduardo Mena

### To Improve, or Not to Improve; How Changes in Corpora Influence the Results of Machine Learning Tasks on the Example of Datasets Used for Paraphrase Identification

Abstract
In this paper we attempt to verify the influence of data quality improvements on the results of machine learning tasks. We focus on measuring semantic similarity and use the SemEval 2016 datasets. To achieve consistent annotations, we made all sentences grammatically and lexically correct, and developed formal semantic similarity criteria. The similarity detector used in this research was designed for the SemEval English Semantic Textual Similarity (STS) task. This paper addresses two fundamental issues: first, how each characteristic of the chosen sets affects the performance of similarity detection software, and second, which improvement techniques are most effective for the provided sets and which are not. Having analyzed these points, we present and explain the non-obvious results we obtained.
Krystyna Chodorowska, Barbara Rychalska, Katarzyna Pakulska, Piotr Andruszkiewicz

### Context Sensitive Sentiment Analysis of Financial Tweets: A New Dictionary

Abstract
Sentiment analysis can make a contribution to behavioral economics and behavioral finance. It is concerned with the effect of opinions and emotions on economic or financial decisions. In sentiment analysis, or opinion mining as it is often called, emotions or opinions of various degrees are assigned to the text (tweets in this case) under consideration. This paper describes an application of a lexicon-based, domain-specific approach to a set of tweets in order to calculate the sentiment of the tweets. Further, we introduce a domain-specific lexicon for the financial domain and compare the results with those reported in other studies. The results show that using a context-sensitive set of positive and negative words, rather than one that includes general keywords, produces better outcomes than those achieved by humans on the same set of tweets.
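
A lexicon-based approach of the kind summarized in the abstract above can be sketched, in its simplest form, as counting matches against positive and negative word lists. The word lists, the scoring formula and the function name below are hypothetical and chosen only for illustration; they are not the dictionary or the scoring scheme introduced in the chapter.

```python
import re

# Hypothetical, tiny finance-flavoured word lists (not the chapter's lexicon).
POSITIVE = {"rally", "rallies", "beat", "bullish", "upgrade", "surge"}
NEGATIVE = {"miss", "bearish", "downgrade", "plunge", "selloff"}


def sentiment_score(text, positive=POSITIVE, negative=NEGATIVE):
    """Return a score in [-1, 1]: (positive hits - negative hits) / all lexicon hits.

    Texts with no lexicon words get a neutral score of 0.0.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    pos_hits = sum(token in positive for token in tokens)
    neg_hits = sum(token in negative for token in tokens)
    total = pos_hits + neg_hits
    if total == 0:
        return 0.0
    return (pos_hits - neg_hits) / total


print(sentiment_score("Stock rallies after earnings beat"))
print(sentiment_score("Analysts turn bearish after revenue miss, but upgrade possible"))
```

Swapping the general-purpose word lists for domain-specific ones changes the scores without touching the scoring code, which is the kind of comparison the abstract describes.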