
2019 | Book

Recent Developments in Data Science and Intelligent Analysis of Information

Proceedings of the XVIII International Conference on Data Science and Intelligent Analysis of Information, June 4–7, 2018, Kyiv, Ukraine

Edited by: Oleg Chertov, Tymofiy Mylovanov, Yuriy Kondratenko, Janusz Kacprzyk, Vladik Kreinovich, Vadim Stefanuk

Publisher: Springer International Publishing

Book series: Advances in Intelligent Systems and Computing


About this book

This book constitutes the proceedings of the XVIII International Conference on Data Science and Intelligent Analysis of Information (ICDSIAI'2018), held in Kyiv, Ukraine on June 4–7, 2018. The conference series, which dates back to 2001 when it was known as the Workshop on Intelligent Analysis of Information, was renamed in 2008 to reflect the broadening of its scope and the composition of its organizers and participants. ICDSIAI'2018 brought together a large number of participants from numerous countries in Europe, Asia and the USA. The papers presented addressed novel theoretical developments in methods, algorithms and implementations for the broadly perceived areas of big data mining and intelligent analysis of data and information, representation and processing of uncertainty and fuzziness, including contributions on a range of applications in the fields of decision-making and decision support, economics, education, ecology, law, and various areas of technology.

The book is dedicated to the memory of the conference founder, the late Professor Tetiana Taran, an outstanding scientist in the field of artificial intelligence whose research record, vision and personality have greatly contributed to the development of Ukrainian artificial intelligence and computer science.

Table of Contents

Frontmatter

Machine Learning: Novel Methods and Applications

Frontmatter
Use of Symmetric Kernels for Convolutional Neural Networks

In this work, we introduce horizontally symmetric convolutional kernels for CNNs, which make the network output invariant to horizontal flips of the image. We also study other types of symmetric kernels, which lead to vertical flip invariance and approximate rotational invariance. We show that the use of such kernels acts as a regularizer and improves the generalization of convolutional neural networks, at the cost of a more complicated training process.

Viacheslav Dudar, Vladimir Semenov
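The flip-invariance property described in the abstract can be checked directly: averaging a kernel with its horizontal mirror yields a symmetric kernel, and then the feature map of a flipped image is just the flipped feature map, so any global pooling output is identical. A minimal numpy sketch (an illustration of the idea, not the authors' code; the kernel and image sizes are arbitrary):

```python
import numpy as np

def correlate2d_valid(img, ker):
    """Plain 'valid' cross-correlation, as used in CNN forward passes."""
    kh, kw = ker.shape
    H, W = img.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * ker)
    return out

rng = np.random.default_rng(0)
raw = rng.standard_normal((3, 3))
sym = 0.5 * (raw + raw[:, ::-1])   # horizontally symmetric kernel

img = rng.standard_normal((8, 8))
resp = correlate2d_valid(img, sym)
resp_flipped = correlate2d_valid(img[:, ::-1], sym)

# Flipping the input only flips the feature map; a global pooling
# layer therefore produces an identical output for both versions.
print(np.allclose(resp_flipped, resp[:, ::-1]))    # True
print(np.isclose(resp.sum(), resp_flipped.sum()))  # True
```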
Neural Network User Authentication by Geometry of the Auricle

The article is devoted to the development of a neural network model for use in a biometric user authentication system based on analysis of the geometry of the auricle. From the point of view of neural network methods, the main features of the recognition task are the number of recognizable users, the size and quality of the auricle images, and the number and parameters of the characteristic features of the auricle. The expediency of using a convolutional neural network, whose parameters must be adapted to the peculiarities of the recognition problem, is shown, and principles for adapting its structural parameters are proposed. The number of convolutional layers should correspond to the number of levels at which an expert recognizes images of the auricle. The number of feature maps in the n-th convolutional layer should equal the number of features at the n-th recognition level. The feature map of the n-th layer corresponding to the j-th recognition feature is associated only with those feature maps of the previous layer that are used to construct this feature. The size of the convolution kernel for the n-th convolutional layer must equal the size of the recognizable features at the n-th hierarchical level. The use of convolutional layers should not distort the geometric parameters of the features used to recognize images of the auricle. Based on the proposed principles and the revealed features of the auricle image recognition problem, an appropriate method for adapting the structural parameters of the convolutional neural network was developed. Computer experiments showed satisfactory recognition accuracy, which confirms the promise of the proposed solutions.
It is shown that further research should focus on forming a methodological base for adapting the main components of the mathematical support to the features of recognizing images of the user's auricle in a biometric authentication system.

Berik Akhmetov, Ihor Tereikovskyi, Liudmyla Tereikovska, Asselkhan Adranova
Race from Pixels: Evolving Neural Network Controller for Vision-Based Car Driving

Modern robotics uses many advanced precise algorithms to control autonomous agents. There is now a tendency to apply machine learning in niches where precise algorithms are hard to design or implement. For continuous control tasks, machine learning commonly relies on evolution strategies. We propose an enhancement to the crossover operator that reduces the probability of degraded offspring compared to conventional crossover operators. Our experiments in the TORCS environment show that the presented algorithm can evolve robust neural networks for non-trivial continuous control tasks such as driving a racing car on various tracks.

Borys Tymchenko, Svitlana Antoshchuk
Application of Neuro-Controller Models for Adaptive Control

In this paper, a method for constructing a model of a controller based on recurrent neural network architecture for implementation of control for the optimal trajectory finding problem is considered. A type of a neuro-controller based on recurrent neural network architecture with long short-term memory blocks as a knowledge base on the external environment and previous states of the controller is proposed. The formalization of the technological cycle of a special type for adaptive control of a production process using the model of the neuro-controller is given.

Viktor Smorodin, Vladislav Prokhorenko
Forecasting of Forest Fires in Portugal Using Parallel Calculations and Machine Learning

Forest fires that occurred in Portugal on June 18, 2017 caused several dozen human casualties. Their cause, like that of many other fires that occurred in Western Europe at the same time, remained unknown. The heliocentric hypothesis, according to which charged particles are a possible cause of forest fires, has been tested indirectly. We must point out that it was not possible to verify whether, in this specific case, the particles reached the ground and, by burning plant mass, created the initial phase of flame formation. Therefore, we tried to determine whether during the critical period, i.e. June 15–19, there was a statistical connection between certain parameters of the solar wind (SW) and meteorological elements. Based on the hourly values of the charged particle flow, a correlation analysis was performed against the hourly values of individual meteorological elements, including time lag, at the Monte Real station. The application of adaptive neuro-fuzzy inference systems has shown a high degree of connection between the proton flow and the analyzed meteorological elements in Portugal. However, verification of this hypothesis requires further laboratory testing.

Yaroslav Vyklyuk, Milan M. Radovanović, Volodymyr Pasichnyk, Nataliia Kunanets, Sydor Petro
A New Non-Euclidean Proximal Method for Equilibrium Problems

The paper analyzes the convergence of a new iterative algorithm for approximating solutions of equilibrium problems in a finite-dimensional real vector space. Using the Bregman distance instead of the Euclidean one, we modify the recently proposed two-stage proximal algorithm. The Bregman distance allows us to take the geometry of the admissible set into account effectively in some important cases. Namely, with a suitable choice of distance, we obtain a method with explicitly solvable auxiliary problems at the iterative steps. The convergence of the algorithm is proved under the assumption that a solution exists and the bifunction is pseudo-monotone and Lipschitz-type.

Lyubov Chabak, Vladimir Semenov, Yana Vedel
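For reference, the Bregman distance generated by a convex function $f$ has the standard textbook definition (this is general background, not notation taken from the paper itself):

```latex
D_f(x, y) = f(x) - f(y) - \langle \nabla f(y),\, x - y \rangle .
```

With the negative entropy $f(x) = \sum_i x_i \ln x_i$ on the probability simplex, $D_f$ becomes the Kullback-Leibler divergence, and a proximal step of the form

```latex
x_{k+1} = \arg\min_{x} \left\{ \lambda \langle g, x \rangle + D_f(x, x_k) \right\},
\qquad
x_{k+1,i} \propto x_{k,i}\, e^{-\lambda g_i},
```

has the explicit multiplicative (entropic) solution shown on the right. This is the kind of "explicitly solvable auxiliary problem" that a suitable choice of distance makes possible.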

Data Analysis Using Fuzzy Mathematics, Soft Computing, Computing with Words

Frontmatter
Perceptual Computing Based Method for Assessing Functional State of Aviation Enterprise Employees

Safety of aircraft flights, and aviation security in general, highly depend on the correctness of actions performed by ground traffic control officers and on how well informed the decisions made by the supervising officers are. That said, the task of developing methods and tools for increasing the level of functional security of aviation enterprises is pressing. Such methods and tools must be aimed at determining the factors through which the human factor negatively influences aviation security, and at mitigating their impact, including increasing the awareness of decision makers. In this paper, we present methods for increasing the efficiency of control over the functional state of aviation enterprise employees, both during recruitment and afterwards. The proposed approach allows the decision maker to rank employees according to their functional state and thereby choose those most fit for the job. Special features of the domain in question are the qualitative nature of expert assessments of the components of a person's functional state (physiological, physical, psychological, etc.), the presence of a wide range of uncertainties, and its ill-defined nature. To account for these features, we propose to apply results from the theory of perceptual computing, thereby allowing expert assessment of the functional state of airport employees in linguistic form. The proposed approach is evaluated using examples involving expert assessment of the functional state of airport employees of two different work profiles.

Dan Tavrov, Olena Temnikova, Volodymyr Temnikov
Multi-criteria Decision Making and Soft Computing for the Selection of Specialized IoT Platform

The appropriate selection of a specialized Internet of Things (IoT) platform is a very relevant task today. The complexity of the selection process is due to (a) the large number of IoT platforms available on the IoT services market, and (b) the variety of services and features they offer. In this paper, multi-criteria decision making (MCDM) and soft computing approaches for choosing a specialized IoT platform are considered. The authors illustrate solving the MCDM problem using the linear convolution method with a simple ranking approach to forming weight coefficients for the criteria. MCDM methods have some limitations: (a) the need to take into account the weight coefficients of the criteria; (b) the composition of the Pareto-optimal set of alternative decisions; (c) the inability to change the dimension of the vector of alternatives and criteria in real time; (d) the significant impact on the result of the weight coefficients determined by the expert. Thus, the authors propose to use the soft computing approach, in particular a Mamdani-type fuzzy logic inference engine, for the selection of the specialized IoT platform. Relevant factors (reliability, dependability, safety, and security of IoT platforms) are considered as the most important ones for decision making in the IoT platform selection process. In addition, an analysis of the level of influence of various factors on the selection of a specialized IoT platform has been carried out. Special cases of choosing a specialized IoT platform, confirming the appropriateness of the soft computing approach, are discussed.

Yuriy Kondratenko, Galyna Kondratenko, Ievgen Sidenko
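To illustrate how a Mamdani-type inference engine can turn fuzzy factor assessments into a platform score, the sketch below implements two hypothetical rules over reliability and security degrees, with min implication, max aggregation, and centroid defuzzification. The rules, output universe, and membership functions are invented for illustration and are not taken from the paper:

```python
import numpy as np

# Output universe: suitability score of a candidate IoT platform (0..10).
u = np.linspace(0.0, 10.0, 1001)
low = np.clip((5.0 - u) / 5.0, 0.0, 1.0)    # membership of "low score"
high = np.clip((u - 5.0) / 5.0, 0.0, 1.0)   # membership of "high score"

def mamdani_score(reliability, security):
    """Two illustrative rules, min implication, max aggregation,
    centroid defuzzification; inputs are membership degrees in [0, 1]."""
    r1 = min(reliability, security)              # IF reliable AND secure THEN score high
    r2 = max(1.0 - reliability, 1.0 - security)  # IF unreliable OR insecure THEN score low
    agg = np.maximum(np.minimum(r1, high), np.minimum(r2, low))
    return float(np.sum(agg * u) / np.sum(agg))  # centroid of the aggregated set

print(mamdani_score(0.9, 0.8) > mamdani_score(0.2, 0.3))  # True
```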
How Intelligence Community Interprets Imprecise Evaluative Linguistic Expressions, and How to Justify this Empirical-Based Interpretation

To provide a more precise meaning to imprecise evaluative linguistic expressions like “probable” or “almost certain”, researchers analyzed how often intelligence predictions hedged by each corresponding evaluative expression turned out to be true. In this paper, we provide a theoretical explanation for the resulting empirical frequencies.

Olga Kosheleva, Vladik Kreinovich
How to Explain Empirical Distribution of Software Defects by Severity

In the last decades, several tools have appeared that, given a software package, mark possible defects of different potential severity. Our empirical analysis has shown that in most situations, we observe the same distribution of software defects by severity. In this paper, we present this empirical distribution, and we use interval-related ideas to explain it.

Francisco Zapata, Olga Kosheleva, Vladik Kreinovich

Applications of Data Science to Economics. Applied Data Science Systems

Frontmatter
Tobacco Spending in Georgia: Machine Learning Approach

The purpose of this study is to analyze tobacco spending in Georgia using various machine learning methods applied to a sample of 10,757 households from the Integrated Household Survey collected by GeoStat in 2016. Previous research has shown that smoking is the leading cause of death for 35–69 year olds, and tobacco expenditures may constitute as much as 17% of the household budget. Five different algorithms (ordinary least squares, random forest, two gradient boosting methods and deep learning) were applied to the 8,173 households (76.0%) in the train set. Out-of-sample predictions were then obtained for the 2,584 remaining households in the test set. Under the default settings, the random forest algorithm showed the best performance, with more than 10% improvement in terms of root-mean-square error (RMSE). The improved accuracy and availability of machine learning tools in R call for active use of these methods by policy makers and scientists in health economics, public health and related fields.

Maksym Obrizan, Karine Torosyan, Norberto Pignatti
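The evaluation protocol described in the abstract, a 76%/24% train/test split scored by out-of-sample RMSE, can be sketched on synthetic data. This illustrates only the protocol (here with a plain least-squares model against a mean-only baseline), not the paper's five algorithms or the GeoStat survey data:

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic stand-in for household data: 10,757 rows, 5 covariates,
# spending driven by a linear signal plus noise (illustration only).
n, p = 10_757, 5
X = rng.standard_normal((n, p))
beta = rng.standard_normal(p)
y = X @ beta + rng.standard_normal(n)

# 76% train / 24% test split, as in the study.
idx = rng.permutation(n)
cut = int(0.76 * n)
tr, te = idx[:cut], idx[cut:]

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Fit OLS by least squares and compare against predicting the train mean.
coef, *_ = np.linalg.lstsq(X[tr], y[tr], rcond=None)
rmse_ols = rmse(y[te], X[te] @ coef)
rmse_base = rmse(y[te], np.full(te.size, y[tr].mean()))
print(f"OLS RMSE: {rmse_ols:.3f}  baseline RMSE: {rmse_base:.3f}")
```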
Explaining Wages in Ukraine: Experience or Education?

In this article, we analyze a large database of job vacancies in Ukraine, web-scraped from the Work.ua website in January–February 2017. The obtained dataset was processed with a bag-of-words approach. Exploratory data analysis revealed that experience and city influence wages; for example, wages in the capital are much higher than in other cities. To explain the variation in wages, we used three models: multiple linear regression, a decision tree, and a random forest; the latter demonstrated the best explanatory power. Our work confirms Mincer's old finding that experience is an important variable explaining wages; in fact, this factor was the most informative. Education, however, turned out to be unimportant for determining wages. English, teamwork, sales skills, car driving and programming languages are the skills for which modern employers are willing to pay.

Valentyna Sinichenko, Anton Shmihel, Ivan Zhuk
On the Nash Equilibrium in Stochastic Games of Capital Accumulation on a Graph

This paper discusses some applications of the theory of controlled Markov fields defined on a finite undirected graph. The graph describes a system of “neighborhood dependence” for the evolution of a random process in which the states of vertices change locally and synchronously, depending on the decisions made at them. The main attention is paid to the problem of finding a Nash equilibrium for stochastic games of capital accumulation with many players.

Ruslan Chornei
Indoor and Outdoor Air Quality Monitoring on the Base of Intelligent Sensors for Smart City

People experience problems with air quality every day, both indoors and outdoors. The simplest way to mitigate the problem inside a building is to open the windows; it is not just the most efficient, but also the cheapest solution. However, in big cities with excess air pollutants, opening windows might only worsen the situation in the room. Consequently, another method of improving indoor air quality is needed. People often cannot tell whether indoor air quality is good enough; therefore, there is a need for a system that monitors air conditions inside and outside buildings, analyzes them, and gives recommendations for improving air quality. Air quality monitoring is one of the important topics of the SMURBS/ERA-PLANET project within the European Commission's Horizon 2020 program. This study addresses the problem of using remote sensing data and Copernicus ecological biophysical models for air quality assessment in the city, and proposes an intelligent solution based on indoor and outdoor sensors for air quality monitoring, controlled by a fuzzy logic decision block. We plan to implement the distributed system within the Smart City concept in Kyiv (Ukraine) as part of the SMURBS project.

Andrii Shelestov, Leonid Sumilo, Mykola Lavreniuk, Vladimir Vasiliev, Tatyana Bulanaya, Igor Gomilko, Andrii Kolotii, Kyrylo Medianovskyi, Sergii Skakun
Assessment of Sustainable Development Goals Achieving with Use of NEXUS Approach in the Framework of GEOEssential ERA-PLANET Project

In this paper, we propose a methodology for calculating indicators of the Sustainable Development Goals within the GEOEssential project, which is part of the ERA-PLANET Horizon 2020 project. We consider indicators 15.1.1, forest area as a proportion of total land area; 15.3.1, proportion of land that is degraded over total land area; and 2.4.1, proportion of agricultural area under productive and sustainable agriculture. For this, we used remote sensing data, weather and climate model data, and in-situ data. Accurate land cover maps are important for precise assessment of land cover changes. To improve the resolution and quality of existing global land cover maps, we propose our own deep learning methodology for producing country-level land cover maps. To calculate the essential variables that are vital for achieving the indicators, the NEXUS approach, based on the idea of fusing food, energy, and water, was applied. Long-term land cover change maps linked with land productivity maps are essential for determining environmental changes and estimating the consequences of anthropogenic activity.

Nataliia Kussul, Mykola Lavreniuk, Leonid Sumilo, Andrii Kolotii, Olena Rakoid, Bohdan Yailymov, Andrii Shelestov, Vladimir Vasiliev
An Efficient Remote Disaster Management Technique Using IoT for Expeditious Medical Supplies to Affected Area: An Architectural Study and Implementation

Creating technology-enhanced, optimized strategies to handle the healthcare issues that emerge from natural calamities and disasters has become much easier with the latest advancements in networking and low-power electronics. The proposed system provides an immediate action plan for the various adverse medical situations arising during and after disaster events. The study focuses on developing strategies for disaster management in India, where river flooding is one of the most frequent drastic events in various regions, with many people dying due to the unavailability of medical facilities. Many studies have shown that during such disasters it is difficult to manage medical emergencies such as cardiac seizures: doctors are unable to reach remotely affected areas because transportation systems are damaged by flooding, and prompt delivery of medical supplies and relief is a struggle. The problem is therefore gaining significant attention from disaster management organizations. This study proposes a solution for handling the various medical emergencies that occur during and after disaster events. The system uses drones to carry medical kits weighing 1.5 kg to 2 kg to the affected areas. Miniaturized IoT-based medical devices are designed with various wireless body area sensors (WBAS) and actuators, along with a defibrillator unit and an ECG analyzer. These medical kits are placed on a connected drone and delivered to the affected areas aerially. All the devices are IoT-enabled and connected through the central cloud infrastructure of a hospital, through which medical experts can access the required medical parameters and convey instructions via the drone to a caretaker in real time, which may help prevent casualties.

Vidyadhar Aski, Sanjana Raghavendra, Akhilesh K. Sharma

Knowledge Engineering Methods. Ontology Engineering. Intelligent Educational Systems

Frontmatter
Method of Activity of Ontology-Based Intelligent Agent for Evaluating Initial Stages of the Software Lifecycle

The importance of automated evaluation of the initial stages of the software lifecycle on the basis of software requirements specification (SRS) analysis, and the need for a new generation of information technology for the software engineering domain, necessitate the development of an agent-oriented information technology for evaluating the initial stages of the software lifecycle on the basis of an ontological approach. The purpose of this study is to develop a method of activity for an ontology-based intelligent agent that evaluates the initial stages of the software lifecycle. The intelligent agent, which works on the basis of the developed method, evaluates the sufficiency of the information in the SRS for assessing non-functional software features: it provides a conclusion about the sufficiency or insufficiency of the information; a numerical evaluation of the level of sufficiency of the information in the SRS for assessing each non-functional feature in particular and all non-functional features in general; and a list of attributes (measures) and/or indicators that should be added to the SRS to increase the level of sufficiency. During the experiments, the intelligent agent examined the SRS for a transport logistics decision support system and found that the information in this SRS is not sufficient for assessing quality by ISO 25010 or by metric analysis.

Tetiana Hovorushchenko, Olga Pavlova
Ontological Representation of Legal Information and an Idea of Crowdsourcing for Its Filling

This article considers the process of creating a legal knowledge ontology for study purposes. The peculiarities of legal information and the experience of formalizing legal knowledge are scrutinized. The peculiarities of self-organization in complex systems are considered, and the applicability of these principles to legal information is demonstrated on the basis of four features of self-organization. It is determined that the most reasonable way to describe legal knowledge is an ontology, as a basis for forming the knowledge structure. A review of existing ontologies used in the field of law is carried out, and a mathematical description of the knowledge base structure is introduced. A software package for working with the legal knowledge ontology has been developed; it is used by students at the Yaroslav Mudryi National Law University. A method of collective filling and editing of the knowledge base is proposed as the basis of the methodology for working with it: the ontology of legal knowledge at the University is created not only by experts but by all users, with the principles of crowdsourcing as the basic technique of the ontology filling process. Results of the filling of this ontology by a number of users are briefly reviewed. The legal knowledge ontology being created is proposed for use in forming an individual learning style for students.

Anatolii Getman, Volodymyr Karasiuk, Yevhen Hetman, Oleg Shynkarov
Methods for Automated Generation of Scripts Hierarchies from Examples and Diagnosis of Behavior

The aim of the research is to increase the reliability of behavior diagnostics by developing new models and methods based on scripts automatically extracted from data. An improved model of script hierarchies is proposed, adding the concepts of role and forest of hierarchies, as well as the support function that connects them. An improved model of multilevel behavior pattern construction is also proposed; unlike existing models, it enables scripts to be formulated using machine learning methods alongside an expert. A two-stage method for diagnosing object behavior based on script hierarchies is developed: in the first stage, the tested behavior is matched to one or several script hierarchies; in the second stage, a naive Bayesian classifier detects whether the object belongs to one or more classes. The models and methods were validated on the subject area of malicious program detection. The results show an increase in detection reliability.

Viktoriia Ruvinskaya, Alexandra Moldavskaya
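The second diagnostic stage relies on a naive Bayesian classifier. A generic Bernoulli variant with Laplace smoothing looks as follows; the toy "script event" features and labels here are invented for illustration and do not reflect the paper's actual feature set:

```python
import numpy as np

def fit_bernoulli_nb(X, y):
    """Bernoulli naive Bayes with Laplace smoothing; X holds 0/1 features."""
    classes = np.unique(y)
    log_prior = np.log(np.array([(y == c).mean() for c in classes]))
    # Smoothed per-class probability that each feature equals 1.
    theta = np.array([(X[y == c].sum(axis=0) + 1.0) / ((y == c).sum() + 2.0)
                      for c in classes])
    return classes, log_prior, theta

def predict(model, X):
    classes, log_prior, theta = model
    # Log-likelihood of each row under each class, plus the class prior.
    ll = X @ np.log(theta).T + (1 - X) @ np.log(1 - theta).T + log_prior
    return classes[np.argmax(ll, axis=1)]

# Toy behavior traces: each feature flags whether a script event was observed.
X = np.array([[1, 1, 0], [1, 0, 0], [0, 1, 1], [0, 0, 1]])
y = np.array([1, 1, 0, 0])          # 1 = malicious, 0 = benign
model = fit_bernoulli_nb(X, y)
print(predict(model, np.array([[1, 1, 0], [0, 0, 1]])))  # [1 0]
```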
Development of the Mathematical Model of the Informational Resource of a Distance Learning System

Analysis of the existing models of knowledge representation in information systems allowed us to conclude that combined network models, which are able to take into account the indistinct content of some information, have considerable advantages. Therefore, a semantic network that differs from well-known ones by the peculiarities stated in this article is adopted as the mathematical model of the information resource of a distance learning system.

Vyacheslav Shebanin, Igor Atamanyuk, Yuriy Kondratenko, Yuriy Volosyuk
Use of Information Technologies to Improve Access to Information in E-Learning Systems

Various forms of training use e-learning systems. Existing e-learning systems lack the ability to quickly search across all the training courses that have been uploaded, which limits access to the information they provide. Several commonly used search engines were examined to assess the possibility of using them for efficient access to all materials presented in an e-learning system. Two e-learning systems, EFront and Moodle, were considered in terms of the organization of their database structure. The EFront system is based on a database with clustered indices, and it was chosen for testing search engines. The use of an additional programming module connected to the EFront system is proposed; this module would allow full-text search of the needed information. During the research, two full-text search modules were considered, and the Sphinx technology was selected as the most efficient one. The proposed solutions thus expand students' ability to access the necessary educational information.

Tetiana Hryhorova, Oleksandr Moskalenko
Multi-agent Approach Towards Creating an Adaptive Learning Environment

The paper describes a concept of an intelligent agent-based tutoring system to guide students through the course material. The concept of adaptivity is applied to an adaptive education system based on profiling students using the Felder and Silverman model of learning styles. The benefits that a multi-agent organizational structure provides for an adaptive tutoring system are outlined. A conceptual framework for adaptive learning systems is given, based on the idea that adaptivity means finding the best match between the learner's profile and the course content's profile. The learning styles of learners and the content type of the learning material are used to match the learner to the most suitable content.

Maksim Korovin, Nikolay Borgest

Intelligent Search and Information Analysis in Local and Global Networks

Frontmatter
Usage of Decision Support Systems in Information Operations Recognition

In this paper, the usage of decision support systems in information operations recognition is described. Based on information from experts, a knowledge engineer constructs a knowledge base of the subject domain using decision support system tools. The knowledge base provides the basis for specifying queries to analyze the dynamics of the respective informational scenarios using text analytics. Based on this analysis and the knowledge base structure, decision support system tools calculate the degree to which the main goal of the information operation, viewed as a complex system consisting of specific informational activities, has been achieved. Using the results of these calculations, decision makers can develop strategic and tactical steps to counteract the information operation and evaluate its efficiency, as well as the efficiencies of its separate components. Decision support system tools are also used to decompose information operation topics and to evaluate the efficiency rating of these topics over time.

Oleh Andriichuk, Dmitry Lande, Anastasiya Hraivoronska
How Click-Fraud Shapes Traffic: A Case Study

This paper provides a real-life case study of click fraud. We aim to investigate the influence of invalid clicks on the time series of advertising parameters, such as the number of clicks and the click-through rate. Our results show that it can be challenging to visually identify click fraud in real traffic. However, powerful signal analysis methods such as ‘Caterpillar’-SSA make it possible to efficiently discover fraudulent components. Finally, our findings confirm the hypothesis from previous works that attacks can be discovered via behavioral modeling of the attacker.

Dmytro Pavlov, Oleg Chertov
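The ‘Caterpillar’-SSA technique mentioned above belongs to the singular spectrum analysis family: embed the series in a Hankel trajectory matrix, truncate its SVD, and average anti-diagonals back into a series, separating structural components (trend, cycles) from residual ones. A minimal sketch on a synthetic click series; the window length and rank are illustrative choices, not the paper's settings:

```python
import numpy as np

def ssa_reconstruct(x, L, rank):
    """Basic SSA: embed the series into an L x K Hankel trajectory matrix,
    truncate its SVD to `rank` components, and hankelize back by
    anti-diagonal averaging."""
    N = len(x)
    K = N - L + 1
    X = np.column_stack([x[j:j + L] for j in range(K)])  # trajectory matrix
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    Xr = (U[:, :rank] * s[:rank]) @ Vt[:rank]
    out = np.zeros(N)
    cnt = np.zeros(N)
    for j in range(K):          # anti-diagonal averaging back to a series
        out[j:j + L] += Xr[:, j]
        cnt[j:j + L] += 1
    return out / cnt

t = np.arange(200)
clicks = 0.05 * t + np.sin(2 * np.pi * t / 24)   # daily cycle over a trend
smooth = ssa_reconstruct(clicks, L=48, rank=4)   # trend + cycle have rank <= 4
print(np.max(np.abs(smooth - clicks)) < 1e-6)    # True: noiseless series recovered
```

On real traffic one would keep only some components and inspect the residual for fraudulent spikes.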
The Model of Words Cumulative Influence in a Text

A new approach to evaluating the influence of words in a text is considered, and an analytical model of this influence is presented. In this approach, the appearance of a word is treated as a surge of influence that extends to the subsequent text, and the effects of individual words accumulate. A computer simulation of the spread of the influence of words is carried out, and a new method of visualization is proposed. The approach is demonstrated on J.R.R. Tolkien’s novel “The Hobbit.” The proposed and implemented visualization method generalizes existing methods for visualizing the uneven presence of words in a text.

Dmytro Lande, Andrei Snarskii, Dmytro Manko
Mathematical and Computer Models of Message Distribution in Social Networks Based on the Space Modification of Fermi-Pasta-Ulam Approach

The article proposes a new class of models for message distribution in social networks based on specific systems of differential equations that describe information distribution along chains of the network graph. This class of models makes it possible to take specific message-transmission mechanisms into account. Vertices in such graphs are individuals who, on receiving a message, first form their attitude towards it and then decide on its further transmission, provided that the corresponding interaction potential of the two individuals exceeds a certain threshold level. The authors developed an original algorithm for calculating the time moments of message distribution along the corresponding chain, which reduces to solving a series of Cauchy problems for systems of ordinary nonlinear differential equations. These systems can be simplified, and part of the equations can be replaced with the Boussinesq or Korteweg-de Vries equations. The existence of soliton solutions of these equations provides grounds for considering social communication solitons as an effective tool for modeling the distribution of messages in social networks and investigating the various influences on their dissemination.

Andriy Bomba, Natalija Kunanets, Volodymyr Pasichnyk, Yuriy Turbal
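For reference, the Korteweg-de Vries equation mentioned above, in its standard form, admits a classical one-soliton solution (these are textbook forms, not the paper's modified system):

```latex
u_t + 6\,u\,u_x + u_{xxx} = 0,
\qquad
u(x,t) = \frac{c}{2}\,\operatorname{sech}^2\!\left(\frac{\sqrt{c}}{2}\,\bigl(x - ct - x_0\bigr)\right),
```

where $c > 0$ is the soliton speed and $x_0$ its initial position. The fact that such localized waves propagate without changing shape is what motivates treating messages in a network chain as "communication solitons."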
Social Network Structure as a Predictor of Social Behavior: The Case of Protest in the 2016 US Presidential Election

This research explores relationships between social network structure (as inferred from Twitter posts) and the occurrence of domestic protests following the 2016 US Presidential Election. A hindcasting method is presented which exploits Random Forest classification models to generate predictions about protest occurrence that are then compared to ground truth data. Results show a relationship between social network structure and the occurrence of protests that is stronger or weaker depending on the time frame of prediction.

Molly Renaud, Rostyslav Korolov, David Mendonça, William Wallace
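As a hedged illustration of the hindcasting pipeline, the sketch below trains a Random Forest on synthetic city-level network features and scores its predictions against held-out "ground truth". The feature names and the data-generating rule are invented for the example; the paper's features are derived from actual Twitter network structure.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 400
# hypothetical structural features per city-day: density, mean degree, clustering
X = rng.uniform(0, 1, size=(n, 3))
# synthetic ground truth: protest more likely in dense, clustered networks
y = ((0.6 * X[:, 0] + 0.4 * X[:, 2] + rng.normal(0, 0.1, n)) > 0.55).astype(int)

train, test = slice(0, 300), slice(300, n)
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X[train], y[train])
# hindcast: predict held-out occurrences, then compare to ground truth
acc = (model.predict(X[test]) == y[test]).mean()
```

The comparison of predictions to ground truth (here a simple accuracy) mirrors the evaluation step described in the abstract.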
Internet Data Analysis for Evaluation of Optimal Location of New Facilities

Today, most purchases and orders for various services are made via the Internet. Network users search for the products and recreation facilities they need and weigh positive and negative reviews from other users. Web analytics tools make it possible to understand the search volume of such requests, the popularity of a particular establishment in a given area of the city, and which areas of the city need such establishments most. Thus, software can be created that determines the best places to build new recreational facilities, taking into account search demand, territorial peculiarities, and the popularity of potential competitors. This article analyzes current software solutions, and their disadvantages, for determining the optimal location of a future recreation establishment within a city. Data on positive reviews, search statistics, and site-visit statistics are analyzed. On the basis of the obtained data, a probabilistic model of optimal placement is constructed that takes into account the distances to existing establishments and their possible range of influence. A software solution is proposed that makes it possible to simulate the optimal location of a future recreation establishment using Internet data alone.

Liubov Oleshchenko, Daria Bilohub, Vasyl Yurchyshyn
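The competitor-influence idea can be sketched as a simple scoring rule: each candidate site earns its estimated demand minus a penalty for nearby competitors whose influence decays with distance. The exponential decay, the `influence_range` parameter, and the demand vector below are illustrative assumptions, not the paper's calibrated probabilistic model.

```python
import numpy as np

def best_location(candidates, demand, competitors, comp_popularity, influence_range=2.0):
    """Score each candidate site as demand minus competitor pressure,
    where a competitor's influence decays exponentially with distance;
    return the index of the best site and the full score vector."""
    candidates = np.asarray(candidates, float)
    scores = np.array(demand, float).copy()
    for c, pop in zip(competitors, comp_popularity):
        dist = np.linalg.norm(candidates - np.asarray(c, float), axis=-1)
        scores -= pop * np.exp(-dist / influence_range)
    return int(np.argmax(scores)), scores
```

With uniform demand and two equally popular competitors at the ends of a street, the rule picks a site near the middle, as far as possible from both.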

Novel Theoretical and Practical Applications of Data Science

Frontmatter
Algebra of Clusterizable Relations

Relations are usually represented in a space of attributes whose values differ only in names, similarly to the algebra of sets: the order of the values, or any other preference measure, is not significant for such attributes. The paper proposes a mathematical model, based on n-tuple algebra (NTA), for relations in which the values of attributes are ordered. For this case, a mathematical tool has been developed that can be used not only to carry out the previously developed NTA-based methods of logical-semantic analysis, including the analysis of defeasible reasoning and logic-probabilistic analysis, but also to analyze the order and connectivity of structures and to implement clustering methods. The concept of granules is introduced, the power of connectivity between granules is defined, and methods to calculate distances between disconnected granules are proposed. The obtained dependencies make it possible to extend the scope of classification techniques.

Boris Kulik, Alexander Fridman
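As a much-simplified illustration (granules here are plain sets of positions of a single ordered attribute; the paper's NTA constructions are far more general, and these two functions are assumptions of the sketch, not the paper's definitions), connectivity and distance between granules might look like:

```python
def connectivity(g1, g2):
    """Power of connectivity of two granules: the size of their overlap,
    where a granule is a set of positions of ordered attribute values."""
    return len(set(g1) & set(g2))

def granule_distance(g1, g2):
    """Distance between granules: zero when they are connected (overlap),
    otherwise the smallest number of ordering steps between their elements."""
    if connectivity(g1, g2) > 0:
        return 0
    return min(abs(a - b) for a in g1 for b in g2)
```

Connected granules get distance zero; disconnected ones are separated by the gap in the attribute ordering, which is the quantity a clustering procedure could threshold.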
Constructive Proofs of Heterogeneous Equalities in Cubical Type Theory

This paper presents a small part of the base library developed for a homotopical prover based on Cubical Type Theory (CTT), announced in 2017. We demonstrate the usage of this library by showing how to build a constructive proof of heterogeneous equality, a simple and elegant formulation of the equality problem that is impossible to achieve in pure Martin-Löf Type Theory (MLTT). The machinery used in this article unveils the internal aspects of path equality and isomorphism, used e.g. for proving the univalence axiom, which became possible only in CTT. As an example of a complex proof that could not be constructed in earlier theories, we take the isomorphism between the Nat and Fix Maybe datatypes and build a constructive proof of equality between elements of these datatypes. This approach can be extended to any isomorphic data types, however complex.

Maksym Sokhatskyi, Pavlo Maslianko
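A programming-level analogue of the Nat ≅ Fix Maybe isomorphism, with no pretence of being a CTT proof, can be round-trip tested as follows. The encoding of Fix Maybe values as nested tuples (`'nothing'` for zero, `('just', v)` for successor) is an assumption of this sketch.

```python
def nat_to_fixmaybe(n):
    """Encode a natural number as a Fix Maybe value:
    'nothing' is zero, ('just', v) is the successor of v."""
    v = 'nothing'
    for _ in range(n):
        v = ('just', v)
    return v

def fixmaybe_to_nat(v):
    """Decode a Fix Maybe value back to a natural number by
    counting the 'just' constructors."""
    n = 0
    while v != 'nothing':
        _, v = v        # peel one 'just' layer
        n += 1
    return n
```

The two maps compose to the identity in both directions, which is the computational content of the isomorphism the paper lifts to an equality of types in CTT.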
Improved Method of Determining the Alternative Set of Numbers in Residue Number System

The article analyzes the best-known practical methods of determining the alternative set (AS) of numbers in a residue number system (RNS). Determining the AS is most frequently required for error verification, diagnosis, and correction of data in an RNS with minimal information redundancy introduced into the dynamics of the computational process; it is assumed that at most a single error occurs in a number. The main downside of the reviewed methods is the significant time needed to determine the AS. To reduce this time, one of the known methods is improved in the article. The improvement is based on the preliminary compilation of correspondence tables (first-stage tables) that map each correct number from the informational numeric range to the possible set of incorrect numbers lying outside that range. Based on the contents of these tables, a second-stage table is compiled, which maps each incorrect number outside the numeric range to the possible values of correct numbers. The introduced method increases the efficiency of data verification, diagnosis, and correction by reducing the time needed to determine the AS of numbers in an RNS.

Victor Krasnobayev, Alexandr Kuznetsov, Sergey Koshman, Sergey Moroz
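The two-stage table idea can be sketched for a toy RNS. The moduli (3, 5, 7), with the last one treated as the control (redundant) modulus, are assumed for illustration only: legal numbers occupy the informational range, a single residue error usually pushes a number outside it, and the inverted table gives the alternative set of each such incorrect number.

```python
from math import prod

MODULI = (3, 5, 7)          # last modulus acts as the control (redundant) one
M = prod(MODULI)            # full range: 105
K = M // MODULI[-1]         # informational range [0, K): legal numbers

def to_rns(x):
    return tuple(x % m for m in MODULI)

def from_rns(r):
    # brute-force Chinese-remainder lookup; fine for a table-building sketch
    return next(x for x in range(M) if to_rns(x) == r)

# First stage: for every legal number, all numbers reachable by one residue error.
first_stage = {}
for n in range(K):
    r = to_rns(n)
    corrupted = set()
    for i, m in enumerate(MODULI):
        for e in range(m):
            if e != r[i]:
                corrupted.add(from_rns(r[:i] + (e,) + r[i + 1:]))
    first_stage[n] = corrupted

# Second stage: invert the tables — the alternative set of each incorrect number.
alt_set = {}
for n, corrupted in first_stage.items():
    for c in corrupted:
        if c >= K:          # only numbers outside the informational range signal an error
            alt_set.setdefault(c, set()).add(n)
```

Once `alt_set` is precomputed, diagnosing an incorrect number is a single table lookup instead of a search, which is the time saving the abstract describes.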
Method of an Optimal Nonlinear Extrapolation of a Noisy Random Sequence on the Basis of the Apparatus of Canonical Expansions

A method of optimal nonlinear extrapolation of a random sequence whose measurements are corrupted by errors is developed using the apparatus of canonical expansions. The filter-extrapolator imposes no essential limitations (linearity, Markovian behavior, stationarity, monotony, etc.) on the class of predictable random sequences, which makes it possible to achieve maximum accuracy in solving the prediction problem. The results of a numerical experiment confirmed the high effectiveness of the introduced method for predicting realizations of random sequences. An expression for the mean-square error of extrapolation allows one to estimate the quality of the solution of the prediction problem obtained with the developed method. The method can be used in various spheres of science and technology for predicting the parameters of stochastic objects.

Igor Atamanyuk, Vyacheslav Shebanin, Yuriy Kondratenko, Valerii Havrysh, Yuriy Volosyuk
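A linear special case gives a feel for the prediction setup: an extrapolator is learned from an ensemble of realizations observed with measurement error, and its mean-square error is compared to a trivial baseline. The AR(1) data, the noise levels, and the least-squares predictor below are assumptions of this sketch; the paper's canonical-expansion filter is nonlinear and far more general.

```python
import numpy as np

rng = np.random.default_rng(1)
R, T = 500, 6                         # number of realizations, sequence length
X = np.zeros((R, T))
for t in range(1, T):
    X[:, t] = 0.9 * X[:, t - 1] + rng.normal(0, 0.3, R)
noisy = X + rng.normal(0, 0.05, X.shape)   # measurements carry an error

# Linear minimum-mean-square predictor of the last value from the noisy past
past, future = noisy[:, :-1], X[:, -1]
train = slice(0, 400)
coef, *_ = np.linalg.lstsq(past[train], future[train], rcond=None)

pred = past[400:] @ coef
mse = np.mean((pred - X[400:, -1]) ** 2)        # extrapolation error
baseline = np.mean(X[400:, -1] ** 2)            # error of predicting zero
```

The learned extrapolator beats the trivial predictor, and its empirical mean-square error plays the role of the quality estimate mentioned in the abstract.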
Regularization of Hidden Markov Models Embedded into Reproducing Kernel Hilbert Space

Hidden Markov models (HMMs) are well-known probabilistic graphical models for time series of discrete, partially observable stochastic processes. In this paper, we discuss an approach that extends the application of HMMs to non-Gaussian continuous distributions by embedding the belief about the state into a reproducing kernel Hilbert space (RKHS), and that reduces the tendency to overfit and the computational complexity of the algorithm by means of various regularization techniques, specifically Nyström subsampling. We investigate, theoretically and empirically, regularization and approximation bounds and the effectiveness of kernel samples as landmarks in the Nyström method for low-rank approximation of kernel matrices. Furthermore, we discuss applications of the method to real-world problems, comparing the approach to several state-of-the-art algorithms.

Galyna Kriukova, Mykola Glybovets
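The Nyström low-rank approximation mentioned above can be sketched in a few lines: the full kernel matrix is approximated from its columns at a small set of landmark samples. The RBF kernel, uniform landmark sampling, and all sizes are assumptions of the example, not the paper's setup.

```python
import numpy as np

def rbf(A, B, gamma=1.0):
    """Gaussian (RBF) kernel matrix between row-wise point sets A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, (200, 1))
landmarks = X[rng.choice(200, 20, replace=False)]   # kernel samples as landmarks

K = rbf(X, X)                       # full n x n kernel matrix
C = rbf(X, landmarks)               # n x m cross-kernel
W = rbf(landmarks, landmarks)       # m x m landmark kernel
K_nystrom = C @ np.linalg.pinv(W) @ C.T   # rank-m Nystrom approximation

rel_err = np.linalg.norm(K - K_nystrom) / np.linalg.norm(K)
```

Working with the n×m factor instead of the full n×n matrix is what cuts the computational cost of the RKHS-embedded filter.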
Application of Probabilistic-Algebraic Simulation for the Processing and Analysis of Large Amounts of Data

The paper considers an approach to processing and analyzing large volumes of data characterizing network objects. We propose the application of probabilistic-algebraic simulation, which takes into account the probabilistic nature of the object and substantially reduces the computational complexity of the analysis at all stages of object research: the processing stage, the analysis stage, and the interpretation stage.

Viktor Smorodin, Elena Sukach
Multiblock ADMM with Nesterov Acceleration

ADMM (the alternating direction method of multipliers) is used for solving many optimization problems. The method is particularly important in machine learning, statistics, and image and signal processing. The goal of this research is to develop an improved version of ADMM with better performance. For this purpose, we use a combination of two approaches: decomposing the original optimization problem into N subproblems and applying a Nesterov acceleration step at each iteration. We implement the proposed algorithm in Python and apply it to the basis pursuit problem with randomly generated distributed data. We compare the efficiency of ADMM with Nesterov acceleration against the existing multiblock ADMM and the classic two-block ADMM.

Vladyslav Hryhorenko, Dmitry Klyushin
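A two-block sketch of the approach applied to basis pursuit (minimize ||x||₁ subject to Ax = b), with a Nesterov-type acceleration step and a restart rule in the spirit of "fast ADMM". The paper's N-subproblem decomposition is omitted for brevity, and the restart threshold and problem sizes are assumptions of this sketch.

```python
import numpy as np

def soft(v, k):
    """Soft-thresholding: the proximal operator of the l1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - k, 0.0)

def basis_pursuit_fast_admm(A, b, rho=1.0, iters=2000, eta=0.999):
    """min ||x||_1 s.t. Ax = b, by ADMM with a Nesterov acceleration step."""
    m, n = A.shape
    AAt_inv = np.linalg.inv(A @ A.T)
    x_part = A.T @ (AAt_inv @ b)            # particular solution of Ax = b
    P = np.eye(n) - A.T @ AAt_inv @ A       # projector onto the null space of A

    z, u = np.zeros(n), np.zeros(n)
    z_hat, u_hat = z.copy(), u.copy()
    alpha, c_prev = 1.0, np.inf
    for _ in range(iters):
        x = P @ (z_hat - u_hat) + x_part            # projection onto {x : Ax = b}
        z_new = soft(x + u_hat, 1.0 / rho)          # l1 shrinkage step
        u_new = u_hat + x - z_new                   # dual update
        # combined residual decides between acceleration and restart
        c = np.linalg.norm(u_new - u_hat) ** 2 + rho * np.linalg.norm(z_new - z_hat) ** 2
        if c < eta * c_prev:                        # Nesterov acceleration step
            alpha_new = (1.0 + np.sqrt(1.0 + 4.0 * alpha ** 2)) / 2.0
            z_hat = z_new + (alpha - 1.0) / alpha_new * (z_new - z)
            u_hat = u_new + (alpha - 1.0) / alpha_new * (u_new - u)
            alpha = alpha_new
        else:                                       # restart: plain ADMM step
            alpha, z_hat, u_hat = 1.0, z_new.copy(), u_new.copy()
        z, u, c_prev = z_new, u_new, c
    return x, z

rng = np.random.default_rng(0)
A = rng.normal(size=(15, 30))
x_true = np.zeros(30)
x_true[[3, 11, 27]] = [1.5, -2.0, 1.0]              # sparse signal to recover
b = A @ x_true
x, z = basis_pursuit_fast_admm(A, b)
```

The x-update keeps every iterate exactly feasible, so on convergence x agrees with the thresholded copy z and the l1 objective is no worse than that of the true sparse signal.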
Input Information in the Approximate Calculation of Two-Dimensional Integral from Highly Oscillating Functions (Irregular Case)

Nowadays, methods of digital signal and image processing are widely used in science and technology. The current stage of research in astronomy, radiology, computed tomography, holography, and radar is characterized by the broad use of digital technologies, algorithms, and methods. Correspondingly, the issue has arisen of developing new, or improving known, mathematical models, especially for new types of input information: there are cases when the input information about a function is given as a set of traces of the function on planes, a set of traces of the function on lines, or a set of values of the function at points. The paper is dedicated to improving mathematical models of digital signal and image processing, using as an example the construction of formulas for the approximate calculation of integrals of highly oscillating functions of two variables (the irregular case). The feature of the proposed methods is the use of input information about the function in the form of a set of its traces on lines. Error estimates of the proposed method are obtained for the Lipschitz class and for a class of differentiable functions. The proposed formula is based on an algorithm that is also effective for a class of discontinuous functions.

Oleg M. Lytvyn, Olesia Nechuiviter, Iulia Pershyna, Vitaliy Mezhuyev
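The core trick behind such formulas, sampling the smooth factor while integrating the highly oscillating factor exactly, can be sketched on the unit square. Piecewise-constant midpoint sampling of f is a simplification chosen for this illustration; the paper works with traces of f on lines and obtains sharper estimates.

```python
import numpy as np

def osc_cell_integral(a, b, c, d, omega):
    """Exact value of the integral of sin(omega*(x+y)) over [a,b] x [c,d]."""
    s = np.sin
    return (s(omega * (b + c)) - s(omega * (a + c))
            - s(omega * (b + d)) + s(omega * (a + d))) / omega**2

def integrate_osc(f, omega, n=64):
    """Approximate the integral of f(x,y)*sin(omega*(x+y)) over [0,1]^2:
    f is sampled at cell midpoints, while the oscillating factor is
    integrated exactly on every cell."""
    h = 1.0 / n
    edges = np.linspace(0.0, 1.0, n + 1)
    mids = edges[:-1] + h / 2
    total = 0.0
    for i, xm in enumerate(mids):
        for j, ym in enumerate(mids):
            total += f(xm, ym) * osc_cell_integral(
                edges[i], edges[i + 1], edges[j], edges[j + 1], omega)
    return total
```

Because the oscillating factor is handled analytically, the accuracy is governed by how well f is sampled, not by the frequency omega; for constant f the rule is exact by construction.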
A New Approach to Data Compression

The idea of the data-processing method in this paper is to replace the data with operators that use traces of unknown functions on specified geometric objects such as points, lines, surfaces, stripes, tubes, or layers. This approach makes it possible, first, to carry out data processing with parallelized calculations; second, if the data are time-dependent, to construct forecast (extrapolation) operators at the functional level; and third, to compress information. The paper analyzes methods of constructing operators of interlineation for functions of two or more variables, interflatation for functions of three or more variables, interpolation for functions of two or more variables, intertubation for functions of three or more variables, interlayeration for functions of three or more variables, and interlocation for functions of two or more variables, and compares them with interpolation operators for functions of the corresponding number of variables. The possibility of using known additional information about the investigated object is discussed, along with examples of objects and processes on which the described method of data processing can be tested in practice. Examples are given of constructing interpolation operators for functions of many variables using interlineation and interflatation; these require less data about the approximated function than classical spline-interpolation operators, while the order of accuracy of the approximation is preserved.

Oleh M. Lytvyn, Oleh O. Lytvyn
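Interlineation from traces on lines can be illustrated by a bilinearly blended (Coons-type) operator on the unit square: it reconstructs a function from its traces on the four boundary lines, reproduces those traces exactly, and is exact everywhere for functions of the form g(x) + h(y). This is a minimal sketch, not the operators studied in the paper.

```python
def coons(f):
    """Build an interlineation-style operator: approximate f on [0,1]^2
    using only its traces on the four boundary lines of the square."""
    def Pf(x, y):
        # blend of the traces on the lines x=0, x=1, y=0, y=1
        edge = ((1 - x) * f(0, y) + x * f(1, y)
                + (1 - y) * f(x, 0) + y * f(x, 1))
        # bilinear correction from the four corner values
        corner = ((1 - x) * (1 - y) * f(0, 0) + (1 - x) * y * f(0, 1)
                  + x * (1 - y) * f(1, 0) + x * y * f(1, 1))
        return edge - corner
    return Pf
```

Only one-dimensional traces of f are stored, yet the operator matches f on entire lines, which is the sense in which such constructions compress data compared with pointwise interpolation.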
Backmatter
Metadata
Title
Recent Developments in Data Science and Intelligent Analysis of Information
edited by
Oleg Chertov
Tymofiy Mylovanov
Yuriy Kondratenko
Janusz Kacprzyk
Vladik Kreinovich
Vadim Stefanuk
Copyright Year
2019
Electronic ISBN
978-3-319-97885-7
Print ISBN
978-3-319-97884-0
DOI
https://doi.org/10.1007/978-3-319-97885-7