
2018 | Book

Databases and Information Systems

13th International Baltic Conference, DB&IS 2018, Trakai, Lithuania, July 1-4, 2018, Proceedings

About this book

This book constitutes the refereed proceedings of the 13th International Baltic Conference on Databases and Information Systems, DB&IS 2018, held in Trakai, Lithuania, in July 2018.
The 24 revised papers presented were carefully reviewed and selected from 69 submissions. The papers are centered around topics like information systems engineering, enterprise information systems, business process management, knowledge representation, ontology engineering, systems security, information systems applications, database systems, machine learning, big data analysis, big data processing, cognitive computing.

Table of Contents

Frontmatter

Plenary Session

Frontmatter
Data Science and Advanced Digital Technologies
Abstract
The most topical challenges in data science are highlighted. The activities of the Vilnius University Institute of Data Science and Digital Technologies are introduced. The institute aims to solve at least some of the problems arising in this field, first of all in cognitive computing, blockchain technology, the development of cyber-social systems, and big data analytics.
Gintautas Dzemyda
A Light Insight into Latvian and Lithuanian ICT Terminology: Whether Kindred Languages Imply Kindred Terminology?
Abstract
Lithuanian and Latvian are two closely related languages, the only two living languages of the Baltic branch of the Indo-European family. They are quite similar and share a great deal of vocabulary and grammar features, but are not close enough to make conversation possible. The paper reveals that commonalities between Latvian and Lithuanian information and communication technology (ICT) terms are mostly due to internationalisms, and only a small proportion of terms share common Baltic word-roots; the influence of English on Latvian and Lithuanian ICT terminology is moderate, if not minor; and, deliberately or unawares, Lithuanian terminologists follow the same rules as their Latvian counterparts.
Juris Borzovs

Invited Talks

Frontmatter
Business Process Analytics: From Insights to Predictions
Abstract
Business process analytics is a body of methods for analyzing data generated by the execution of business processes in order to extract insights about weaknesses and improvement opportunities, at both the tactical and operational levels. Tactical process analytics methods (also known as process mining) allow us to understand how a given business process is actually executed, whether and how its execution deviates from expected or normative pathways, and what factors contribute to poor process performance or undesirable outcomes. Meanwhile, operational process analytics methods allow us to monitor ongoing executions of a business process in order to predict future states and undesirable outcomes at runtime (predictive process monitoring). Existing methods in this space allow us to predict, for example, which task will be executed next in a case, when, and by whom; when an ongoing case will complete; what its outcome will be; and how negative outcomes can be avoided. This keynote paper presents a framework for conceptualizing business process analytics methods and applications. The paper and the keynote provide an overview of state-of-the-art methods and tools in the field and outline open challenges.
Marlon Dumas
Towards the Next-Generation Enterprise Information Systems – Research Opportunities and Challenges
Abstract
The increase in computing and storage performance achieved in the recent past has brought the attention of researchers and practitioners back to paradigms that could be used effectively and efficiently to solve many societal problems, such as cloud architectures, the Internet of Things (IoT), Artificial Intelligence (AI), Multi-Agent Systems (MAS) and blockchain. This position paper discusses how these new technologies, approaches and methodologies affect the landscape of Enterprise Information Systems (EIS). First, it identifies the properties of so-called Next-Generation EIS (NG EIS). Then, it proposes an original approach to the interoperability of systems, treated as an inherent property of EIS. Finally, it presents a framework solution for the implementation problem. The proposed concepts are based on a wide discussion among members of the IFAC TC 5.3 committee (the IFAC Technical Committee for Enterprise Integration and Networking) of the International Federation of Automatic Control.
Milan Zdravković
Languages of Baltic Countries in Digital Age
Abstract
Today, when we are surrounded by intelligent digital devices – computers, tablets and mobile phones – we expect to communicate with these devices in a natural language. Moreover, such communication needs to be in our native language. We also expect that language technologies will not only assist us in everyday tasks, but will also help to overcome problems caused by language barriers. This keynote focuses on language resources and tools that facilitate the use of the languages of the three Baltic countries – Estonian, Latvian and Lithuanian – on digital devices (computers, tablets, mobile phones), help to minimize language barriers, facilitate social inclusion, and support more natural human-computer interaction, thus making digital services more “human”. The current situation, technological challenges and the most important achievements in language technologies that help to narrow the technological gap, facilitate the use of natural language for human-computer interaction, and minimize the threat of digital extinction will be presented.
Inguna Skadiņa

Information Systems Engineering

Frontmatter
Towards the Trust Model for Industry 4.0
Abstract
In highly networked systems, such as Industry 4.0, it is essential to ensure that only trustworthy elements participate in the network; otherwise the security of the system might be compromised and its functionality negatively influenced. It is therefore important to identify whether the nodes in the network can be trusted by other elements of the system. Different approaches to trust evaluation are available in a variety of domains. However, Industry 4.0 involves both human and artificial participants and imposes human-human, artifact-artifact, and human-artifact relationships in the system. This requires a comparable interpretation and representation of trust across several areas. For this purpose, the paper discusses trust interpretations and proposes trust models and trust dimensions in three areas relevant to Industry 4.0, namely, human-human interaction, human interaction with IT solutions, and ad-hoc distributed sensing systems.
Marina Harlamova, Marite Kirikova
Towards the Reference Model for Security Risk Management in Internet of Things
Abstract
Security in Internet of Things (IoT) systems is an important topic. In this paper we propose an initial comprehensive reference model for managing security risks to the information and data assets managed and controlled in IoT systems. Based on the domain model for information systems security risk management, we explore how the vulnerabilities and countermeasures defined by the Open Web Application Security Project (OWASP) could be considered in the IoT context. To illustrate the applicability of the reference model, we analyse how reported IoT security risks could be addressed.
Raman Shapaval, Raimundas Matulevičius
Information Requirements for Big Data Projects: A Review of State-of-the-Art Approaches
Abstract
Big data technologies are rapidly gaining popularity and becoming widely used, thus making the choice of development methodologies, including approaches for requirements analysis, more acute. It has been argued that, in the context of Data Warehousing (DW) and other Decision Support Systems (DSS) technologies, defining information requirements (IR) can increase the chances of a project being successful and achieving its goals. It is therefore important to examine this subject in the context of Big data, given the lack of research in the field of Big data requirements analysis. This paper gives an overview of existing methods associated with Big data technologies and requirements analysis, and provides an evaluation by three types of criteria: (i) general characteristics, (ii) requirements-analysis-related criteria, and (iii) Big-data-technology-related criteria. We summarize the requirements analysis process in Big data projects, and explore solutions for (semi-)automating requirements engineering phases.
Natalija Kozmina, Laila Niedrite, Janis Zemnickis
Pattern Library for Use-Case-Based Application Logic Reuse
Abstract
This paper discusses the concept of patterns at a relatively early phase in the software lifecycle, where the detailed user-system dialogue (application logic) is defined. The dialogue is captured in generalised sequences of interactions performed by the system and its users, precisely linked with abstract domain vocabulary elements. We group individual interactions into sets of short scenarios which constitute “snippets” of the system’s observable behaviour. In the paper we present several example patterns that form an initial library. We substantiate the validity of the library with an example instantiation of patterns into a full and detailed use case specification. This instantiation consists of selecting patterns, combining them together and substituting abstract vocabulary elements with concrete ones. The resulting concrete application logic models can then be used as input to further automatic processing, including application code generation.
Michał Śmiałek, Albert Ambroziewicz, Rafał Parol
Impact of Demographic Differences on Color Preferences in the Interface Design of e-Services in Latvia
Abstract
In our study, we test users’ color preferences for the interfaces of a typical electronic services (e-services) environment. The motivation of our study is to explore the discrepancy between (a) the high degree of availability of e-services in Latvia, and (b) the low usage of these e-services among the population. We aim to find regularities regarding color preferences that could be useful in the development of e-services and that could support a rise in the usage of e-services in the future. Although there are several reasons why people avoid using e-services (some of them socio-cognitive, some technological), we focus on a particular aspect – the colors of the e-services interface. In particular, we test color preferences with respect to the interface depending on different demographic factors. Our hypothesis is that color contributes to the preference for using e-services and that factors such as gender, age, place of residence, education field, language knowledge, occupation, and hobbies determine color preferences.
Jurģis Šķilters, Līga Zariņa, Signe Bāliņa, Dace Baumgarte
Asynchronous Client-Side Coordination of Cluster Service Sessions
Abstract
System-to-system communication involving stateful sessions between a clustered service provider and a service consumer is investigated in this paper. An algorithm that decreases the number of calls to failed provider nodes is proposed. It is designed for a clustered client and is based on asynchronous communication. A formal specification of the algorithm is formulated in the TLA+ language and was used to investigate the correctness of the algorithm.
Karolis Petrauskas, Romas Baronas
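The core client-side idea – skipping provider nodes that failed recently – can be sketched as follows. This is a hypothetical, much simplified illustration (all names invented), not the paper's TLA+-specified algorithm, and it omits the asynchronous messaging the paper relies on:

```python
import time

class ClusterClient:
    """Hypothetical client-side node selector that avoids provider nodes
    which failed recently (illustrative only; not the paper's algorithm)."""

    def __init__(self, nodes, retry_after=30.0):
        self.nodes = list(nodes)
        self.retry_after = retry_after   # seconds to wait before retrying a failed node
        self.failed_at = {}              # node -> time of the last observed failure
        self._rr = 0                     # round-robin cursor

    def mark_failed(self, node, now=None):
        """Record a failed call so the node is skipped for a while."""
        self.failed_at[node] = time.monotonic() if now is None else now

    def healthy_nodes(self, now=None):
        now = time.monotonic() if now is None else now
        return [n for n in self.nodes
                if n not in self.failed_at
                or now - self.failed_at[n] >= self.retry_after]

    def pick(self, now=None):
        """Pick the next node round-robin, preferring healthy ones."""
        candidates = self.healthy_nodes(now) or self.nodes  # fall back if all failed
        node = candidates[self._rr % len(candidates)]
        self._rr += 1
        return node
```

A real implementation would update `failed_at` from asynchronously delivered failure notifications rather than from the caller directly.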
Ping-Pong Tests on Distributed Processes Using Java Bindings of Open-MPI and Java Sockets with Applications to Distributed Database Performance
Abstract
The use of distributed database solutions is becoming more widespread due to their higher performance and storage capabilities compared to relational databases. Since these systems rely heavily on inter-process communication, an investigation of the effect of network latency is needed. In this paper, we examine the Java bindings of the Open-MPI library running on InfiniBand and the TCP/IP stack, as well as the Java Socket API for TCP/IP communication, with a simple ping-pong test, and analyse the effect of latency on the performance of distributed in-memory key-value stores that operate in single data centers.
Mehmet Can Boysan
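The shape of such a ping-pong test can be sketched in a few lines. This is a minimal stand-in (the paper uses Java sockets and Open-MPI bindings; here a local socket pair replaces the network, so absolute numbers are not comparable):

```python
import socket
import threading
import time

def _echo(sock, n_rounds):
    # Server side: bounce every message straight back ("pong").
    for _ in range(n_rounds):
        sock.sendall(sock.recv(64))
    sock.close()

def mean_round_trip(n_rounds=1000, payload=b"ping"):
    """Mean round-trip time over a local socket pair, in seconds."""
    client, server = socket.socketpair()
    t = threading.Thread(target=_echo, args=(server, n_rounds))
    t.start()
    start = time.perf_counter()
    for _ in range(n_rounds):
        client.sendall(payload)         # "ping"
        assert client.recv(64) == payload
    elapsed = time.perf_counter() - start
    t.join()
    client.close()
    return elapsed / n_rounds
```

Over a real network the same loop would connect to a remote echo server, and the measured round-trip time would be dominated by the interconnect (InfiniBand vs. TCP/IP) rather than by local scheduling.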
Model Based Approach for Testing: Distributed Real-Time Systems Augmented with Online Monitors
Abstract
Testing distributed systems requires an integration of computation, communication and control in the test architecture. This may pose a number of issues that are not suitably addressed by traditional centralized test architectures. In this paper, a distributed test framework for testing distributed real-time systems is presented, where online monitors (executable code as annotations) are integrated into systems to record relevant events. The proposed test architecture is more scalable than centralized architectures with respect to timing constraints and geographical distribution. Assuming the existence of a coverage-correct centralized remote tester, we give an algorithm for partitioning it into distributed local testers, which makes it possible to meet more flexible performance constraints while preserving the remote tester’s functionality. The proposed approach not only preserves the correctness of the centralized tester but also allows stronger timing constraints to be met for solving test controllability and observability issues. The effectiveness of the proposed architecture is demonstrated by an illustrative example.
Deepak Pal, Jüri Vain

Knowledge and Ontologies

Frontmatter
Domain Ontology for Expressing Knowledge of Variants of Thermally Modified Wood Products
Abstract
The thermally modified wood producer Thermory AS manufactures about 400 different products, which are ordered in a large number of variants; this makes the expression of product variant knowledge and its validation very important. In this paper, we express knowledge of product variants as a domain ontology in order to capture the product knowledge in a way that is consistent and shareable between humans and machines. Using the Web Ontology Language (OWL), a Description Logics (DL) based ontology representation language, enables the detection of inconsistencies in the product knowledge and customer order requirements. Constraints on valid product variants are expressed as OWL class expressions and as rules in the Semantic Web Rule Language (SWRL). The provided knowledge representation method makes it possible to reduce the combinatorial complexity of describing product variants and to place correct manufacturing orders, saving time and money for the company.
Hele-Mai Haav, Riina Maigre
The Knowledge Increase Estimation Framework for Integration of Ontology Instances’ Relations
Abstract
The authors’ previous research showed that it is not only possible, but also profitable, to estimate the potential growth in the level of knowledge that appears during an integration of ontologies. Such an estimation can be made before the eventual integration procedure (or at least during it), which makes it even more valuable, because it allows one to decide whether a particular integration should be performed in the first place. So far, the authors of this paper have prepared a formal framework that can be used to estimate the knowledge increase at the level of concepts, instances and relations between concepts. This paper is devoted to the level of relations between instances.
Adrianna Kozierkiewicz, Marcin Pietranik

Advanced Database Systems

Frontmatter
Proposal of an Unrestricted Character Encoding for Japanese
Abstract
The vast majority of characters used in Japanese are Chinese characters, which involve tens of thousands of different glyphs. Due to this huge number of glyphs, database creation for character representation on computer systems has been an ongoing issue for years, indeed since the early days of digital computing. Several character encodings have been described to allow the representation of character information, some specifically targeting Chinese characters, such as Big-5 and Shift-JIS, and others remaining general, such as Unicode. Yet, no matter which approach is followed, it is still impossible to manipulate a large part of these characters, as they are simply not covered by current encoding solutions. Chinese characters feature various properties and relations, making it possible to classify them in a database according to several attributes. In this paper, we formally describe a structure for such a large character database in the form of a character encoding, thus addressing the concrete issue of character representation on computers. We show that the proposed structure addresses the restrictions, such as coherency and glyph number, suffered by existing works. Finally, a database corresponding to the presented character encoding is practically assembled and visualised, demonstrating the advanced code structure.
Antoine Bossard, Keiichi Kaneko
Facilitation of Health Professionals Responsible Autonomy with Easy-to-Use Hospital Data Querying Language
Abstract
Supporting the development of responsible autonomy, as opposed to management based on direct control, has been found to be a far more effective approach in healthcare management, especially where physicians, the most influential group of health professionals, are concerned. It is therefore important to provide a process-oriented knowledge system in which physicians are able to autonomously answer questions that are outside the scope of pre-made direct-control reports. However, the ad-hoc data querying process is slow and error-prone due to the inability of health professionals to access data directly without involving IT experts. The problem lies in the complexity of the means used to query data. We propose a new ad-hoc data querying approach, based on natural language and semistar ontologies, which reduces the steep learning curve required to query data. The proposed approach significantly decreases the time needed to master ad-hoc data querying, thus allowing health professionals to explore the data independently.
Edgars Rencis, Juris Barzdins, Mikus Grasmanis, Agris Sostaks
Efficient Model Repository for Web Applications
Abstract
Many model-based applications have been developed with standalone usage in mind. When migrating such applications to the web, we have to think about multiple users competing for limited server resources. In addition, we encounter the need to synchronize models via the network for client-side access. Thus, there is the risk that the model storage could become a bottleneck.
We propose a model repository that deals with these issues by using an efficient encoding of the model that approaches its Kolmogorov complexity. The encoding is suitable for sending directly over the network (with almost no overhead); it can also be used “as-is” in memory-mapped files, thus utilizing the OS paging mechanism. By adding just three automatic indices, all traversal and query operations can be implemented efficiently. Our tests show that the proposed model repository outperforms other repositories in terms of both CPU and memory usage and is able to hold 10,000 or more model instances at the same time on a single server.
Sergejs Kozlovičs

Big Data Analysis and Processing

Frontmatter
Scalable Hadoop-Based Infrastructure for Big Data Analytics
Abstract
Cloud architectures are being used increasingly to support Big Data analytics by organizations that make ad hoc or routine use of the cloud in lieu of acquiring their own infrastructure. On the other hand, Hadoop has become the de-facto standard for storing and processing Big Data. It is hard to overstate how many advantages come with moving Hadoop into the cloud. The most important is scalability, meaning that the underlying infrastructure can be expanded or contracted according to the actual demand on resources. This paper presents a scalable Hadoop-based infrastructure for Big Data analytics, one that gets automatically adjusted if more computing power or storage capacity is needed. Adjustments are transparent to the users – the users seem to have nearly unlimited computation and storage resources.
Irina Astrova, Arne Koschel, Felix Heine, Ahto Kalja
Application of Graph Clustering and Visualisation Methods to Analysis of Biomolecular Data
Abstract
In this paper we present an approach based on the integrated use of graph clustering and visualisation methods for semi-supervised discovery of biologically significant features in biomolecular data sets. We describe several clustering algorithms that have been custom designed for the analysis of biomolecular data and feature an iterated two-step approach: initial computation of thresholds and other parameters used in the clustering algorithms is followed by identification of connected graph components and, if needed, by adjustment of clustering parameters for the processing of individual subgraphs.
We demonstrate the application of these algorithms to two concrete use cases: (1) analysis of protein coexpression in colorectal cancer cell lines; and (2) protein homology identification from both sequence and structural similarity data.
Edgars Celms, Kārlis Čerāns, Kārlis Freivalds, Paulis Ķikusts, Lelde Lāce, Gatis Melkus, Mārtiņš Opmanis, Dārta Rituma, Pēteris Ručevskis, Juris Vīksna
A New Knowledge-Transmission Based Horizontal Collaborative Fuzzy Clustering Algorithm for Unequal-Length Time Series
Abstract
This paper focuses on the clustering of unequal-length time series, which appear frequently in practice. How to deal with the unequal lengths is the key step in the clustering process. In this paper, we transform the given unequal-length clustering problem into several equal-length clustering sub-problems by dividing the unequal-length time series into equal-length segments. For each sub-problem, we can use the standard fuzzy c-means algorithm to obtain a clustering result, represented by a partition matrix and a set of cluster centers. To obtain the final clustering result for the original problem, we use the horizontal collaborative fuzzy clustering algorithm to fuse the clustering results of these sub-problems. In the process of collaboration, the collaborative knowledge is transmitted by partition matrices, whose sizes should be the same. In the scenario here, however, the obtained partition matrices most often have different sizes, so we cannot directly use the horizontal collaborative fuzzy clustering algorithm. Taking into account the collaborative mechanism of the horizontal collaborative fuzzy clustering algorithm, this paper presents a novel method for extending the partition matrices to the same size. This method allows the partition knowledge to be effectively transmitted and thus ensures good final clustering results. Experiments showed the effectiveness of the proposed method.
Shurong Jiang, Jianlong Wang, Fusheng Yu
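The first step of the approach – turning one unequal-length clustering problem into several equal-length sub-problems – can be sketched as follows (a minimal illustration with an invented segment length; the fuzzy c-means runs and the collaborative fusion step are omitted):

```python
def split_into_subproblems(series, seg_len):
    """Cut each (possibly unequal-length) time series into consecutive
    segments of length seg_len and group segments by position; each group
    is one equal-length clustering sub-problem for fuzzy c-means."""
    groups = {}                          # segment position -> list of segments
    for s in series:
        for i in range(len(s) // seg_len):
            groups.setdefault(i, []).append(s[i * seg_len:(i + 1) * seg_len])
    return groups
```

Each group can then be clustered independently; the paper's contribution lies in extending the resulting partition matrices to a common size so that horizontal collaboration can fuse them.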
Non-index Based Skyline Analysis on High Dimensional Data with Uncertain Dimensions
Abstract
The notion of a skyline query is to find the set of objects that are not dominated by any other object. Unfortunately, existing works do not address how to conduct skyline queries on high dimensional uncertain data in which objects are represented by both continuous ranges and exact values, referred to in this paper as uncertain dimensions. Hence, in this paper we define skyline queries over data with uncertain dimensions and propose an algorithm, SkyQUD, to answer such queries efficiently. The SkyQUD algorithm determines skyline objects through three methods that guarantee the probability of each object being in the final skyline results: exact domination, range domination, and uncertain domination. The algorithm has been validated through extensive experiments employing real and synthetic datasets. The results show that the proposed algorithm is efficient and scalable in answering skyline queries on high dimensional and large datasets with uncertain dimensions.
Nurul Husna Mohd Saad, Hamidah Ibrahim, Fatimah Sidi, Razali Yaakob
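For readers unfamiliar with skyline queries, the exact-value dominance test underlying such algorithms can be sketched as a naive quadratic baseline (the range and uncertain domination cases handled by SkyQUD are not shown):

```python
def dominates(p, q):
    """p dominates q if p is no worse in every dimension and strictly
    better in at least one (here: smaller values are better)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def skyline(points):
    """Naive skyline: keep every point not dominated by another point."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]
```

Non-index-based algorithms such as SkyQUD aim to avoid this all-pairs comparison while still covering dimensions given as continuous ranges.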

Cognitive Computing

Frontmatter
Semi-automatic Quasi-morphological Word Segmentation for Neural Machine Translation
Abstract
This paper proposes the Prefix-Root-Postfix-Encoding (PRPE) algorithm, which performs close-to-morphological segmentation of words as part of text pre-processing in machine translation. PRPE is a cross-language algorithm requiring only minor tweaking to adapt it for any particular language, a property which makes it potentially useful for morphologically rich languages with no morphological analysers available. As a key part of the proposed algorithm we introduce the ‘Root alignment’ principle to extract potential sub-words from a corpus, as well as a special technique for constructing words from potential sub-words. We conducted experiments with two different neural machine translation systems, training them on parallel corpora for English-Latvian and Latvian-English translation. Evaluation of translation quality showed improvements in BLEU scores when the data were pre-processed using the proposed algorithm, compared to a couple of baseline word segmentation algorithms. Although we were able to demonstrate improvements in both translation directions and for both NMT systems, they were relatively minor, and our experiments show that machine translation with inflected languages remains challenging, especially with translation direction towards a highly inflected language.
Jānis Zuters, Gus Strazds, Kārlis Immers
Effective Online Learning Implementation for Statistical Machine Translation
Abstract
Online learning has been an active research area in statistical machine translation (SMT). However, as we have identified in our research, implementing successful online learning capabilities in the Moses SMT system can be challenging. In this work, we show how to use open-source and freely available tools and methods in order to successfully implement online learning for SMT systems in a way that improves translation quality. In our experiments, we compare the baseline implementation in Moses to an improved implementation utilising a two-step tuning strategy. We show that the baseline implementation achieves unstable performance (from −6 to +6 BLEU points in online learning scenarios and over −6 BLEU points in translation scenarios, i.e., when post-edits were not returned to the SMT system). However, our two-step tuning strategy successfully utilises online learning capabilities and improves MT quality in the online learning scenario by up to +12 BLEU points.
Toms Miks, Mārcis Pinnis, Matīss Rikters, Rihards Krišlauks
Investigation of Text Attribution Methods Based on Frequency Author Profile
Abstract
The task of analysing a text to determine its author is a challenge that has engaged researchers since the last century. With the development of social networks and platforms for publishing web posts and articles on the Internet, the task of identifying authorship becomes even more acute. Specialists in journalism and law are particularly interested in finding more accurate approaches for resolving disputes related to texts of dubious authorship. In this article, the authors compare the applicability of eight modern machine learning algorithms – Support Vector Machine, Naive Bayes, Logistic Regression, K-nearest Neighbors, Decision Tree, Random Forest, Multilayer Perceptron, and Gradient Boosting Classifier – for the classification of a Russian web-post collection. The best results were achieved with Logistic Regression, Multilayer Perceptron and Support Vector Machine with a linear kernel, using a combination of part-of-speech and word n-grams as features.
Polina Diurdeva, Elena Mikhailova
Implementing a Face Recognition System for Media Companies
Abstract
During the past few years, face recognition technologies have greatly benefited from the huge progress in machine learning and have now achieved precision rates comparable with human performance. This allows face recognition to be applied effectively to a number of practical problems in businesses such as media monitoring, security, advertising and entertainment – applications that were previously infeasible due to the low precision of earlier face recognition technologies. In this paper we discuss how to build a face recognition system for media companies and share the experience gained from implementing one for the Latvian national news agency LETA. Our contribution covers which technologies to use, how to build a practical training dataset, how large it should be, and how to deal with unknown persons.
Arturs Sprogis, Karlis Freivalds, Elita Cirule

Applications and Case Studies

Frontmatter
What Language Do Stocks Speak?
Abstract
Stock prediction is a challenging and chaotic research area in which many variables are involved and their effects are complex to determine. Nevertheless, stock value prediction remains appealing for researchers and investors since it might be profitable, yet the number of published research papers remains relatively small. The use of advanced data analysis techniques, such as neural networks for stock price prediction, has already been suggested by previous research, but the practical implications of most approaches are limited, as they are concerned mainly with prediction accuracy and less with success in real trading once trading fees are considered. We propose a novel approach for stock trend prediction that combines Japanese candlesticks (OHLC trading data) with Word2Vec, a neural-network-based group of models. Word2Vec is usually used to produce word embeddings in natural language processing tasks, while we adopt it to acquire the semantic context of words in a sequence of candlesticks, where clustered candlesticks represent the stock’s words. The approach is employed to extract useful information from large sets of OHLC trading data in order to improve prediction accuracy. In the evaluation of our approach we define a trading strategy and compare our approach with other popular prediction models – Buy & Hold, MA and MACD. The evaluation results on the Russell Top 50 index are encouraging – the proposed Word2Vec approach outperformed all compared models on a test set with statistical significance.
Marko Poženel, Dejan Lavbič
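The idea of treating candlesticks as "words" can be illustrated with a crude symbolic discretization. The paper clusters candlesticks to form the vocabulary; the binning scheme below is invented purely for illustration:

```python
def candle_word(o, h, l, c, n_bins=3):
    """Map one OHLC candlestick to a symbolic 'word': direction (Up/Down)
    plus a bin index for the candle's relative body size."""
    direction = "U" if c >= o else "D"
    rng = h - l
    body = abs(c - o) / rng if rng > 0 else 0.0   # body size relative to the full range
    return direction + str(min(int(body * n_bins), n_bins - 1))

def candles_to_sentence(candles):
    # A chronological sequence of candle words forms a 'sentence'
    # that could be fed to a Word2Vec model for embedding training.
    return [candle_word(*c) for c in candles]
```

Once every trading day is a word and every stock history a sentence, off-the-shelf Word2Vec training applies directly, which is what makes the analogy attractive.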
The Algorithm for Constrained Shortest Path Problem Based on Incremental Lagrangian Dual Solution
Abstract
Most systems that rely on solving the shortest path or constrained shortest path problem demand real-time responses to unexpected real-world events that affect the input graph, such as car accidents, road repair works or simply dense traffic. We developed a new incremental algorithm that reuses data already present in the system in order to quickly update a solution under new conditions. We conducted experiments on real data sets represented by the road graphs of the cities of Oldenburg and San Joaquin. We test the algorithm against that of Muhandiramge and Boland [1] and show that it provides up to a 50% decrease in computation time compared to solving the problem from scratch.
Boris Novikov, Roman Guralnik
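The non-incremental Lagrangian dual step that such algorithms build on can be sketched as follows: relax the time constraint into the edge weights and search the multiplier, here by simple bisection. This is an illustrative sketch with an invented graph encoding; the paper's contribution is reusing previous dual solutions when the graph changes, which is not shown:

```python
import heapq

def dijkstra(graph, src, dst, lam):
    """Shortest path under the relaxed weight cost + lam * time.
    graph: {u: [(v, cost, time), ...]}; returns (cost, time) of the path."""
    dist = {src: 0.0}                    # node -> best relaxed distance
    pq = [(0.0, src, 0, 0)]              # (relaxed dist, node, cost, time)
    while pq:
        d, u, c, t = heapq.heappop(pq)
        if u == dst:
            return c, t
        if d > dist.get(u, float("inf")):
            continue                     # stale queue entry
        for v, ec, et in graph.get(u, []):
            nd = d + ec + lam * et
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(pq, (nd, v, c + ec, t + et))
    return None

def constrained_shortest_path(graph, src, dst, t_max, iters=30):
    """Bisection on the multiplier lam: larger lam penalises time more.
    Returns (cost, time) of the best time-feasible path found; assumes
    dst is reachable (no feasibility check in this sketch)."""
    lo, hi = 0.0, 1.0
    while dijkstra(graph, src, dst, hi)[1] > t_max and hi < 1e6:
        hi *= 2                          # grow lam until the time budget holds
    best = None
    for _ in range(iters):
        lam = (lo + hi) / 2
        c, t = dijkstra(graph, src, dst, lam)
        if t <= t_max:
            best, hi = (c, t), lam       # feasible: try a smaller multiplier
        else:
            lo = lam
    return best
```

An incremental variant would keep the final multiplier and distance labels and repair only the part of the search affected by the changed edges, instead of restarting the bisection from scratch.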
Current Perspectives on the Application of Bayesian Networks in Different Domains
Abstract
Bayesian networks are powerful tools for representing relations of dependence among the variables of a domain under uncertainty. Over the last decades, applications of Bayesian networks have been developed for a wide variety of subject areas, in tasks such as learning, modeling, forecasting and decision-making. Out of the hundreds of related papers found, we picked a sample of 150 to study the trends of such applications over a 16-year interval. We classified the publications according to their domain of application, and then analyzed the tendency to develop Bayesian networks in particular areas of research. We found a set of indicators that help to explain these tendencies: the levels of formalization, data accuracy and data accessibility of a domain, and the level of human intervention in the primary data. The results and methodology of this study provide insight into potential areas of research and application of Bayesian networks.
Galina M. Novikova, Esteban J. Azofeifa
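As a minimal reminder of what the surveyed applications compute, here is exact inference by enumeration on a tiny rain-sprinkler network. The structure and all conditional probability values are invented for illustration:

```python
# A two-parent network: Rain -> Sprinkler, (Sprinkler, Rain) -> WetGrass.
P_RAIN = {True: 0.2, False: 0.8}                 # P(Rain)
P_SPRINKLER = {True: {True: 0.01, False: 0.99},  # P(Sprinkler | Rain=True)
               False: {True: 0.4, False: 0.6}}   # P(Sprinkler | Rain=False)
P_WET = {(True, True): 0.99, (True, False): 0.9, # P(Wet=True | Sprinkler, Rain)
         (False, True): 0.8, (False, False): 0.0}

def p_rain_given_wet():
    """P(Rain=True | WetGrass=True) by summing the full joint distribution."""
    num = den = 0.0
    for r in (True, False):
        for s in (True, False):
            joint = P_RAIN[r] * P_SPRINKLER[r][s] * P_WET[(s, r)]
            den += joint                 # all worlds where the grass is wet
            if r:
                num += joint             # ...and it also rained
    return num / den
```

Enumeration is exponential in the number of variables; practical applications in the surveyed domains rely on factored algorithms such as variable elimination or belief propagation.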
Backmatter
Metadata
Title
Databases and Information Systems
Edited by
Audrone Lupeikiene
Olegas Vasilecas
Gintautas Dzemyda
Copyright Year
2018
Electronic ISBN
978-3-319-97571-9
Print ISBN
978-3-319-97570-2
DOI
https://doi.org/10.1007/978-3-319-97571-9
