
2015 | Book

Beyond Databases, Architectures and Structures

11th International Conference, BDAS 2015, Ustroń, Poland, May 26-29, 2015, Proceedings

Edited by: Stanisław Kozielski, Dariusz Mrozek, Paweł Kasprowski, Bożena Małysiak-Mrozek, Daniel Kostrzewa

Publisher: Springer International Publishing

Book Series: Communications in Computer and Information Science


About this Book

This book constitutes the refereed proceedings of the 11th International Conference entitled Beyond Databases, Architectures and Structures, BDAS 2015, held in Ustroń, Poland, in May 2015.

This book consists of 53 carefully revised selected papers that are assigned to 8 thematic groups: database architectures and performance; data integration, storage and data warehousing; ontologies and semantic web; artificial intelligence, data mining and knowledge discovery; image analysis and multimedia mining; spatial data analysis; database systems development; application of database systems.

Table of Contents

Frontmatter

Invited Papers

Frontmatter
New Metrics and Related Statistical Approaches for Efficient Mining in Very Large and Highly Multidimensional Databases

Given the evolution of the concept of text and the continuous growth of textual information of multiple natures available online, one of the important issues for linguists and information analysts, when building up assumptions and validating models, is to exploit efficient tools for textual analysis that are able to adapt to large volumes of heterogeneous data, often changing and of distributed nature. In this communication we propose to look at new statistical methods that fit into this framework but can also extend their application range to the more general context of dynamic numerical data.

For that purpose, we have recently proposed an alternative metric based on feature maximization. The principle of this metric is to define a measure of compromise between generality and discrimination, based both on the properties of the data that are specific to each group of a partition and on those that are shared between groups. One of the key advantages of this method is that it is operational in an incremental mode both for clustering (i.e. unsupervised classification) and for traditional categorization. We have shown that it allows very efficient solving of complex multidimensional problems related to unsupervised analysis of textual or linguistic data, like topic tracking with data changing over time or automatic classification in a natural language processing (NLP) context. It can also adapt to traditional discriminant analysis, often exploited in text mining, or to automatic text indexing or summarization, with performance far superior to conventional methods. More generally, this approach, which is freed from the exploitation of parameters, can be used as an accurate feature selection and data resampling method in any numerical or non-numerical context.

We will present the general principles of feature maximization and will especially return to its successful applications in the supervised framework, comparing its performance with that of state-of-the-art methods on reference databases.
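For readers new to the metric, the following sketch gives one formulation of feature maximization that appears in Lamirel's related publications; the notation used in the talk itself may differ. For a feature $f$, a cluster or class $c$, and $W_d^f$ the weight of feature $f$ in data point $d$:

$$FF(f,c) = FR(f,c) \times FP(f,c), \qquad FR(f,c) = \frac{\sum_{d \in c} W_d^f}{\sum_{c' \in C} \sum_{d \in c'} W_d^f}, \qquad FP(f,c) = \frac{\sum_{d \in c} W_d^f}{\sum_{f' \in F_c} \sum_{d \in c} W_d^{f'}}$$

$FR$ (feature recall) rewards features specific to $c$ within the whole partition, while $FP$ (feature predominance) rewards features dominant inside $c$; keeping the features whose $FF$ exceeds the average yields the parameter-free selection behaviour described above.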

Jean-Charles Lamirel
Generating Ontologies from Relational Data with Fuzzy-Syllogistic Reasoning

Existing standards for crisp description logics facilitate information exchange between systems that reason with crisp ontologies. Applications with probabilistic or possibilistic extensions of ontologies and reasoners promise to capture more information, because they can deal with more uncertainties or vagueness of information. However, since there are no standards for either extension, information exchange between such applications is not generic. Fuzzy-syllogistic reasoning with the fuzzy-syllogistic system 4S provides 2048 possible fuzzy inference schemata for every possible triple concept relationship of an ontology. Since the inference schemata are the result of all possible set-theoretic relationships between three sets with three out of 8 possible fuzzy quantifiers, the whole set of 2048 possible fuzzy inferences can be used as one generic fuzzy reasoner for quantified ontologies. In that sense, a fuzzy-syllogistic reasoner can be employed as a generic reasoner that combines possibilistic inferencing with probabilistic ontologies, thus facilitating knowledge exchange between ontology applications of different domains as well as information fusion over them.
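The figure of 2048 can be reconstructed from the counts given in the abstract: a syllogistic mood consists of three quantified propositions (two premises and one conclusion), each choosing among the 8 fuzzy quantifiers, and every mood can occur in one of the 4 classical figures:

$$4 \times 8^3 = 4 \times 512 = 2048$$

This generalizes the 256 moods of classical syllogistics, where each proposition draws from only 4 quantifier forms ($4 \times 4^3 = 256$).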

Bora İ. Kumova
Surveying the Versatility of Constraint-Based Large Neighborhood Search for Scheduling Problems

Constraint-based search techniques have gained increasing attention in recent years as a basis for scheduling procedures that are capable of accommodating a wide range of constraints. Among these, Large Neighborhood Search (LNS) has proven to be a very effective heuristic-based methodology. Its basic optimization cycle consists of a continuous iteration of two steps in which the solution is first relaxed and then re-constructed. In Constraint Programming terms, relaxing entails the retraction of some previously imposed constraints, while re-constructing entails imposing new constraints and searching for a better solution. Each iteration of constraint removal and re-insertion can be considered as the examination of a large neighborhood move, hence the procedure's name. Over the years, LNS has been successfully employed on a wide range of different problems; this paper provides an overview of utilization examples that demonstrate both the versatility and the effectiveness of the procedure against significantly difficult scheduling benchmarks known in the literature.
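As a rough illustration of the relax/re-construct cycle described above (a generic skeleton, not the authors' implementation; relax, reconstruct and cost are hypothetical problem-specific callbacks):

```python
def lns(initial_solution, relax, reconstruct, cost, iterations=1000):
    """Generic Large Neighborhood Search skeleton (minimization)."""
    best = initial_solution
    best_cost = cost(best)
    for _ in range(iterations):
        partial = relax(best)             # retract some previously imposed constraints
        candidate = reconstruct(partial)  # impose new constraints, search for a completion
        if candidate is not None:
            candidate_cost = cost(candidate)
            if candidate_cost < best_cost:  # accept the large neighborhood move
                best, best_cost = candidate, candidate_cost
    return best
```

In Constraint Programming settings, relax and reconstruct would retract and re-impose constraints in a solver model rather than manipulate plain Python values.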

Riccardo Rasconi, Angelo Oddi, Amedeo Cesta

Database Architectures and Performance

Frontmatter
Query Workload Aware Multi-histogram Based on Equi-width Sub-histograms for Selectivity Estimations of Range Queries

A query optimizer uses a selectivity parameter for estimating the size of the data that satisfies a query condition. Selectivity calculations are based on some representation of the attribute value distribution, e.g. a histogram. In the paper we propose a query-workload-aware multi-histogram which contains a set of equi-width sub-histograms. The multi-histogram is designed for selectivity estimation of single-attribute range queries. Its structure is adapted to the 2-dimensional distribution of the conditions of recently processed range queries. The structure is obtained by clustering the values of query range boundaries. The sub-histograms' resolutions are adapted to the variability of the 1-dimensional distribution of attribute values.
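To make the role of an equi-width sub-histogram concrete, here is a minimal sketch of range selectivity estimation from a single equi-width histogram; the paper's multi-histogram combines several such structures, and the uniform-spread assumption inside a bucket is a simplification:

```python
def range_selectivity(counts, lo, hi, a, b):
    """Estimate P(a <= X <= b) from an equi-width histogram over [lo, hi)."""
    width = (hi - lo) / len(counts)
    total = sum(counts)
    estimate = 0.0
    for i, count in enumerate(counts):
        bucket_lo, bucket_hi = lo + i * width, lo + (i + 1) * width
        overlap = max(0.0, min(b, bucket_hi) - max(a, bucket_lo))
        estimate += count * overlap / width   # uniform distribution inside a bucket
    return estimate / total

# range_selectivity([10, 30, 40, 20], 0, 100, 30, 70) -> 0.56
```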

Dariusz Rafał Augustyn
Analysis of the Effect of Chosen Initialization Parameters on Database Performance

The paper presents an analysis of the influence of chosen initialization parameters on database efficiency and processing time. Experimental research is performed on a real banking system model which includes data structures characteristic of banking systems - structures supporting bank accounts and reflecting a typical bank organization. Experimental results are compared with the results of a real banking system.

Wanda Gryglewicz-Kacerka, Jarosław Kacerka
Database Under Pressure - Scaling Database Performance Tests in Microsoft Azure Public Cloud

Making changes in a production database or in database system configuration often requires these changes to be tested first in a test system. This also requires replaying the original workload in the test environment by simulating clients' activity on many workstations. In the paper, we show how this task can be realized with the use of many Workload Replay Agents working in the Microsoft Azure public cloud. We present the model and architecture of a widely scalable, cloud-based stress testing environment, called CloudDBMonitor, which allows controlled execution of captured SQL scripts against a specified database in the Microsoft SQL Server database management system. The stress testing environment provides the possibility to investigate how the tested database works under the large pressure generated by many simulated clients.

Dariusz Mrozek, Anna Paliga, Bożena Małysiak-Mrozek, Stanisław Kozielski
Comparison Between Performance of Various Database Systems for Implementing a Language Corpus

Data storage and information retrieval are some of the most important aspects in the development of a language corpus. Currently most corpora use either relational databases or indexed file systems. When selecting a data storage system, the most important factors to consider are the speeds of data insertion and information retrieval. Besides the two aforementioned approaches, there are currently various database systems with different strengths that can be more useful. This paper compares the performance of data storage and retrieval mechanisms which use relational databases, graph databases, column store databases and indexed file systems for various tasks, such as inserting data into the corpus and retrieving information from it, and tries to suggest an optimal storage architecture for a language corpus.

Dimuthu Upeksha, Chamila Wijayarathna, Maduranga Siriwardena, Lahiru Lasandun, Chinthana Wimalasuriya, N. H. N. D. de Silva, Gihan Dias
A Comparison of Different Forms of Temporal Data Management

Recently, the ANSI committee for the standardization of the SQL language published the specification for temporal data support. This new capability allows users to create and manipulate temporal data in a significantly simpler way than implementing the same features using triggers and database applications. In this article we examine the creation and manipulation of temporal data using the built-in temporal logic and compare its performance with that of equivalent hand-coded applications. For this study, we use an existing commercial database system which supports the standardized temporal data model.
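For readers unfamiliar with the standardized temporal support, the sketch below shows SQL:2011-style system-versioned DDL roughly as implemented by DBMSs following the standard (e.g. IBM DB2). The exact syntax varies by vendor and is an assumption here, not a quotation from the paper:

```python
# SQL:2011-style temporal DDL and query, held as strings for illustration only.
CREATE_VERSIONED_TABLE = """
CREATE TABLE employee (
  id        INTEGER PRIMARY KEY,
  salary    DECIMAL(10,2),
  sys_start TIMESTAMP(12) GENERATED ALWAYS AS ROW BEGIN NOT NULL,
  sys_end   TIMESTAMP(12) GENERATED ALWAYS AS ROW END NOT NULL,
  PERIOD SYSTEM_TIME (sys_start, sys_end)
) WITH SYSTEM VERSIONING;
"""

# Time-travel query replacing hand-coded triggers and history tables:
AS_OF_QUERY = "SELECT * FROM employee FOR SYSTEM_TIME AS OF TIMESTAMP '2015-01-01';"
```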

Florian Künzner, Dušan Petković
Performance Aspects of Migrating a Web Application from a Relational to a NoSQL Database

There are many studies which discuss the problem of using various NoSQL databases and compare their efficiency, thus confirming their usefulness and performance quality. However, there are very few studies dealing with the problem of replacing the data storage of currently working systems. This gap became a motivating factor to examine how difficult and laborious it is to move an existing, regularly used application from a relational environment to a non-relational data structure. The difficulty of carrying out the data migration process, the scope of changes which would have to be made in the existing environment, and the efficiency of the application while using the new data structures were considered in the presented research. As an example, an on-line game, a good representative of popular web applications, was chosen.

Katarzyna Harezlak, Robert Skowron
A Consensus Quorum Algorithm for Replicated NoSQL Data

We propose an algorithm, called Lorq, for managing NoSQL data replication. Lorq is based on a consensus quorum approach and is focused on replicating logs that store update operations. Read operations can be performed at different levels of consistency (from strong to eventual consistency), realizing so-called service level agreements (SLAs). In this way the trade-off among availability/latency, partition tolerance and consistency is considered. We discuss the correctness of Lorq and its importance in developing modern information systems based on geo-replication and cloud computing.
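Lorq's exact protocol is not given in the abstract; the sketch below only illustrates the generic quorum reasoning behind such consistency levels: with N replicas, a write quorum W and a read quorum R, every read overlaps every acknowledged write iff R + W > N, and shrinking R trades consistency for latency. The names are illustrative, not Lorq's API:

```python
def is_strongly_consistent(n_replicas, write_quorum, read_quorum):
    # A read intersects every acknowledged write iff R + W > N.
    return read_quorum + write_quorum > n_replicas

def read_quorum_for(level, n_replicas, write_quorum):
    """Pick a read quorum size for an SLA-style consistency level."""
    if level == "strong":
        return n_replicas - write_quorum + 1  # smallest R with R + W > N
    if level == "eventual":
        return 1                              # any single replica may answer
    raise ValueError("unknown consistency level: " + level)

# With N = 5 replicas and majority writes (W = 3):
# strong reads need R = 3, while eventual reads are served by R = 1.
```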

Tadeusz Pankowski
Preserving Data Consistency in Scalable Distributed Two Layer Data Structures

Scalable NoSQL systems are often the best solution to store huge amounts of data. Although the vast majority of them do not provide some features known from database systems (like transactions), they are suitable for many applications. However, in systems that require data consistency, e.g. payments or Internet booking, the lack of transactions is still noticeable. Recently, the need for transactions in such data store systems can be observed more and more often. The Scalable Distributed Two Layer Data Structures (SD2DS) are a very efficient implementation of a NoSQL system. This paper exposes its basic inconsistency problems. We propose simple mechanisms which will be used to introduce consistency in SD2DS.

Adam Krechowicz, Stanisław Deniziak, Grzegorz Łukawski, Mariusz Bedla
Modern Temporal Data Models: Strengths and Weaknesses

Supporting time is generally a challenging task. All issues in relation to time can be better supported using a temporal data model than by implementing them in user applications. More than two dozen temporal data models were introduced in the period between 1982 and 1998. After several years of stagnation, the last couple of years brought a revival of the topic and the emergence of new data models. Two temporal data models have been specified recently: one is the SQL:2011 standard, published in 2011; the second one is from Teradata. In this article we present the temporal data model of the ANSI SQL standard on one side and the data model of an existing relational DBMS on the other. After that, we compare their support of several temporal concepts. Finally, we discuss the strengths and weaknesses of both models and give suggestions for future extensions.

Dusan Petković
Multiargument Relationships in Possibilistic Databases

In the paper we consider the possible coexistence of associations between k < n attributes of the n-ary relationship. The analysis is carried out using the theory of functional dependencies. We assume that attribute values are represented by means of possibility distributions. According to this representation of data, the notion of fuzzy functional dependency has been appropriately extended. Its level is evaluated with the use of possibility and necessity measures. The dependencies between all n attributes describe the integrity constraints of the n-ary relationship and must not be infringed. They constitute a restriction for dependencies of fewer attributes. The paper formulates the rules to which fuzzy functional dependencies between (n-1) attributes of the n-ary relationship must be subordinated.
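The possibility and necessity measures mentioned above are the standard ones from the fuzzy-database literature (e.g. Prade and Testemale): for two attribute values represented by possibility distributions $\pi_A$ and $\pi_B$ over a domain $U$, the degrees to which they may, and must, be equal are commonly evaluated as

$$\Pi(A = B) = \sup_{u \in U} \min\big(\pi_A(u), \pi_B(u)\big), \qquad N(A = B) = 1 - \sup_{u \neq u'} \min\big(\pi_A(u), \pi_B(u')\big)$$

and the level of a fuzzy functional dependency is then graded by aggregating such measures over the tuples' agreement on the attribute sets involved.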

Krzysztof Myszkorowski

Data Integration, Storage and Data Warehousing

Frontmatter
Data Sharing and Exchange: General Data-Mapping Semantics

Traditional data sharing and exchange solved the problem of exchanging data between applications that store the same information using different vocabularies. In this paper we discuss the data sharing and exchange problem between applications that store related data which do not necessarily possess the same meaning. We first consider this problem in settings where source instances are complete, that is, do not contain unknown data. Then we address more collaborative scenarios where peers can store incomplete information. We define the semantics of these settings, and we provide the data complexity for generating solutions and the minimal among those. Also, we distinguish between sound and complete certain answers as semantics for conjunctive query answering.

Rana Awada, Iluju Kiringa
A Universal Cuboid-Based Integration Architecture for Polyglotic Querying of Heterogeneous Datasources

Fortunately, the industry has finally abandoned the old "one-size-fits-all" relational dream and started to develop task-oriented storage solutions. Nowadays, in a big project, devotion to a single persistence mechanism usually leads to suboptimal architectures. A combination of appropriate storage engines is often the best solution. However, such a combination implies significant growth in the effort of maintaining data integrity. In this paper we describe a solution to this problem, i.e. a cuboid-based universal integration architecture. It allows hiding the peculiarities of integration so that it is transparent to the application programmer. We use graphs as an example of data that needs a task-oriented database in order to be processed efficiently. We show how graph queries can be executed effectively with the help of a graph database assisting a relational database. The proposed solution does not impose any additional complexity on programmers.

Michał Chromiak, Piotr Wiśniewski, Krzysztof Stencel
Efficient Multidisk Database Storage Configuration

A simple database storage configuration includes a block device and a filesystem. In an advanced multi-disk database storage configuration, a disk manager is also required. The article presents the impact of multi-disk storage configuration on relational database performance. The research results include various local multi-disk storage space configuration scenarios for a database cluster in modern database management systems, with popular disk managers like software RAID and standard or thin-provisioned logical volumes equipped with a disk allocation policy. The research conclusions facilitate local storage space configuration for efficient transaction processing in relational databases.

Mateusz Smolinski
E–LT Concept in a Light of New Features of Oracle Data Integrator 12c Based on Data Migration within a Hospital Information System

The paper presents an approach to developing and running ETL processes implemented in Oracle Data Integrator (ODI), on the example of a hospital information system. Thanks to the inversion of the classical order of ETL stages into the Extract-Load-Transform (ELT) sequence, ODI simplifies and improves the efficiency of the most common ETL tasks. Several new features introduced in the 12c version positively affect productivity, efficiency and functionality as well.

Lukasz Wycislik, Dariusz Rafał Augustyn, Dariusz Mrozek, Ewa Pluciennik, Hafed Zghidi, Robert Brzeski
Automating Schema Integration Technique Case Study: Generating Data Warehouse Schema from Data Mart Schemas

The schema integration technique offers the possibility to unify the representation of several schemas into one global schema. In this work, we present two contributions. The first one is about automating this technique to reduce human intervention. The second one is about applying this technique to generate a data warehouse schema from data mart schemas. To reach our goals, we propose a new methodology composed of schema matching and schema mapping. The first technique compares the elements of two schemas using a new semantic measure to generate the mapping rules. The second one transforms the mapping rules into queries and applies them to ensure the automatic merging of the schemas.

Nouha Arfaoui, Jalel Akaichi
Proposal of a New Data Warehouse Architecture Reference Model

A common taxonomy of data warehouse architectures comprises five basic approaches: Centralized, Independent Data Mart, Federated, Hub-and-Spoke and Data Mart Bus. However, in many real-world cases, an applied data warehouse architecture can be a combination of these. In this paper we propose a Data Warehouse Architecture Reference Model (DWARM), which unifies known architectural styles and provides options for adaptation to fit the particular purposes of a developed data warehouse system. The model comprises 11 layers grouping containers (data stores, sources and consumers), as well as processes, covering the typical functional groups: ETL, data storage, data integration and delivery. An actual data warehouse architecture can be obtained by tailoring (removing unnecessary components) and instantiating (creating required layers and components of a given type).

Dariusz Dymek, Wojciech Komnata, Piotr Szwed
DWARM: An Ontology of Data Warehouse Architecture Reference Model

This paper describes DWARM, an ontology formalizing a new data warehouse architecture reference model intended to capture the five common architectural approaches, as well as to provide means for describing complex hybrid architectures that emerge due to the observed evolution of business and technology. The ontology defines concepts, e.g. layers, processes, containers and property classes, as well as relations that can be used to construct precise architectural models. Formalization of the architecture description as an ontology gives an opportunity to perform automatic or semiautomatic validation and assessment.

Piotr Szwed, Wojciech Komnata, Dariusz Dymek

Ontologies and Semantic Web

Frontmatter
Ontology Learning from Relational Database: How to Label the Relationships Between Concepts?

Developing an ontology for modeling the universe of a Relational Database (RDB) is key to success in many RDB-related domains, including semantic querying of RDBs, Linked Data and semantic interoperability of information systems. However, the manual development of an ontology is a tedious, error-prone task that requires much time. The research field of ontology learning aims to provide (semi-)automatic approaches for building ontologies. One big challenge in the automatic transformation is how to label the relationships between concepts. This challenge depends heavily on the correct extraction of the relationship types. In fact, the RDB model does not store the meaning of relationships between entities; it only indicates the existence of a link between them. This paper suggests a solution consisting of a meta-model for the semantic enrichment of the RDB model and of a classification of relationships. A case study shows the effectiveness of our approach.

Bouchra El Idrissi, Salah Baïna, Karim Baïna
Integration of Facebook Online Social Network User Profiles into a Knowledgebase

The article describes attempts made to integrate the variety of user data available on the Facebook online social network. The source of the data are user profiles, publicly available to other Facebook users, which contain data such as visited places, favorite sports teams, TV programs, watched movies, read books and other likes. The destination of the integrated data is the FOAF ontology, adapted for integration purposes. The work presents the required FOAF ontology extensions and an approach to Facebook data extraction as a contribution. Query and reasoning examples on the created knowledgebase are also presented.

Wojciech Kijas, Michał Kozielski
RDF Graph Partitions: A Brief Survey

The paper presents justifications and solutions for RDF graph partitioning. It uses an approach from the classical theory of graphs to deal with this problem. We present four ways to transform an RDF graph into a classical graph. We show how to apply solutions from the theory of graphs to RDF graphs. We also perform an experimental evaluation using the gpmetis algorithm (a recognized graph partitioner) on both real and synthetic RDF graphs and prove its practical usability.

Dominik Tomaszuk, Łukasz Skonieczny, David Wood

Artificial Intelligence, Data Mining and Knowledge Discovery

Frontmatter
Optimization of Inhibitory Decision Rules Relative to Coverage – Comparative Study

In the paper, a modification of a dynamic programming algorithm for optimization of inhibitory decision rules relative to coverage is proposed. The aim of the paper is to study the coverage of inhibitory decision rules constructed by the proposed algorithm and to compare the coverage of inhibitory rules constructed by the dynamic programming algorithm and a greedy algorithm.

Beata Zielosko
Application of the Shapley-Shubik Power Index in the Process of Decision Making on the Basis of Dispersed Medical Data

The paper considers issues related to the process of decision-making based on dispersed knowledge. In previous papers the author proposed a dispersed decision support system with a dynamic structure, which is the approach used here. The novelty analyzed in this paper is the application of a power index in this system. Together with the Shapley-Shubik index, a simple method of determining local decisions has been applied. The purpose of this was to reduce the computational complexity in comparison with the approach proposed in earlier papers. In the experiments, a situation is considered in which medical data from one domain are collected in many medical centers. We want to use all of the collected data at the same time in order to make a global decision.
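For readers unfamiliar with the index, the sketch below computes the Shapley-Shubik power index of a small weighted voting game by enumerating permutations and counting how often each player is pivotal (turns a losing coalition into a winning one). How the paper maps local classifiers to players, weights and quota is not specified in the abstract:

```python
from itertools import permutations

def shapley_shubik(weights, quota):
    """Shapley-Shubik power index of each player in a weighted voting game."""
    n = len(weights)
    pivots = [0] * n
    for order in permutations(range(n)):
        running = 0
        for player in order:
            running += weights[player]
            if running >= quota:      # this player tips the coalition over the quota
                pivots[player] += 1
                break
    total = sum(pivots)               # equals n! when the grand coalition wins
    return [p / total for p in pivots]

# Example: weights (3, 2, 2) with quota 4 give (1/3, 1/3, 1/3) --
# the larger weight confers no extra voting power here.
```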

Małgorzata Przybyła-Kasperek
Inference in Expert Systems Using Natural Language Processing

The authors show a real-life application of an expert system using queries submitted by the user in natural language. The system is based on the Polish language. A two-stage process (involving data preparation and the inference itself) is proposed in order to complete the inference.

Tomasz Jach, Tomasz Xięski
Impact of Parallel Memetic Algorithm Parameters on Its Efficacy

The vehicle routing problem with time windows (VRPTW) is an NP-hard discrete optimization problem with two objectives: to minimize the number of vehicles serving a set of dispersed customers, and to minimize the total travel distance. Since real-life, commercially available road network and address databases are very large and complex, approximate methods to tackle the VRPTW became the main stream of development. In this paper, we investigate the impact of selecting two crucial parameters of our parallel memetic algorithm, the population size and the number of children generated for each pair of parents, on its efficacy. Our experimental study, performed on selected benchmark problems, indicates that improper selection of the parameters can easily jeopardize the search. We show that larger populations converge to high-quality solutions in a smaller number of consecutive generations, and that creating more children helps exploit the parents as much as possible.
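A rough sketch of where the two studied parameters enter a generic memetic loop; the authors' parallel algorithm, selection scheme and local-search (education) operators are not detailed in the abstract, so every callback here is hypothetical:

```python
import random

def memetic_search(init, crossover, local_search, fitness,
                   population_size=100, children_per_pair=20, generations=50):
    """Generic memetic algorithm: evolutionary search plus local refinement."""
    population = [local_search(init()) for _ in range(population_size)]
    for _ in range(generations):
        random.shuffle(population)
        offspring = []
        for pa, pb in zip(population[0::2], population[1::2]):
            for _ in range(children_per_pair):     # exploit each pair of parents
                offspring.append(local_search(crossover(pa, pb)))
        # survivor selection: keep the best individuals (minimization)
        population = sorted(population + offspring, key=fitness)[:population_size]
    return min(population, key=fitness)
```

Larger population_size and children_per_pair enlarge the neighborhood explored per generation at a higher cost per generation, which is exactly the trade-off the paper investigates.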

Miroslaw Blocho, Jakub Nalepa
Data Processing in Immune Optimization of the Structure

The paper is devoted to data processing in immune optimization (using an artificial immune system - AIS) applied to selected optimization problems of structures. The Procedure for the Exchange of Data (PED) in immune optimization is presented. During this procedure, important information about the design variables and the objective function is saved in specific files. Additionally, the commercial software MSC Patran and Nastran is used to analyze the mechanical structures.

Arkadiusz Poteralski
A Prudent Based Approach for Customer Churn Prediction

This study formalizes a three-phase customer churn prediction technique. In the first phase, a supervised feature selection procedure is adopted to select the most relevant subset of features by removing redundancy and increasing relevance, which leads to a reduced and highly correlated feature set. In the second phase, a knowledge-based system (KBS) is built through the Ripple Down Rule (RDR) learner, which acquires knowledge about observed customer churn behavior and handles the brittleness problem of the churn KBS through prudence analysis, issuing a prompt to the decision maker whenever a case is beyond the knowledge maintained in the knowledge database. In the final phase, a Simulated Expert (SE) technique is proposed to evaluate the Knowledge Acquisition (KA) in the KB system. Moreover, by applying the proposed approach to a publicly available dataset, the results show that the proposed approach can be a worthy alternative for churn prediction in the telecommunication industry.

Adnan Amin, Faisal Rahim, Muhammad Ramzan, Sajid Anwar
Forecasting Daily Urban Water Demand Using Dynamic Gaussian Bayesian Network

The objective of the presented research is to create an effective forecasting system for daily urban water demand. The addressed problem is crucial for cost-effective, sustainable management and optimization of water distribution systems. In this paper, a dynamic Gaussian Bayesian network (DGBN) predictive model is proposed for the forecasting of a hydrological time series. Different types of DGBNs are compared with respect to their structure and the corresponding effectiveness of prediction. First, it has been found that models based on automatic learning of the network structure are not the most effective and are outperformed by models with a designed structure. Second, this paper proposes a simple but effective structure of DGBN. The presented comparative experiments provide evidence for the superiority of the designed model, which outperforms not only other DGBNs but also other state-of-the-art forecasting models.

Wojciech Froelich
Parallel Density-Based Stream Clustering Using a Multi-user GPU Scheduler

With the emergence of advanced stream computing architectures, their deployment to accelerate long-running data mining applications is becoming a matter of course. This work presents a novel design concept of the stream clustering algorithm DenStream, based on a previously presented scheduling framework for GPUs. By means of our scheduler OCLSched, DenStream runs together with general computation tasks in a multi-user computing environment, sharing the GPU resources. A major point of concern throughout this paper has been to disclose the functionality and purposes of the applied scheduling methods, and to demonstrate OCLSched's ability to manage highly complex applications in a multi-task GPU environment. Also in terms of performance, our tests show reasonable improvements when comparing the proposed parallel concept of DenStream with a single-threaded CPU version.

Ayman Tarakji, Marwan Hassani, Lyubomir Georgiev, Thomas Seidl, Rainer Leupers

Image Analysis and Multimedia Mining

Frontmatter
Lossless Compression of Medical and Natural High Bit Depth Sparse Histogram Images

In this paper we overview histogram packing methods and focus on an off-line packing method, which requires encoding the original histogram along with the compressed image. For a diverse set containing medical MR, CR and CT images as well as various natural 16-bit images, we report the histogram packing effects obtained for several histogram encoding methods. Histogram packing significantly improves the JPEG2000 and JPEG-LS lossless compression ratios of high bit depth sparse histogram images. For certain medical image modalities the improvement may exceed a factor of two, which indicates that histogram packing should be exploited in medical image databases, as well as in medical picture archiving and communication systems in general, as it is both highly advantageous and easy to apply.
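The idea of off-line histogram packing is compact enough to sketch: the sparse set of intensities actually used is remapped onto consecutive codes before lossless compression, and the table of used values (the original histogram support) is stored alongside so the transform can be inverted. A minimal sketch, not the paper's implementation:

```python
import numpy as np

def pack_histogram(image):
    """Remap a sparse-histogram image to consecutive codes 0..k-1."""
    used = np.unique(image)                  # sorted values actually present
    lut = {v: i for i, v in enumerate(used)}
    packed = np.vectorize(lut.get)(image)
    return packed.astype(image.dtype), used  # 'used' must be encoded as well

def unpack_histogram(packed, used):
    """Invert the packing using the stored value table."""
    return used[packed]

img = np.array([[0, 4096, 8192], [4096, 0, 12288]], dtype=np.uint16)
packed, table = pack_histogram(img)          # pixel codes become 0, 1, 2, 3
assert (unpack_histogram(packed, table) == img).all()
```

Fewer distinct codes mean fewer bits of effective depth, which is why JPEG2000 and JPEG-LS compress the packed image so much better.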

Roman Starosolski
Query by Shape for Image Retrieval from Multimedia Databases

Efficient image retrieval is one of the most important challenges in the management of large multimedia databases. Existing methods for querying, based on a textual description (e.g. keywords) or on image content, are not sufficient for most applications. Methods based on semantic features are more suitable. In this paper we propose a new query-by-shape (QS) method for image retrieval from multimedia databases. Each image in the database is represented as a set of graphical objects, which are specified using graphical primitives like lines, circles, polygons etc. To retrieve images containing a given object, the object's shape should be provided. Then an efficient algorithm for testing the similarity of shapes is applied. Preliminary results showed the high effectiveness of the QS method.

Stanisław Deniziak, T. Michno
Real-Time People Counting from Depth Images

In this paper, we propose a real-time algorithm for counting people from depth image sequences acquired using the Kinect sensor. Counting people in public vehicles has become a vital research topic. Information on passenger flow plays a pivotal role in transportation databases. It helps transport operators optimize their operational costs, provided that the data are acquired automatically and with sufficient accuracy. We show that our algorithm is accurate and fast, as it allows 16 frames per second to be processed. Thus, it can be used either in real time to process traffic information on the fly, or in batch mode for analyzing very large databases of previously acquired image data.

Jakub Nalepa, Janusz Szymanek, Michal Kawulok
PCA Application in Classification of Smiling and Neutral Facial Displays

Psychologists claim that the majority of inter-human communication is transferred non-verbally. The face and its expressions are the most significant channel of this kind of communication. In this research, the problem of distinguishing between smiling and neutral facial displays is investigated. The set-up, consisting of a proper local binary pattern operator supported by an image division schema and PCA for reducing the feature vector length, is presented. The results achieved using the k-nearest neighbours classifier are compared on a couple of image datasets to prove the successful application of this approach.
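A compact sketch of the pipeline the abstract outlines: block-wise LBP texture histograms, PCA to shrink the feature vector, and a k-NN classifier. The concrete operator variant and image division schema are the paper's; the scikit-image/scikit-learn rendering below is an illustrative assumption:

```python
import numpy as np
from skimage.feature import local_binary_pattern
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

def lbp_block_features(image, blocks=4, points=8, radius=1):
    """Concatenate uniform-LBP histograms over a blocks x blocks division."""
    lbp = local_binary_pattern(image, points, radius, method="uniform")
    h, w = lbp.shape
    feats = []
    for i in range(blocks):
        for j in range(blocks):
            cell = lbp[i*h//blocks:(i+1)*h//blocks, j*w//blocks:(j+1)*w//blocks]
            hist, _ = np.histogram(cell, bins=points + 2, density=True)
            feats.append(hist)
    return np.concatenate(feats)

# X: rows of lbp_block_features(face), y: 1 = smiling, 0 = neutral display.
# model = make_pipeline(PCA(n_components=50), KNeighborsClassifier(n_neighbors=5))
# model.fit(X_train, y_train); accuracy = model.score(X_test, y_test)
```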

Karolina Nurzynska, Bogdan Smolka
Detection of Tuberculosis Bacteria in Sputum Slide Image Using Morphological Features

Automatic detection of tuberculosis bacteria is a medical imaging issue that involves the use of machine vision techniques. The manual technique for the detection of tuberculosis bacteria is costly and requires much time and qualified persons to prevent errors. In this paper a new detection technique is presented. The technique is based on morphological features of bacteria objects. The identification process is carried out using eccentricity, bounding box, area and aspect ratio.

Zahoor Jan, Muhammad Rafiq, Hayat Muhammad, Noor Zada
Automatic Medical Objects Classification Based on Data Sets and Domain Knowledge

This paper describes an approach for automatically identifying organs in medical CT imagery. The main assumption of this approach is the use of data sets and domain knowledge. We apply this approach to the automatic classification of chest organs (trachea, lungs, bronchi) and present the results to demonstrate its usefulness and effectiveness. The paper includes the results of experiments that have been performed on medical data obtained from the II Department of Internal Medicine, Jagiellonian University Medical College, Krakow, Poland. The experimental results showed that the approach is promising and can be used in the future to support solving more complex medical problems.

Przemyslaw Wiktor Pardel, Jan G. Bazan, Jacek Zarychta, Stanislawa Bazan-Socha

Spatial Data Analysis

Frontmatter
Interpolation as a Bridge Between Table and Array Representations of Geofields in Spatial Databases

The development of database technology facilitates wider integration of diverse data types, which in turn increases opportunities to ask ad hoc queries and gives new possibilities for declarative query optimization. For more than a decade, work on supporting multidimensional arrays in databases has been carried out, which has led to such DBMSs as rasdaman, SciDB and SciQL. However, these DBMSs lack the ability to handle queries concerning geographic phenomena varying continuously over space (called geofields) which were measured at irregularly distributed nodes (e.g. air pollution). This paper addresses this issue by presenting an extension of SQL making it possible to write declarative queries referencing geofields, called geofield queries. Geofield query optimization opportunities are also briefly discussed.
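Since a geofield is measured only at irregular nodes yet must be queried as a continuous surface, some interpolation has to bridge the table and array representations. A minimal inverse-distance-weighting sketch of that bridge (a common default; the paper's contribution is the SQL extension, not this particular estimator):

```python
def idw(nodes, values, x, y, power=2.0):
    """Inverse-distance-weighted geofield estimate at point (x, y).

    nodes:  (xi, yi) measurement locations, e.g. air-pollution sensors
    values: measured geofield values at those locations
    """
    numerator = denominator = 0.0
    for (xi, yi), v in zip(nodes, values):
        d2 = (x - xi) ** 2 + (y - yi) ** 2
        if d2 == 0.0:
            return v                        # exactly on a measurement node
        weight = 1.0 / d2 ** (power / 2.0)
        numerator += weight * v
        denominator += weight
    return numerator / denominator
```

A geofield query engine would evaluate such an estimator lazily over the cells of the requested array, which is where the optimization opportunities mentioned above arise.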

Piotr Bajerski
A Proposal of Hybrid Spatial Indexing for Addressing the Measurement Points in Monitoring Sensor Networks

One of the important features of data analysis methods in the area of continuous surveillance systems is computation time. This article presents research focused on improving processing performance through the most efficient possible indexing of spatial data. The authors propose an index structure based on layered grouping of sensors, so as to reduce the amount of data in time windows. This allows data to be compared at the layer-to-layer level, thereby reducing the problem of comparisons between all sensors.

Michał Lupa, Monika Chuchro, Adam Piórkowski, Anna Pięta, Andrzej Leśniak
An Attempt to Automate the Simplification of Building Objects in Multiresolution Databases

The paper presents a method for the simplification of building objects in multiresolution databases. The authors present the theoretical foundation, practical ways to implement the method, examples of results, as well as a comparison with the generalization methods currently available in commercial software. The algorithm allows the verifiability and reproducibility of results to be kept while minimising graphic conflicts, which are a major problem in the automatic generalisation process. These results are achieved by defining the shape of buildings, employing classification rules and adopting minimum measures of recognition on a digital map. The solutions included in this paper are universal and can successfully be used as a component in any automated cartographic generalisation process. Moreover, these methods will help to get closer to full automation of the data generalisation process and hence the automatic production of digital maps.

Michał Lupa, Krystian Kozioł, Andrzej Leśniak

Database Systems Development

Frontmatter
Motivation Modeling and Metrics Evaluation of IT Architect Certification Programs

The alignment of university curricula to the needs of the IT industry is a great challenge which requires analysis of various aspects. IT architecture competencies and skills are very important to parties such as the IT industry, course providers, universities and, of course, students. In this paper, IT architect certification programs are analyzed, as they need to be well aligned to the needs of the industry. The range of IT architect certification programs on today's market is vast and rather complicated. This article describes a new lightweight method for quickly evaluating IT architect certifications using specially developed data collection methods. The method concentrates on non-domain certification features and introduces metrics which can be used to compare programs with each other. Broad research has been done to identify the most important domain-independent features of certificate programs for IT architects on the employment market today. These features have been selected, evaluated and combined into a metrics formula using a specially developed automated data mining process. The metrics can also be automatically updated in a process called "self-adaptation" after a specified period of time. The whole process assumes that the highest-ranking certificate programs from a previous time period can be used as a reference for establishing domain-independent features in the next period. Each certificate program can currently be evaluated only once per period based on the reference. The proposed solution will deliver a powerful tool for IT architect skill comparison, especially when there are many job candidates with different sets of certification documents to be assessed. The research results are currently being used to design architecture courses at the IT Architecture Academy at the AGH University of Science and Technology.

Michal Turek, Jan Werewka
Supporting Code Review by Automatic Detection of Potentially Buggy Changes

Code reviews constitute an important activity in software quality assurance. Although they are essentially based on human expertise and scrupulousness, they can also be supported by automated tools. In this paper we present such a solution, integrated with code review tools. It is based on an SVM classifier that indicates potentially buggy changes. We train the classifier on the history of a project. In order to construct a training set, we assume that a change/commit is buggy if its modifications have later been altered by a bug-fix commit. We evaluated our approach on 77 selected projects taken from GitHub and achieved promising results. We also assessed the quality of the resulting classifier depending on the size of a project and the fraction of the project's history that has been used to build the training set.
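A sketch of the classifier construction described above, rendered with scikit-learn; the labelling heuristic follows the abstract, but the choice of change features is an illustrative assumption:

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# X_history: one feature row per past commit (e.g. lines added/removed,
# files touched, author experience); y_history: 1 if the commit's
# modifications were later altered by a bug-fix commit, else 0.
def train_buggy_change_classifier(X_history, y_history):
    model = make_pipeline(StandardScaler(), SVC(kernel="rbf", probability=True))
    model.fit(X_history, y_history)
    return model

# During review, incoming changes with a high predicted bug probability
# can be flagged for closer inspection:
# p_buggy = model.predict_proba(X_new)[:, 1]
# flagged = p_buggy > 0.5
```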

Mikołaj Fejzer, Michał Wojtyna, Marta Burzańska, Piotr Wiśniewski, Krzysztof Stencel
Project Management in the Scrum Methodology

Taking into account the current leading role of techniques based on incremental-iterative programming, a system that allows the optimization of project teams developing software with the Agile Scrum methodology is proposed in the paper. The presented tool automates the process of development project management. It is distinguished from other tools of the same class by the ability to create workflow tasks, which allows solving the problems associated with information and communication in the project team, and by the kanban-board visibility that shows whether a task is being implemented or tested. In addition, it implements an algorithm for generating and archiving Sprints.

Maria Tytkowska, Aleksandra Werner, Małgorzata Bach

Applications of Database Systems

Frontmatter
Regression Rule Learning for Methane Forecasting in Coal Mines

The rule-based approach to methane concentration prediction is presented in this paper. The applied solution is based on the modification, called fixed, of the separate-and-conquer rule induction approach. We also propose a modification of rule quality evaluation based on confidence intervals calculated for the positive and negative examples covered by the rule. The characteristic feature of the considered methane forecasting model is that it omits the readings of the sensor that is the subject of forecasting. The approach is evaluated on a real-life data set acquired during a week in a coal mine. The results show the advantages of the introduced method (in terms of both prediction accuracy and knowledge extraction) in comparison to the standard approaches typically implemented in analytical tools.

Michał Kozielski, Adam Skowron, Łukasz Wróbel, Marek Sikora
On Design of Domain-Specific Query Language for the Metallurgical Industry

Many systems have to choose between a user-friendly visual query editor and a textual query language of industrial strength in order to deal with large amounts of complex data. Some of them provide both ways of accessing data in a warehouse.

In this paper, we present the key features of a domain-specific query language which we designed as a part of the information system of a steel production plant. This language aims to give those who know what the data actually is an opportunity for easy data manipulation, and to provide others with an easy way to discover what the data is. We also provide an evaluation of the designed language. The main focus of our evaluation was to measure the effort required for discovering a dataset and deriving simple math expressions.

Although the paper overviews a data model for one specific domain, the approach can easily be applied to different domains.

Andrey Borodin, Yuri Kiselev, Sergey Mirvoda, Sergey Porshnev
Approach to the Monitoring of Energy Consumption in Eco-grinder Based on ABC Optimization

This article is part of a series dedicated to AI methods inspired by nature and their implementation in mechatronic systems. The Artificial Bee Colony (ABC) enables the optimization of the consumption of power supplied from photovoltaic cells. The paper includes a few implementations of ABC. Special emphasis was put on maintaining a proper energy balance, and on monitoring power demand as well as the energy sources used for comminution. The ecological grinder was designed based on an autonomic unit. It can be called autonomous because it does not need external control. The built-in computer system ensures monitoring and visualization of the current state of the energy balance.

Jacek Czerniak, Dawid Ewald, Marek Macko, Grzegorz Śmigielski, Krzysztof Tyszczuk
MercuryDb - A Database System Supporting Management and Limitation of Mercury Content in Fossil Fuels

Mercury is commonly known as an element harmful to the environment and human beings. Due to its high mobility and lack of degradation in the environment, reduction of mercury emission became one of the main ways of limiting its adverse impact on the environment and health. Database systems, aimed at gathering, transforming and disseminating information about mercury sources, its ways of transportation in the environment and the technologies used to limit its influence, can be a useful management-supporting tool. They can help scientists, stakeholders and decision makers in the struggle to restrict mercury's influence on people's lives and the environment.

In the article, a concept and design of a database system dedicated to supporting the management of mercury content in fossil fuels and energy sector wastes is presented. The solutions chosen and implemented to provide users with the ability to store measurements of mercury and accompanying species, as well as sample material characteristics, are described. The structure of the object data model and its mapping into a relational database structure is given. The design of the developed software is described, and detailed solutions improving flexibility in data management and exploration are depicted. The article summarizes the use of the database system as a support in toxic substance content management.

Sebastian Iwaszenko, Karolina Nurzynska, Barbara Białecka
Liquefied Petroleum Storage and Distribution Problems and Research Thesis

The greatest threat to the environment and aquatic life is an uncontrolled fuel leakage, which is also extremely hazardous to the health and safety of people. Guaranteeing the reliability of a leak detection system is probably the ultimate purpose of fuel management systems. However, there are more problems that ought to be solved before, or simultaneously with, detecting possible outflows of fuel products. In this paper we highlight major research opportunities related to wetstock management and statistical inventory reconciliation. The main goal is to outline theses on the nature and impact of numerous phenomena on inventory reconciliation methods. Issues considered in this paper include, but are not limited to, sensor miscalibration, data acquisition and transmission problems, as well as leak detection from both tanks and connected pipelines.

Marcin Gorawski, Anna Gorawska, Krzysztof Pasterak
Idea of Impact of ERP-APS-MES Systems Integration on the Effectiveness of Decision Making Process in Manufacturing Companies

The aim of this article is to present an analysis of the impact of ERP-APS-MES systems integration on the decision-making process in a manufacturing company of global nature. As part of the article, the flow of data in the integrated ERP, APS and MES systems, and the inclusion of these data in Business Intelligence systems, were analyzed. Two types of BI systems have been considered: the first classic, based on search technologies and processing of OLAP data; the second of the "In-Memory" class. The impact of ERP, APS and MES systems integration on the efficiency of business management at all levels (strategic, tactical and operational) is also described. The approach proposed by the authors aims to significantly improve the reliability of data. As a consequence, this will reduce the risk of erroneous data being used in planning and in the evaluation of implementation processes. According to the authors, the integration of these systems will significantly increase the effectiveness of management in a manufacturing company.

Edyta Kucharska, Katarzyna Grobler-Dębska, Jarosław Gracel, Mieczysław Jagodziński
A Concept of Decision Support in Supply Chain Management – A Hybrid Approach

This paper describes a hybrid approach to the optimization of decision problems in supply chain management (SCM). The hybrid approach proposed here combines the strengths of mathematical programming (MP) and constraint logic programming (CLP), which leads to a significant reduction in the search time necessary to find the optimal solution and allows solving larger problems. The proposed hybrid approach is presented as a concept of an additional layer of decision-making in integrated systems, for example ERP, DRP, etc. This solution allows the implementation of complete decision-making models, additional constraints, as well as a set of questions for these models. The model presented in the article illustrates the advantages of the approach.

Paweł Sitek
eProS – A Bioinformatics Knowledgebase, Toolbox and Database for Characterizing Protein Function

Proteins are macromolecules that facilitate virtually every biological process. Information on the functional and structural characteristics of proteins is invaluable in the life sciences, but remains difficult to obtain, both computationally and experimentally.

In recent work, we have introduced a novel method for functional characterization, which we refer to as protein energy profiling. The eProS (energy profile suite) is an online knowledgebase, toolbox and database that provides a webspace for protein energy profile analyses to the scientific community. The objective of eProS is to offer a free-for-all repository of energy profile data, annotations, visualizations, as well as tools that can aid in deducing relations complementing and supporting findings made by traditional bioinformatics methods.

In this paper, we discuss the underlying biological and theoretical background of the implemented methods and tools, and also introduce recent enhancements and developments.

eProS is available at http://bioservices.hs-mittweida.de/Epros.

Florian Heinke, Daniel Stockmann, Stefan Schildbach, Mathias Langer, Dirk Labudde
Evaluation Criteria for Affect-Annotated Databases

In this paper, a set of comprehensive evaluation criteria for affect-annotated databases is proposed. These criteria can be used for evaluating the quality of a database at the stage of its creation, as well as for evaluation and comparison of existing databases. The usefulness of these criteria is demonstrated on several databases selected from the affective computing domain. The databases contain different kinds of data: video or still images presenting facial expressions, speech recordings and affect-annotated words.

Agata Kolakowska, Agnieszka Landowska, Mariusz Szwoch, Wioleta Szwoch, Michal R. Wrobel
The Mask: A Face Network System for Bell’s Palsy Recovery Surveillance

Bell's palsy is a sudden disease that is very complex to diagnose. Doctors and physicians are required to be more agile and to be able to adapt quickly and efficiently to such a disease. Mostly, doctors opt for the electromyography technique, but low-quality services make its realization difficult. To cope with this major problem, we provide a home nursing system. Our solution consists of modeling the EDA of the Bell's palsy system and proposing a conceptual framework that can guide research by providing a visual representation of theoretical aspects. However, a model should be good enough to be useful. To reach that, we present a mask for the face network system to detect and supervise Bell's palsy.

Hanen Bouali, Jalel Akaichi
Backmatter
Metadata
Title
Beyond Databases, Architectures and Structures
Edited by
Stanisław Kozielski
Dariusz Mrozek
Paweł Kasprowski
Bożena Małysiak-Mrozek
Daniel Kostrzewa
Copyright Year
2015
Electronic ISBN
978-3-319-18422-7
Print ISBN
978-3-319-18421-0
DOI
https://doi.org/10.1007/978-3-319-18422-7
