About this book

This book constitutes the thoroughly refereed proceedings of the 8th International Conference on Data Management Technologies and Applications, DATA 2019, held in Prague, Czech Republic, in July 2019. The 8 revised full papers were carefully reviewed and selected from 90 submissions. The papers deal with the following topics: decision support systems, data analytics, data and information quality, digital rights management, big data, knowledge management, ontology engineering, digital libraries, mobile databases, object-oriented database systems, and data integrity.



Explainable and Transferrable Text Categorization

Automated argument stance (pro/contra) detection is a challenging text categorization problem, especially if said arguments are to be detected for new topics. In previous research, we designed and evaluated an explainable machine-learning-based classifier. It achieved 96% F1 for argument stance recognition within the same topic and 60% F1 for previously unseen topics, which informed our hypothesis that there are two sets of features in argument stance recognition: general features and topic-specific features. An advantage of the described system is its quick transferability to new problems. Besides providing further details about the developed C3 TFIDF-SVM classifier, we investigate the classifier's effectiveness for different text categorization problems spanning two natural languages. Besides the quick transferability, the generation of human-readable explanations about why specific results were achieved is a key feature of the described approach. We further investigate, by means of a survey, how understandable the classifier's generated explanations are.
Tobias Eljasik-Swoboda, Felix Engel, Matthias Hemmje
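
The abstract above names TF-IDF as the feature weighting of the C3 TFIDF-SVM classifier but does not spell it out. The following is a minimal sketch of plain TF-IDF weighting (relative term frequency times the logarithm of inverse document frequency), not the authors' implementation. The function name `tfidf` and the sample sentences are hypothetical, and the SVM training step is omitted.

```python
import math
from collections import Counter


def tfidf(documents):
    """Return one {term: weight} dict per tokenized document (hypothetical helper)."""
    n_docs = len(documents)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        tf = Counter(doc)
        weights.append({
            # Relative term frequency times log inverse document frequency.
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return weights


# Illustrative stance-like sentences; these are assumptions, not data from the paper.
docs = [
    "nuclear power is safe and clean".split(),
    "nuclear power is dangerous".split(),
    "wind power is clean".split(),
]
weights = tfidf(docs)
top_term = max(weights[1], key=weights[1].get)
print(top_term)
```

Terms that occur in every document, such as "power", receive a weight of zero, while terms specific to one document dominate its vector. Such per-term weights are also one plausible basis for human-readable explanations, though the abstract does not state how the explanations are generated.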

Dealing with Critical Issues in Emails: A Comparison of Approaches for Sentiment Analysis

The customer service of larger companies is constantly faced with the challenge of mastering the daily flood of incoming emails. In particular, the effort involved in dealing with critical issues, such as complaints, and the insufficient resources available to deal with them can have a negative impact on customer relations and thus on the public perception of companies. It is therefore necessary to assess and prioritise these concerns automatically, if possible. Sentiment analysis, the automatic recognition of sentiment in texts, enables such prioritisation. The sentiment analysis of German-language emails is still an open research problem, and so far there is no evidence of a dominant approach in this field. The aim of this article is to compare three approaches for the sentiment analysis of German emails:
The first approach (A) is based on the combination of sentiment lexicons and machine learning methods. The second (B) is the extension of approach A by further feature extraction methods and the third approach (C) is a deep learning approach based on the combination of Word Embeddings and Convolutional Neural Networks (CNN). A gold standard corpus is generated to compare these approaches. Based on this corpus, systematic experiments are carried out in which the different method combinations for the approaches are examined.
The results of the experiments show that the Deep Learning approach is more effective than classical approaches and generates better classification results.
Bernd Markscheffel, Markus Haberzettl
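
Approach A in the abstract above combines sentiment lexicons with machine learning methods. As a rough illustration of the lexicon half only, the sketch below counts lexicon hits in an email and returns them as features that a downstream classifier could consume. The word lists, the function name `lexicon_features`, and the sample email are all hypothetical; the paper's actual lexicons and features are not given in the abstract.

```python
import re

# Tiny illustrative German lexicons; assumptions, not the lexicons used in the paper.
POSITIVE = {"danke", "zufrieden", "freundlich"}
NEGATIVE = {"beschwerde", "enttäuscht", "mangelhaft"}


def lexicon_features(text):
    """Count positive and negative lexicon hits in a text (hypothetical helper)."""
    tokens = re.findall(r"\w+", text.lower())
    pos = sum(token in POSITIVE for token in tokens)
    neg = sum(token in NEGATIVE for token in tokens)
    return {"positive": pos, "negative": neg, "polarity": pos - neg}


email = "Ich bin enttäuscht, die Lieferung war mangelhaft. Danke für die Rückmeldung."
features = lexicon_features(email)
print(features)
```

A negative polarity value could be used to rank an email higher in a complaint queue, but in the approach described above such counts would serve as input features for a trained classifier rather than as the final decision.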

Industry 4.0: Sensor Data Analysis Using Machine Learning

The technological revolution known as Industry 4.0 aims to improve efficiency and productivity and to reduce production costs. In the Industry 4.0 based smart manufacturing environment, machine learning techniques are deployed to identify patterns in live data by creating models using historical data. These models will then predict previously undetectable incidents. This paper first performs descriptive statistics and visualization; subsequently, issues such as the classification of data with imbalanced class distribution are addressed. Then several binary classification-based machine learning models are built and trained for predicting production line disruptions, although only logistic regression and artificial neural networks are discussed in detail. Finally, the paper evaluates the effectiveness of the machine learning models as well as the overall utilization of the manufacturing operation in terms of availability, performance and quality.
Nadeem Iftikhar, Finn Ebertsen Nordbjerg, Thorkil Baattrup-Andersen, Karsten Jeppesen
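
The abstract above evaluates utilization in terms of availability, performance and quality. Assuming this refers to the standard Overall Equipment Effectiveness (OEE) measure, which is the product of those three ratios, a minimal sketch looks as follows. The function name `oee`, its parameters, and the shift figures are hypothetical and not taken from the paper.

```python
def oee(planned_time_min, downtime_min, ideal_cycle_time_min, total_count, good_count):
    """Return (availability, performance, quality, oee) using the standard OEE ratios."""
    run_time = planned_time_min - downtime_min
    # Availability: share of planned production time the line was actually running.
    availability = run_time / planned_time_min
    # Performance: actual output relative to the ideal output for the run time.
    performance = (ideal_cycle_time_min * total_count) / run_time
    # Quality: share of produced units that conform to specification.
    quality = good_count / total_count
    return availability, performance, quality, availability * performance * quality


# Illustrative shift: 480 planned minutes, 60 minutes of disruptions,
# an ideal cycle time of 1 minute per unit, 378 units produced, 360 of them good.
availability, performance, quality, score = oee(480, 60, 1.0, 378, 360)
print(f"{availability:.3f} {performance:.3f} {quality:.3f} {score:.3f}")
```

Under these assumptions, a production line disruption predicted by one of the models would show up mainly in the availability term, since it reduces the run time.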

Scalable Architecture, Storage and Visualization Approaches for Time Series Analysis Systems

In order to adapt to the recent phenomenon of exponential growth of time series data sets in both academic and commercial environments, and with the goal of deriving valuable knowledge from this data, a multitude of analysis software tools have been developed to allow groups of collaborating researchers to find and annotate meaningful behavioral patterns. However, these tools commonly lack appropriate mechanisms to handle massive time series data sets of high cardinality, as well as suitable visual encodings for annotated data. In this paper we conduct a comparative study of architectural, persistence and visualization methods that can enable these analysis tools to scale with a continuously growing data set and handle intense workloads of concurrent traffic. We implement these approaches within a web platform, integrated with authentication, versioning, and locking mechanisms that prevent overlapping contributions or unsanctioned changes. Additionally, we measure the performance of a set of databases when writing and reading varying amounts of series data points, as well as the performance of different architectural models at scale.
Eduardo Duarte, Diogo Gomes, David Campos, Rui L. Aguiar

Optimizing Steering of Roaming Traffic with A-number Billing Under a Rolling Horizon Policy

In this study, we focus on the single-service steering of international roaming traffic (SIRT) problem by considering telecommunication operators' agreements and "a-number billing" while keeping service quality above a certain threshold. The steering decision is made considering the origin and destination of the call, the total volume requirement of bilateral agreements, the quality threshold and the price quotes of partner operators. We develop an optimization model that considers these requirements while satisfying projected demand requirements. We suggest a framework based on a rolling horizon mechanism for demand forecasting and policy updating. The results show that the steering cost decreases by approximately 11% with deterministic demand and 10% with forecasted demand compared to the base cost value provided by the company. In addition, the model yields an approximately 26% decrease in unsatisfied committed volume in agreements.
Ahmet Şahin, Kenan Cem Demirel, Ege Ceyhan, Erinc Albey

A Web-Based Decision Support System for Quality Prediction in Manufacturing Using Ensemble of Regressor Chains

In this study we construct a decision support system (DSS) that uses production process parameters to predict the quality characteristics of final products in two different manufacturing plants. Using the idea of regressor chains, an ensemble method is developed to attain the highest prediction accuracy. The collected data is divided into two sets, namely "normal" and "unusual", using the local outlier factor method. The prediction performance is tested separately for each set. The ensemble approach proves especially effective in situations where the collected data is classified as "unusual". We tested the proposed method in two different real-life cases: a textile manufacturing process and a plastic injection molding process. The proposed DSS supports online decisions through live process monitoring screens and provides real-time quality predictions, which help to minimize the total number of nonconforming products.
Kenan Cem Demirel, Ahmet Şahin, Erinc Albey

Farm Area Segmentation in Satellite Images Using DeepLabv3+ Neural Networks

Farm detection using low-resolution satellite images is an important part of digital agriculture applications such as crop yield monitoring. However, it has not received enough attention compared to detection in high-resolution images. Although high-resolution images are more efficient for the detection of land cover components, the analysis of low-resolution images is still important due to the low-resolution repositories of past satellite images used for time series analysis, their free availability, and economic concerns. In this paper, semantic segmentation of farm areas is addressed using low-resolution satellite images. The segmentation is performed in two stages. First, local patches or Regions of Interest (ROI) that include farm areas are detected. Next, deep semantic segmentation strategies are employed to detect the farm pixels. For patch classification, two previously developed local patch classification strategies are employed: a two-step semi-supervised methodology using hand-crafted features and Support Vector Machine (SVM) modelling, and transfer learning using pretrained Convolutional Neural Networks (CNNs). For the latter, the high-level features learnt from the massive filter banks of the deep Visual Geometry Group Network (VGG-16) are utilized. After classifying the image patches that contain farm areas, the DeepLabv3+ model is used for semantic segmentation of farm pixels. Four different pretrained networks, resnet18, resnet50, resnet101 and mobilenetv2, are used to transfer their learnt features to the new farm segmentation problem. The first-step results show the superiority of transfer learning over hand-crafted features for the classification of patches. The second-step results show that the model based on resnet50 achieved the highest semantic segmentation accuracy.
Sara Sharifzadeh, Jagati Tata, Hilda Sharifzadeh, Bo Tan

About the Fairness of Database Performance Comparisons

Whenever a new database technology appears, several comparisons also come up to attest that the new database technology is better than the traditional relational one. Moreover, outstanding performance is quite often demonstrated by conducting performance comparisons. This paper attempts to illustrate that these performance comparisons should be taken with a pinch of salt. Revisiting published statements about comparisons between the Neo4j graph database and relational systems, we investigate several reasons why relational systems show worse performance. One possible reason, among others, is applying a default database configuration or configuring the system inadequately. In addition, most tests are implemented in a straightforward manner, without considering alternatives or applying useful features. In order to support our findings, we use a PostgreSQL database and implement some scenarios that are commonly used in comparisons. We thereby invalidate some stated results about the bad performance of relational systems in those scenarios. To conclude the discussion, we present some general considerations on how the fairness of comparisons can be improved.
Uwe Hohenstein, Martin Jergler

