
## About this book

The LNCS journal Transactions on Large-Scale Data- and Knowledge-Centered Systems focuses on data management, knowledge discovery, and knowledge processing, which are core and hot topics in computer science. Since the 1990s, the Internet has become the main driving force behind application development in all domains. An increase in the demand for resource sharing (e.g., computing resources, services, metadata, data sources) across different sites connected through networks has led to an evolution of data- and knowledge-management systems from centralized systems to decentralized systems enabling large-scale distributed applications providing high scalability.

This, the 43rd issue of Transactions on Large-Scale Data- and Knowledge-Centered Systems, contains five revised selected regular papers. Topics covered include classification tasks, machine learning algorithms, top-k queries, business process redesign and a knowledge capitalization framework.

## Table of Contents

### Role-Based Access Classification: Evaluating the Performance of Machine Learning Algorithms

Abstract
The analysis of relational database access for the purpose of audit and anomaly detection can be based on the classification of queries according to user roles. One such approach is DBSAFE, a database anomaly detection system, which uses a Naïve Bayes classifier to detect anomalous queries in Role-based Access Control (RBAC) environments. We propose to consider the usual machine learning algorithms for classification tasks: K-Nearest Neighbours, Random Forest, Support Vector Machine and Convolutional Neural Network, as alternatives to DBSAFE’s Naïve Bayes classifier. We identify the need for an effective representation of the input to the classifiers. We propose the utilisation of a query embedding mechanism with the classifiers. We comparatively and empirically evaluate the performance of different algorithms and variants with two benchmarks: the comprehensive off-the-shelf OLTP-Bench benchmark and a variant of the CH-benCHmark that we extended with hand-crafted user roles for database access classification. The empirical comparative evaluation shows clear benefits in the utilisation of the machine learning tools.
Randy Julian, Edward Guyot, Shaowen Zhou, Geong Sen Poh, Stéphane Bressan
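
As a rough illustration of the role-classification idea in this abstract, the sketch below trains a small multinomial Naïve Bayes classifier on tokenised SQL queries and flags a query when its predicted role differs from the role it was issued under. It is not DBSAFE's implementation: the mismatch rule, the bag-of-words tokeniser (a crude stand-in for a query embedding), the role names, and the training queries are all assumptions made for the example.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(query):
    """Lowercase a SQL string and keep alphabetic/underscore tokens only."""
    return re.findall(r"[a-z_]+", query.lower())


class NaiveBayesRoleClassifier:
    """Multinomial Naive Bayes over query tokens with Laplace smoothing."""

    def __init__(self):
        self.role_counts = Counter()              # number of queries per role
        self.token_counts = defaultdict(Counter)  # token frequencies per role
        self.vocabulary = set()

    def fit(self, queries, roles):
        for query, role in zip(queries, roles):
            self.role_counts[role] += 1
            for token in tokenize(query):
                self.token_counts[role][token] += 1
                self.vocabulary.add(token)
        return self

    def predict(self, query):
        total_queries = sum(self.role_counts.values())
        vocab_size = len(self.vocabulary)
        best_role, best_score = None, -math.inf
        for role, count in self.role_counts.items():
            # log prior plus sum of smoothed log likelihoods
            score = math.log(count / total_queries)
            role_total = sum(self.token_counts[role].values())
            for token in tokenize(query):
                seen = self.token_counts[role][token]
                score += math.log((seen + 1) / (role_total + vocab_size))
            if score > best_score:
                best_role, best_score = role, score
        return best_role


def is_anomalous(classifier, query, claimed_role):
    """Assumed rule: a query is anomalous if its predicted role differs from the claimed one."""
    return classifier.predict(query) != claimed_role


# Hypothetical training log: (query, role of the issuing user).
TRAINING = [
    ("SELECT name, price FROM item WHERE id = 7", "clerk"),
    ("SELECT quantity FROM stock WHERE item_id = 7", "clerk"),
    ("INSERT INTO orders (item_id, qty) VALUES (7, 2)", "clerk"),
    ("SELECT SUM(amount) FROM payment GROUP BY district", "analyst"),
    ("SELECT AVG(amount) FROM payment GROUP BY warehouse", "analyst"),
    ("SELECT COUNT(*) FROM orders GROUP BY district", "analyst"),
]

if __name__ == "__main__":
    clf = NaiveBayesRoleClassifier().fit(
        [q for q, _ in TRAINING], [r for _, r in TRAINING]
    )
    query = "SELECT SUM(amount) FROM payment GROUP BY warehouse"
    print(clf.predict(query), is_anomalous(clf, query, "clerk"))
```

Replacing `NaiveBayesRoleClassifier` with another classifier that exposes the same `fit` and `predict` methods would leave the anomaly check unchanged, which is the kind of substitution the abstract compares.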

### Top-k Queries over Distributed Uncertain Categorical Data

Abstract
Uncertain data arises in many modern applications including sensor networks, data integration, and information extraction. Often this data is distributed and there is a need to do efficient query processing over the data in situ. We focus on answering top-k queries and propose a distributed algorithm, TDUD, to efficiently answer top-k queries over distributed uncertain categorical data in a single round of communication. TDUD uses a distributed index structure composed of local uncertain indexes (LUIs) on local sites and a single global uncertain index (GUI) on a coordinator site. Our algorithm minimizes the amount of communication needed to answer a top-k query by maintaining the mean sum dispersion of the probability distribution on each site. Extensive experiments are conducted to verify the effectiveness and efficiency of the proposed methods in terms of communication costs and response time. We show empirically that TDUD is near-optimal in that it can typically retrieve the top-k query answers by communicating only k tuples in a single round.
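
To make the communication cost concrete, the sketch below shows a naive single-round baseline, not TDUD itself: every site ships its local top-k to the coordinator, which merges them. It assumes that each uncertain tuple is a probability distribution over categories and that a top-k query asks for the k tuples most likely to equal a given category; the site contents and identifiers are hypothetical. The baseline sends up to k tuples per site, whereas the abstract reports that TDUD typically needs only k tuples in total.

```python
import heapq


def local_top_k(tuples, category, k):
    """Return the k (probability, tuple_id) pairs of one site that are
    most likely to take the value `category`."""
    scored = ((dist.get(category, 0.0), tuple_id) for tuple_id, dist in tuples)
    return heapq.nlargest(k, scored)


def coordinator_top_k(sites, category, k):
    """Merge the local top-k lists of all sites in one round.

    Returns the global top-k and the number of tuples communicated.
    The global answer is always contained in the union of local top-k lists.
    """
    candidates = []
    for site in sites:
        candidates.extend(local_top_k(site, category, k))
    return heapq.nlargest(k, candidates), len(candidates)


# Hypothetical sites: lists of (tuple_id, {category: probability}).
SITE_A = [
    ("a1", {"rain": 0.9, "sun": 0.1}),
    ("a2", {"rain": 0.2, "sun": 0.8}),
    ("a3", {"rain": 0.6, "sun": 0.4}),
]
SITE_B = [
    ("b1", {"rain": 0.7, "sun": 0.3}),
    ("b2", {"sun": 1.0}),
]

if __name__ == "__main__":
    answer, sent = coordinator_top_k([SITE_A, SITE_B], "rain", k=2)
    print(answer)  # [(0.9, 'a1'), (0.7, 'b1')]
    print(sent)    # 4 tuples shipped for k = 2
```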

### On Knowledge Transfer from Cost-Based Optimization of Data-Centric Workflows to Business Process Redesign

Abstract
Georgia Kougka, Konstantinos Varvoutas, Anastasios Gounaris, George Tsakalidis, Kostas Vergidis

### A New Knowledge Capitalization Framework in the Big Data Context Through Shared Parameters Experiences

Abstract
Knowledge management proves to be essential in generating value from disorganized knowledge bases, as well as in separating concerns through an intelligent knowledge capitalization system in the big data context. Such systems, however, require a long and challenging learning process and complex parameter tuning in order to push the capitalization process forward.
In this paper, a new knowledge capitalization framework is introduced as an adaptive and intelligent technique, acting on top of a distributed system and running on a large scale. This framework is a three-level paradigm in which each knowledge base is modeled as a mixture over an underlying set of knowledge groups. Each group is, in turn, formed as a mixture over a latent set of knowledge entities. Because tuning the parameters of each model separately requires more time and resources to find the optimal configuration, the proposed approach uses a shared parameter mechanism driven by the group coherence metric. It relies on this paradigm to increase the model’s quality, improve knowledge entities’ coherence, and advance the groups’ smoothness and density. Results reveal significant and robust consistency amongst different knowledge groups. Additionally, each distributed model is updated three times on average. A straightforward adaptation of each model can lead to an improved model, with an increase of 20% in the group coherence. Finally, a knowledge retrieval system is developed to verify the appropriateness and efficacy of the formed groups as well as to evaluate the response time and precision.
Badr Hirchoua, Brahim Ouhbi, Bouchra Frikh, Ismail Khalil

### DiNer - On Building Multilingual Disease-News Profiler

Abstract
Disease-News Profiler aims to gather a collection of online news articles containing information related to diseases. A need for such a profiler arises in epidemic intelligence, where it acts as an information system for diseases. It can be used by health agencies and researchers to track any epidemic or to develop a knowledge base for diseases. Many of the existing profiling techniques have targeted specific languages like English, Arabic, Chinese, Spanish or Russian but have largely ignored many Asian and resource-poor languages. Building a multilingual disease-news profiler has a huge advantage in terms of coverage, timeliness, quality and information enrichment. In this paper we propose a novel system, DiNer, for filtering and indexing of Disease-News. We have developed a language-agnostic and low-resource filtering technique which uses a Support Vector Machine based classifier to identify instances of Disease-News from any given news corpus. We describe our novel approach to feature engineering and the development of a Disease-Related corpus for training our SVM classifier. We have tested our filtering module on four languages: English, Hindi, Punjabi and Gujarati. Our filtering technique performs significantly better than the baseline approach both in terms of F-Score (>5%) and recall (>50%) across languages.
Sajal Rustagi, Dhaval Patel
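
For readers unfamiliar with the two metrics quoted in this abstract, the sketch below computes precision, recall and F-score for a binary disease-news filter. The label vectors are invented for illustration and are not results from the paper; the F-score here is the balanced F1 variant, which is an assumption about the measure the authors report.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall and F1 for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical labels: 1 = disease-related article, 0 = other news.
Y_TRUE = [1, 1, 1, 1, 0, 0, 0, 0]
Y_PRED = [1, 1, 1, 0, 1, 0, 0, 0]

if __name__ == "__main__":
    print(precision_recall_f1(Y_TRUE, Y_PRED))  # (0.75, 0.75, 0.75)
```

Recall matters most for a filter of this kind, since a missed disease article never reaches the index, while a false positive can still be discarded later.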

### Backmatter
