About this book

The LNCS journal Transactions on Large-Scale Data- and Knowledge-Centered Systems focuses on data management, knowledge discovery, and knowledge processing, which are core and hot topics in computer science. Since the 1990s, the Internet has become the main driving force behind application development in all domains. An increase in the demand for resource sharing across different sites connected through networks has led to an evolution of data- and knowledge-management systems from centralized systems to decentralized systems enabling large-scale distributed applications providing high scalability. Current decentralized systems still focus on data and knowledge as their main resource. Feasibility of these systems relies basically on P2P (peer-to-peer) techniques and the support of agent systems with scaling and decentralized control. Synergy between grids, P2P systems, and agent technologies is the key to data- and knowledge-centered systems in large-scale environments.

This, the 41st issue of Transactions on Large-Scale Data- and Knowledge-Centered Systems, contains seven revised, extended papers selected from the 4th International Conference on Future Data and Security Engineering, FDSE 2017, which was held in Ho Chi Minh City, Vietnam, in November/December 2017. The main focus of this special issue is on data and security engineering, as well as engineering applications.

Table of Contents

Frontmatter

Fast Distributed Top-q and Top-k Query Processing

Abstract
Top-k queries retrieve the k results of a query that score best for an objective function representing the preferences of users. To additionally require that the returned results satisfy the preferences to a certain degree, we introduce top-q queries, which return all results that approximate the user preferences to at least some minimum degree q. We show how top-q queries and top-k queries can be combined, enabling users to pose a large number of interesting queries. Furthermore, we show that the calculation of top-q queries can be integrated into efficient top-k query processing algorithms. We implemented our approach and evaluated it against the fastest threshold-based top-k query answering approach (BPA-2). Our experiments showed an improvement of one to two orders of magnitude in time and memory requirements. Finally, we show how such queries can be processed efficiently in highly distributed peer-to-peer databases and propose an adaptive algorithm that takes several parameters of the network of databases into account to optimize the processing of distributed top-k queries.
Claus Dabringer, Johann Eder
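
To make the top-q idea concrete, here is a minimal sketch built on Fagin's classic Threshold Algorithm (TA). It is not BPA-2 and not the authors' algorithm. It assumes that each object has one score per attribute in [0, 1], that sorted and random access are both available, and that the objective function is the (monotone) average of the attribute scores; the function name `threshold_top` and the toy data are hypothetical. A top-q query can stop as soon as the threshold drops below q, because no unseen object can then reach degree q; a top-k query stops once k seen results score at least the threshold.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple


def mean(values: Sequence[float]) -> float:
    """Assumed monotone objective function: the average attribute score."""
    return sum(values) / len(values)


def threshold_top(
    scores: Dict[str, Sequence[float]],
    k: Optional[int] = None,
    q: Optional[float] = None,
    agg: Callable[[Sequence[float]], float] = mean,
) -> List[Tuple[str, float]]:
    """Hypothetical TA-based sketch answering top-k, top-q, or combined queries.

    `scores` maps each object id to its per-attribute scores in [0, 1].
    Returns (object, aggregated score) pairs, best first.
    """
    m = len(next(iter(scores.values())))
    # One list per attribute, sorted by that attribute's score (sorted access).
    lists = [sorted(scores, key=lambda o, i=i: scores[o][i], reverse=True) for i in range(m)]
    seen: Dict[str, float] = {}
    for depth in range(len(scores)):
        for i in range(m):
            obj = lists[i][depth]
            if obj not in seen:
                seen[obj] = agg(scores[obj])  # random access to all attributes
        # Upper bound on the aggregated score of any object not yet seen.
        threshold = agg([scores[lists[i][depth]][i] for i in range(m)])
        if q is not None and threshold < q:
            break  # top-q: no unseen object can reach degree q
        qualified = [s for s in seen.values() if q is None or s >= q]
        if k is not None and sum(s >= threshold for s in qualified) >= k:
            break  # top-k: no unseen object can beat the k results found so far
    result = sorted(
        ((o, s) for o, s in seen.items() if q is None or s >= q),
        key=lambda pair: (-pair[1], pair[0]),
    )
    return result if k is None else result[:k]
```

For example, `threshold_top(scores, q=0.5)` returns every object whose average score is at least 0.5, and `threshold_top(scores, k=1, q=0.5)` returns only the best of them.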

Invariant Properties and Bounds on a Finite Time Consensus Algorithm

Abstract
Finite time consensus algorithms compute consensus values exactly and in a finite number of steps, in contrast to asymptotic consensus algorithms. Few approaches in the literature derive finite time convergence for discrete consensus algorithms. In this paper, we focus on an analysis of finite time convergence based on the observability matrix of consensus networks. We introduce analytical results that extend the applicability of network observability theory to consensus and other distributed algorithms. We provide new analytical bounds on the number of steps needed to compute consensus, as well as counterexamples that disprove a conjecture on the minimum number of steps needed to compute consensus. We describe a polynomial time algorithm to empirically calculate the exact number of steps needed to compute consensus values. We have implemented a consensus-based network intrusion detection system based on the observability matrix approach to consensus networks; this implementation empirically validates our analytical results. We also compare the performance of finite time consensus with an implementation of the same intrusion detection system using asymptotic consensus. Although the finite time algorithm provides exact solutions, tests show that it needs fewer iterations to obtain a consensus solution.
Michel Toulouse, Bùi Quang Minh, Quang Tran Minh
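
The observability-matrix view can be illustrated with a small numerical check. The sketch assumes a known, fixed weight matrix W for the linear iteration x(k+1) = W x(k) and asks after how many steps a single node can recover the average of x(0) exactly from its own history x_i(0), ..., x_i(t). That is possible exactly when the averaging vector (1/n)1ᵀ lies in the row space of the node's observability matrix with rows e_iᵀWᵏ. The function `steps_to_average` is hypothetical, and its notion of "steps" may not coincide with the definition or the polynomial time algorithm used in the paper.

```python
from typing import Optional, Tuple

import numpy as np


def steps_to_average(W, node: int, tol: float = 1e-9) -> Optional[Tuple[int, np.ndarray]]:
    """Hypothetical helper: smallest t such that `node` can compute the exact
    average of x(0) from its own values x_node(0), ..., x_node(t).

    Returns (t, c) with average = c @ [x_node(0), ..., x_node(t)], or None.
    """
    W = np.asarray(W, dtype=float)
    n = W.shape[0]
    target = np.full(n, 1.0 / n)        # averaging row vector (1/n) 1^T
    row = np.zeros(n)
    row[node] = 1.0                     # e_node^T W^0
    rows = []
    for t in range(n):                  # by Cayley-Hamilton, t <= n - 1 suffices
        rows.append(row.copy())
        O = np.vstack(rows)             # observability matrix, shape (t + 1, n)
        # Is the averaging vector in the row space of O? Solve O^T c = target.
        c, *_ = np.linalg.lstsq(O.T, target, rcond=None)
        if np.allclose(O.T @ c, target, atol=tol):
            return t, c
        row = row @ W                   # next row: e_node^T W^(t + 1)
    return None
```

On a three-node path graph with symmetric weights, for instance, an end node needs its values at k = 0, 1, 2, and the returned coefficients combine them into the exact average.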

Parallel Learning Algorithms of Local Support Vector Regression for Dealing with Large Datasets

Abstract
This paper proposes new parallel learning algorithms of local support vector regression (local SVR), called kSVR and krSVR, to efficiently handle prediction tasks on large datasets. The kSVR learning strategy performs regression in two main steps: the first partitions the training data into k clusters, and the second learns an SVR model from each cluster to predict the data locally, in parallel on multi-core computers. The krSVR learning algorithm trains an ensemble of T random kSVR models to improve the generalization capacity of a single kSVR. The performance analysis in terms of algorithmic complexity and generalization capacity illustrates that our kSVR and krSVR algorithms are faster than the standard SVR for non-linear regression on large datasets while maintaining high prediction correctness. Numerical test results on five large datasets from the UCI repository show that the proposed kSVR and krSVR algorithms are efficient compared to the standard SVR. Typically, kSVR and krSVR train on average 183.5 and 43.3 times faster than the standard SVR, respectively; they also improve the relative prediction correctness by 62.10% and 63.70%, respectively, compared to the standard SVR.
Thanh-Nghi Do, Le-Diem Bui
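
The two kSVR steps (partition the data into k clusters, then fit one local model per cluster in parallel) can be sketched with NumPy alone. To keep the example self-contained, each local model is an RBF kernel ridge regressor rather than an SVR, clustering is plain k-means, and parallelism uses a thread pool; the class name `LocalKernelRegressor` and all hyperparameter values are hypothetical. Prediction routes each point to the model of its nearest cluster center.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def nearest(X: np.ndarray, centers: np.ndarray) -> np.ndarray:
    """Index of the closest center for each row of X."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return np.argmin(d2, axis=1)


def kmeans(X: np.ndarray, k: int, iters: int = 100, seed: int = 0) -> np.ndarray:
    """Plain Lloyd's k-means; returns only the non-empty cluster centers."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        labels = nearest(X, centers)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    labels = nearest(X, centers)
    return centers[[j for j in range(k) if np.any(labels == j)]]


def rbf(A: np.ndarray, B: np.ndarray, gamma: float) -> np.ndarray:
    """Gaussian (RBF) kernel matrix between the rows of A and B."""
    return np.exp(-gamma * ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1))


class LocalKernelRegressor:
    """Hypothetical kSVR-style regressor; kernel ridge stands in for SVR."""

    def __init__(self, k: int = 4, gamma: float = 1.0, lam: float = 1e-3, workers: int = 4):
        self.k, self.gamma, self.lam, self.workers = k, gamma, lam, workers

    def _fit_local(self, Xc: np.ndarray, yc: np.ndarray):
        K = rbf(Xc, Xc, self.gamma)
        alpha = np.linalg.solve(K + self.lam * np.eye(len(Xc)), yc)
        return Xc, alpha

    def fit(self, X: np.ndarray, y: np.ndarray) -> "LocalKernelRegressor":
        self.centers = kmeans(X, self.k)                 # step 1: partition the data
        labels = nearest(X, self.centers)
        with ThreadPoolExecutor(max_workers=self.workers) as pool:  # step 2: local models
            self.models = list(pool.map(
                lambda j: self._fit_local(X[labels == j], y[labels == j]),
                range(len(self.centers))))
        return self

    def predict(self, X: np.ndarray) -> np.ndarray:
        labels = nearest(X, self.centers)
        out = np.empty(len(X))
        for j, (Xc, alpha) in enumerate(self.models):
            mask = labels == j
            if np.any(mask):
                out[mask] = rbf(X[mask], Xc, self.gamma) @ alpha
        return out
```

Because each local kernel matrix only covers one cluster, the cost of the dense solve shrinks from one n×n system to k much smaller ones, which is the intuition behind the speedups reported for local SVR.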

A Parallel Incremental Frequent Itemsets Mining IFIN+: Improvement and Extensive Evaluation

Abstract
In this paper, we propose IFIN+, a shared-memory parallelization of the frequent itemset mining algorithm IFIN. Our motivation is that today's commodity processors have many physical computational units, and taking full advantage of them is a potential way to improve computational performance in single-machine environments. Portions of the serial version are improved to increase efficiency and computational independence, which makes it easier to design the parallel computation with the Work-Pool model, a model known for good load balancing. We conducted extensive experiments on both synthetic and real datasets to evaluate IFIN+ against its serial version IFIN, the well-known algorithm FP-Growth, and two other state-of-the-art algorithms, FIN and PrePost+. The experimental results show that IFIN+ has the best running time, especially when mining at different support thresholds within the same running session. Compared to its serial version, IFIN+ performs significantly better.
Van Quoc Phuong Huynh, Josef Küng, Tran Khanh Dang
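
The Work-Pool model can be illustrated with a level-wise (Apriori-style) frequent itemset miner whose support-counting tasks are placed in a shared queue and pulled by worker threads as they become idle. This sketches only the load-balancing pattern: it is not IFIN or IFIN+, whose data structures the abstract does not describe, and the function names are hypothetical. CPython threads share one interpreter lock, so a real shared-memory implementation would use native threads to obtain actual speedups.

```python
import queue
import threading
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List


def count_supports(transactions: List[FrozenSet[str]], candidates: Iterable[FrozenSet[str]],
                   n_workers: int = 4) -> Dict[FrozenSet[str], int]:
    """Work pool: candidates go into a shared queue; idle workers take the next one."""
    tasks: "queue.Queue[FrozenSet[str]]" = queue.Queue()
    for cand in candidates:
        tasks.put(cand)
    counts: Dict[FrozenSet[str], int] = {}
    lock = threading.Lock()

    def worker() -> None:
        while True:
            try:
                cand = tasks.get_nowait()
            except queue.Empty:
                return  # pool exhausted, worker exits
            support = sum(1 for t in transactions if cand <= t)
            with lock:
                counts[cand] = support

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return counts


def frequent_itemsets(transactions: Iterable[Iterable[str]], min_support: int,
                      n_workers: int = 4) -> Dict[FrozenSet[str], int]:
    """Level-wise mining; each level's support counting runs in the work pool."""
    db = [frozenset(t) for t in transactions]
    candidates = [frozenset([item]) for item in sorted({i for t in db for i in t})]
    result: Dict[FrozenSet[str], int] = {}
    size = 1
    while candidates:
        counts = count_supports(db, candidates, n_workers)
        frequent = {c for c, s in counts.items() if s >= min_support}
        result.update({c: counts[c] for c in frequent})
        size += 1
        # Join frequent itemsets; keep candidates whose (size-1)-subsets are all frequent.
        candidates = list({a | b for a in frequent for b in frequent
                           if len(a | b) == size
                           and all(frozenset(s) in frequent
                                   for s in combinations(a | b, size - 1))})
    return result
```

Since workers pull tasks rather than receiving a fixed share up front, a worker that draws cheap candidates simply takes more of them, which is the load-balancing property attributed to the Work-Pool model.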

Automated Security Analysis of Authorization Policies with Contextual Information

Abstract
Role-Based Access Control (RBAC) has attracted great attention in the security community and is widely deployed in enterprises as a major tool to manage security and restrict system access to authorized users. As the RBAC model evolves to meet enterprise requirements, RBAC policies become complex and need to be managed by multiple collaborating administrators. These administrators may unintentionally interact with the policies, creating undesired effects on the security requirements of the enterprise. Consequently, researchers have studied various safety analysis techniques that help prevent such issues in RBAC, especially with the Administrative Role-Based Access Control (ARBAC97) model. For critical applications, several extensions of RBAC, such as Spatial-Temporal Role-Based Access Control (STRBAC), have been adopted in recent years to enhance application security through authorization with contextual information such as time and space. The features proposed in STRBAC for collaborating administrators may interact in subtle ways that violate the original security requirements. However, such an analysis has not been considered in the literature.
In this research, we consider the security analysis technique for an extension of STRBAC, named Administrative STRBAC (ASTRBAC), and illustrate a safety analysis technique to detect and report violations of the security requirements. This technique leverages First-Order Logic and Symbolic Model Checking (SMT) by translating the policies into decidable reachability problems, which are essential for understanding the security policies and for informing policy designers who use this model so that they can take appropriate actions. Our extensive experimental evaluation demonstrates the correctness of our proposed solution in practice; the solution supports the analysis of finite ASTRBAC policies without prior knowledge of the number of users in the system.
Khai Kim Quoc Dinh, Anh Truong
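
The reachability questions behind this kind of safety analysis can be illustrated with an explicit-state search. The sketch is a deliberately simplified ARBAC97-style model with contexts: it tracks the role set of a single user, assumes that an administrator able to fire every rule is always present, and enables each assignment or revocation rule only in certain contexts (for example, a time slot). It uses breadth-first search instead of the First-Order Logic and SMT-based encoding described above, and all class, rule, and role names are hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Sequence


@dataclass(frozen=True)
class CanAssign:
    pos: FrozenSet[str]       # roles the user must already hold
    neg: FrozenSet[str]       # roles the user must not hold
    target: str               # role granted by the rule
    contexts: FrozenSet[str]  # contexts (e.g. time/location) where the rule is enabled


@dataclass(frozen=True)
class CanRevoke:
    target: str
    contexts: FrozenSet[str]


def reachable(initial: Iterable[str], goal: str, assign: Sequence[CanAssign],
              revoke: Sequence[CanRevoke], context: str) -> bool:
    """Can the user obtain `goal` via some sequence of rules enabled in `context`?"""
    start = frozenset(initial)
    seen = {start}
    frontier = deque([start])
    while frontier:
        roles = frontier.popleft()
        if goal in roles:
            return True
        successors = [roles | {r.target} for r in assign
                      if context in r.contexts and r.pos <= roles
                      and not (r.neg & roles) and r.target not in roles]
        successors += [roles - {r.target} for r in revoke
                       if context in r.contexts and r.target in roles]
        for state in successors:
            if state not in seen:
                seen.add(state)
                frontier.append(state)
    return False
```

With a rule that grants `manager` only to employees who are not auditors during office hours, plus a rule that revokes `auditor`, an employee-auditor can reach `manager` by first losing `auditor`, but only in the office-hours context. A designer who considered that combination a violation would be alerted by such a query.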

Classification Methods in Colon Disease Information System

Abstract
This paper presents the process of building a new logistic regression model that aims to support decision-making based on a medical database. The developed logistic regression model, the J48 classifier, and the Random Tree algorithm determine the probability of the disease and indicate the statistically significant changes that affect its onset. In our work, we attempted to build a classifier that distinguishes patients with ulcerative colitis from patients with other conditions of the lower gastrointestinal tract. The probability value can be treated as one of the features in the decision process for a patient's future treatment.
Anna Kasperczuk, Agnieszka Dardzinska
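
The role of a logistic regression model, turning patient features into a disease probability, can be sketched in a few lines of NumPy. The sketch fits the weights by batch gradient descent on the mean log-loss; the synthetic features, learning rate, and epoch count are hypothetical, and the J48 and Random Tree classifiers from the paper are not reproduced.

```python
import numpy as np


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic(X: np.ndarray, y: np.ndarray, lr: float = 0.1, epochs: int = 2000) -> np.ndarray:
    """Batch gradient descent on the mean log-loss; returns [bias, weights...]."""
    Xb = np.column_stack([np.ones(len(X)), X])   # prepend an intercept column
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        p = sigmoid(Xb @ w)                      # predicted disease probabilities
        w -= lr * Xb.T @ (p - y) / len(y)        # gradient of the mean log-loss
    return w


def predict_proba(w: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Probability of the positive class (e.g. the disease) for each row of X."""
    return sigmoid(np.column_stack([np.ones(len(X)), X]) @ w)
```

The fitted weights also show which features push the probability up or down, and the probability itself can be passed on as a feature to a later decision step, as the abstract suggests.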

Application of Regularized Online Sequential Learning for Glucose Correction

Abstract
Handheld glucose measurement devices are widely used because of their convenience: they are easy to use and give results quickly. However, the accuracy of their measurements is affected by interferences, among which hematocrit (HCT) is one of the most influential factors. In this paper, we present a neural-network-based approach to glucose correction. Regularized online sequential learning is used to estimate hematocrit. The transduced current curve produced by the chemical reaction during glucose measurement serves as the input feature of the neural network. The experimental results show that the proposed approach is promising.
Hieu Trung Huynh, Yonggwan Won
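
One common formulation of regularized online sequential learning is the regularized online sequential extreme learning machine: a single hidden layer with fixed random weights, whose output weights are obtained by ridge regression on an initial block of data and then updated recursively as new blocks arrive. The sketch follows that formulation under its own assumptions (sigmoid hidden units, the hypothetical class `RegularizedOSELM`, synthetic inputs standing in for current-curve features); it is not claimed to match the paper's network or settings. The recursive update relies on the Woodbury identity, so after all blocks the output weights equal the batch ridge solution.

```python
import numpy as np


class RegularizedOSELM:
    """Hypothetical sketch: ridge-regularized online sequential learning for an SLFN."""

    def __init__(self, n_inputs: int, n_hidden: int, lam: float = 0.1, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((n_inputs, n_hidden))  # fixed random input weights
        self.b = rng.standard_normal(n_hidden)              # fixed random biases
        self.lam = lam

    def _hidden(self, X: np.ndarray) -> np.ndarray:
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))  # sigmoid hidden layer

    def init_fit(self, X0: np.ndarray, T0: np.ndarray) -> "RegularizedOSELM":
        H = self._hidden(X0)
        # P = (H^T H + lam I)^-1 and beta = P H^T T: ridge solution on the first block.
        self.P = np.linalg.inv(H.T @ H + self.lam * np.eye(H.shape[1]))
        self.beta = self.P @ H.T @ T0
        return self

    def partial_fit(self, X: np.ndarray, T: np.ndarray) -> "RegularizedOSELM":
        H = self._hidden(X)
        # Woodbury update of P for the new block, then a recursive least-squares step.
        S = np.eye(len(X)) + H @ self.P @ H.T
        self.P = self.P - self.P @ H.T @ np.linalg.solve(S, H @ self.P)
        self.beta = self.beta + self.P @ H.T @ (T - H @ self.beta)
        return self

    def predict(self, X: np.ndarray) -> np.ndarray:
        return self._hidden(X) @ self.beta
```

Only the small matrix P and the output weights are kept between blocks, so new measurements can refine the estimator without retraining on all earlier data.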

Backmatter
