
About this book

This, the 37th issue of Transactions on Large-Scale Data- and Knowledge-Centered Systems, contains five revised selected regular papers. Topics covered include data security in clouds, privacy languages, probabilistic modelling in linked data integration, business intelligence based on multi-agent systems, collaborative filtering, and prediction accuracy.



Keeping Secrets by Separation of Duties While Minimizing the Amount of Cloud Servers

In this paper we address the problem of data confidentiality when outsourcing data to cloud service providers. In our separation-of-duties approach, the original data set is fragmented into insensitive subsets such that each subset can be managed by an independent cloud provider. Security policies are expressed as sets of confidentiality constraints that induce the fragmentation process. We assume that the different cloud providers do not communicate with each other, so that only the actual data owner is able to link the subsets and reconstruct the original data set. While confidentiality is a hard constraint that has to be satisfied in our approach, we consider two further optimization goals (minimizing the number of cloud providers and maximizing utility as defined by visibility constraints) as well as data dependencies that might lead to unwanted disclosure of data. We extend prior work by formally defining the confidentiality and optimization requirements as an optimization problem. We provide an integer linear program (ILP) formulation and analyze different settings of the problem. We present a prototype that uses a distributed installation of several PostgreSQL database systems, and we give an in-depth account of the distributed query management, which is implemented by defining views for the outsourced data sets and rewriting queries according to the fragments.
Ferdinand Bollwein, Lena Wiese
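To make the fragmentation idea concrete, the following is a minimal sketch, not the ILP formulation from the paper. It assumes that a confidentiality constraint is a set of attributes that must never appear together in one fragment, and it searches by brute force for the smallest number of fragments (one per provider). The function name `min_fragmentation`, the attribute names, and the example constraints are all hypothetical; visibility constraints and data dependencies are not modelled.

```python
from itertools import product


def min_fragmentation(attributes, constraints):
    """Brute-force search for the fewest fragments such that no fragment
    contains every attribute of any confidentiality constraint.

    Assumption: each constraint is a set of attribute names that must not
    be stored together at a single provider.
    Returns a list of attribute sets, or None if no fragmentation exists
    (for example, when a constraint consists of a single attribute).
    """
    for k in range(1, len(attributes) + 1):
        # Try every way of assigning each attribute to one of k servers.
        for assignment in product(range(k), repeat=len(attributes)):
            fragments = [set() for _ in range(k)]
            for attr, server in zip(attributes, assignment):
                fragments[server].add(attr)
            # A fragment is insensitive if it covers no constraint fully.
            if all(not c <= f for c in constraints for f in fragments):
                return fragments
    return None


# Hypothetical schema and policy, for illustration only.
attributes = ["name", "dob", "zip", "illness"]
constraints = [{"name", "illness"}, {"dob", "zip", "illness"}]

fragments = min_fragmentation(attributes, constraints)
print(len(fragments), fragments)
```

The search is exponential in the number of attributes, which is one reason an ILP formulation handed to a solver is the more practical route for realistic schemas.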

LPL, Towards a GDPR-Compliant Privacy Language: Formal Definition and Usage

The upcoming General Data Protection Regulation (GDPR) imposes several new legal requirements for privacy management in information systems. In this paper, we introduce LPL, an extensible Layered Privacy Language that makes it possible to express and enforce these new privacy properties, such as personal privacy, user consent, data provenance, and retention management. We present a formal description of LPL. Based on a set of usage examples, we show how LPL expresses and enforces the main features of the GDPR and how it applies state-of-the-art anonymization techniques.
Armin Gerl, Nadia Bennani, Harald Kosch, Lionel Brunie

Quantifying and Propagating Uncertainty in Automated Linked Data Integration

The Web of Data consists of numerous Linked Data (LD) sources from many largely independent publishers, giving rise to the need for data integration at scale. To address this need, automation can provide candidate integrations that underpin a pay-as-you-go approach. However, automated approaches need: (i) to operate across several data integration steps; (ii) to build on diverse sources of evidence; and (iii) to contend with uncertainty. This paper describes the construction of probabilistic models that yield degrees of belief both on the equivalence of real-world concepts and on the ability of mapping expressions to return correct results. The paper shows how such models can underpin a Bayesian approach to assimilating different forms of evidence: syntactic (in the form of similarity scores derived by string-based matchers), semantic (in the form of semantic annotations stemming from LD vocabularies), and internal (in the form of fitness values for candidate mappings). The paper presents an empirical evaluation of the methodology with respect to equivalence and correctness judgements made by human experts. The evaluation confirms that the proposed Bayesian methodology is suitable as a generic, principled approach for quantifying and assimilating different pieces of evidence throughout the various phases of an automated data integration process.
Klitos Christodoulou, Fernando Rene Sanchez Serrano, Alvaro A. A. Fernandes, Norman W. Paton
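As an illustration of what assimilating evidence in a Bayesian way means, the sketch below applies Bayes' rule repeatedly to a degree of belief that two concepts are equivalent. It is not the probabilistic model built in the paper: the function `bayes_update`, the prior of 0.5, and the likelihood values are assumptions chosen for the example, and the pieces of evidence are assumed to be conditionally independent given the hypothesis.

```python
def bayes_update(prior, p_e_given_h, p_e_given_not_h):
    """Return P(H | e) from P(H), P(e | H) and P(e | not H) via Bayes' rule."""
    numerator = p_e_given_h * prior
    evidence = numerator + p_e_given_not_h * (1.0 - prior)
    return numerator / evidence


# Hypothetical likelihoods (P(e | equivalent), P(e | not equivalent)) for
# two observations, e.g. a high string-similarity score followed by a
# matching semantic annotation.
observations = [(0.8, 0.3), (0.9, 0.2)]

belief = 0.5  # assumed uninformative prior that the concepts are equivalent
for p_h, p_not_h in observations:
    belief = bayes_update(belief, p_h, p_not_h)

print(round(belief, 4))
```

Each observation multiplies the odds of equivalence by its likelihood ratio, so the posterior after one kind of evidence simply becomes the prior for the next.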

A Comprehensive Approach for Designing Business-Intelligence Solutions with Multi-agent Systems in Distributed Environments

Multi-agent systems (MAS) are an active research area in system engineering for dealing with the complexity of distributed systems. Because generating business intelligence (BI) in a distributed environment is complex, such systems are adapted in diverse ways that integrate MAS and distributed data mining (DDM) technologies. Bringing these two frameworks together in the context of BI-systems poses challenges during analysis, design, and testing in the development life-cycle. Developing such complex systems demands a comprehensive methodology that systematically guides and supports developers through the various stages of the BI-system life-cycle. In the context of agent-based system engineering, several agent-oriented methodologies exist, and choosing the most suitable one is a further challenge for developers. In this paper, we develop an exemplar MAS-based BI-system, called BI-MAS, with comprehensive design steps as a running case. To demonstrate the new approach, we first carry out an evaluation process to find suitable agent-oriented methodologies. Second, we apply the selected methodologies to the analysis and design concepts of the BI-MAS life-cycle. Finally, we demonstrate a new approach to verification and validation for the BI-MAS life-cycle.
Karima Qayumi, Alex Norta

Enhancing Rating Prediction Quality Through Improving the Accuracy of Detection of Shifts in Rating Practices

The most widely used similarity metrics in collaborative filtering, namely the Pearson Correlation and the Adjusted Cosine Similarity, adjust each individual rating by the mean of the ratings entered by the specific user when computing similarities. They do so because users follow different rating practices: some are stricter when rating items, while others are more lenient. However, a user’s rating practices change over time, i.e. a user could start as lenient and subsequently become stricter, or vice versa. By relying on a single mean value per user, we therefore fail to follow such shifts in users’ rating practices, which decreases rating prediction accuracy. In this work, we present a novel algorithm for calculating dynamic user averages, i.e. point-in-time averages that follow shifts in users’ rating practices, and exploit them in both user-user and item-item collaborative filtering implementations. The proposed algorithm has been found to introduce significant gains in rating prediction accuracy, and it outperforms other dynamic average computation approaches presented in the literature.
Dionisis Margaris, Costas Vassilakis
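The difference between a single per-user mean and a dynamic average can be sketched as follows. This is not the algorithm proposed in the paper: the trailing-window average in `windowed_mean`, the window length of 3 time units, and the rating history are assumptions made purely for illustration.

```python
def global_mean(ratings):
    """Single mean over all of a user's (timestamp, rating) pairs."""
    return sum(r for _, r in ratings) / len(ratings)


def windowed_mean(ratings, t, window):
    """Mean of the ratings entered in the interval [t - window, t].

    Assumption: a plain trailing window stands in for a dynamic average.
    Falls back to the global mean if the window contains no ratings.
    """
    recent = [r for ts, r in ratings if t - window <= ts <= t]
    return sum(recent) / len(recent) if recent else global_mean(ratings)


# Hypothetical user who starts out lenient and later becomes stricter.
ratings = [(1, 5), (2, 5), (3, 4), (10, 3), (11, 2), (12, 3)]

# Mean-centered ratings, as used by Pearson / Adjusted Cosine similarity.
static_adjusted = [r - global_mean(ratings) for _, r in ratings]
dynamic_adjusted = [r - windowed_mean(ratings, ts, 3) for ts, r in ratings]

for (ts, r), s, d in zip(ratings, static_adjusted, dynamic_adjusted):
    print(ts, r, round(s, 2), round(d, 2))
```

With the single mean, the last rating of 3 looks below average for this user; measured against the user's recent, stricter ratings, the same 3 is slightly above average, which is the kind of shift a dynamic average is meant to capture.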

