Transactions on Large-Scale Data- and Knowledge-Centered Systems XIX

Special Issue on Big Data and Open Data

Edited by: Abdelkader Hameurlain, Josef Küng, Roland Wagner, Devis Bianchini, Valeria De Antonellis, Roberto De Virgilio

Publisher: Springer Berlin Heidelberg

Book series: Lecture Notes in Computer Science

Included in: Springer Professional "Wirtschaft+Technik", Springer Professional "Technik", Springer Professional "Wirtschaft"


About this book

The LNCS journal Transactions on Large-Scale Data- and Knowledge-Centered Systems focuses on data management, knowledge discovery, and knowledge processing, which are core and hot topics in computer science. Since the 1990s, the Internet has become the main driving force behind application development in all domains. Growing demand for resource sharing across networked sites has driven data- and knowledge-management systems to evolve from centralized to decentralized architectures that support highly scalable, large-scale distributed applications. Current decentralized systems still focus on data and knowledge as their main resource. Their feasibility relies mainly on P2P (peer-to-peer) techniques and on agent systems that support scaling and decentralized control. Synergy between grids, P2P systems, and agent technologies is the key to data- and knowledge-centered systems in large-scale environments.

This, the 19th issue of Transactions on Large-Scale Data- and Knowledge-Centered Systems, contains four high-quality papers investigating linked data and big data from a data management perspective. Two of the four papers apply clustering techniques to perform inference and search over (linked) data sources. One paper leverages graph analysis techniques to enable application-level integration of institutional data, and a final paper describes an approach for protecting users' profile data from disclosure, tampering, and improper use.

Table of contents

Frontmatter

Structure Inference for Linked Data Sources Using Clustering

Abstract

Linked Data (LD) overlays the World Wide Web of documents with a Web of Data. Its importance is reflected in the growth of LD repositories available as part of the Linked Open Data (LOD) cloud. At the instance level, LD sources use a combination of terms from various vocabularies, expressed in RDFS/OWL, to describe data and publish it to the Web. However, LD sources do not organise data to conform to a specific structure analogous to a relational schema; instead, data can adhere to multiple vocabularies. Expressing SPARQL queries over LD sources (usually through a SPARQL endpoint exposed to the user) requires knowledge of the predicates used, so that queries can express user requirements as graph patterns. Although LD lowers the barrier to data publication by using a single language (i.e., RDF), sources organise data with different structures and terminologies. This paper describes an approach to automatically derive structural summaries over instance-level data expressed as RDF triples. The technique builds on a hierarchical clustering algorithm that organises RDF instance-level data into groups, which are then used to infer a structural summary over an LD source. The resulting structural summaries are expressed in the form of classes, properties, and relationships. Our experimental evaluation shows good results when applied to different types of LD sources.

Klitos Christodoulou, Norman W. Paton, Alvaro A. A. Fernandes
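The clustering step in the abstract above can be illustrated with a small Python sketch. It is not the authors' algorithm. For illustration only, it assumes three things:

- each subject is described by the set of predicates it uses;
- subjects are grouped by average-linkage agglomerative clustering under Jaccard similarity;
- each resulting group is read as an inferred class whose properties are the union of its members' predicates.

The toy triples, the `threshold` of 0.5, and the functions `predicate_sets`, `agglomerate`, and `summarize` are hypothetical.

```python
from itertools import combinations

# Hypothetical sketch, not the authors' algorithm. Toy (s, p, o) triples.
TRIPLES = [
    ("ex:alice", "foaf:name", '"Alice"'),
    ("ex:alice", "foaf:mbox", "mailto:alice@example.org"),
    ("ex:bob", "foaf:name", '"Bob"'),
    ("ex:bob", "foaf:mbox", "mailto:bob@example.org"),
    ("ex:bob", "foaf:knows", "ex:alice"),
    ("ex:paper1", "dc:title", '"Linked Data"'),
    ("ex:paper1", "dc:creator", "ex:alice"),
    ("ex:paper2", "dc:title", '"Clustering"'),
    ("ex:paper2", "dc:creator", "ex:bob"),
]

def predicate_sets(triples):
    """Describe each subject by the set of predicates it uses (assumption)."""
    sets = {}
    for s, p, _ in triples:
        sets.setdefault(s, set()).add(p)
    return sets

def jaccard(a, b):
    return len(a & b) / len(a | b)

def agglomerate(features, threshold=0.5):
    """Average-linkage agglomerative clustering over predicate sets.

    Repeatedly merges the most similar pair of clusters and stops when
    no pair reaches `threshold`.
    """
    clusters = [[s] for s in sorted(features)]

    def sim(c1, c2):
        pairs = [(x, y) for x in c1 for y in c2]
        return sum(jaccard(features[x], features[y]) for x, y in pairs) / len(pairs)

    while len(clusters) > 1:
        (i, j), best = max(
            (((i, j), sim(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda t: t[1],
        )
        if best < threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe: j > i, so index i is unaffected
    return clusters

def summarize(triples, clusters):
    """Read each cluster as an inferred class with properties and relationships."""
    features = predicate_sets(triples)
    member_of = {s: k for k, c in enumerate(clusters) for s in c}
    summary = []
    for k, members in enumerate(clusters):
        props = set().union(*(features[s] for s in members))
        # A predicate whose object is a clustered subject links two classes.
        rels = {(p, member_of[o]) for s, p, o in triples
                if s in members and o in member_of}
        summary.append({"class": k, "members": sorted(members),
                        "properties": sorted(props),
                        "relationships": sorted(rels)})
    return summary
```

On the toy data, the two people and the two papers end up in separate groups. `dc:creator` then surfaces as a relationship from the paper group to the person group. The naive merge loop recomputes every pairwise similarity on each step, so it only shows the idea and would not scale to real LD sources.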

The Web Within: Leveraging Web Standards and Graph Analysis to Enable Application-Level Integration of Institutional Data

Abstract

The expansion of the Web and of our capacity to produce and store information has had a profound impact on the way we organize, manipulate, and share data. We have seen an increased specialization of database back-ends and data models to respond to modern application needs: text indexing engines organize unstructured data; standards and models were created to support the Semantic Web; and Big Data requirements stimulated an explosion of data representation and manipulation models. This complex and heterogeneous environment demands unified strategies that enable data integration and, especially, expressive cross-application querying.

Here we present a new approach for the integration of structured and unstructured data within organizations. Our solution is based on the Complex Data Management System (CDMS), a system being developed to handle data typical of complex networks. The CDMS enables a relationship-centric interaction with data that brings many advantages to the institutional data integration scenario, allowing applications to rely on common models for data querying and manipulation.

In our framework, diverse data models are integrated into a unifying RDF graph. A novel query model combines concepts from information retrieval, databases, and complex networks into a declarative query language that extends SPARQL. This query language enables flexible correlation queries over the unified data, supporting a wide range of applications such as CMSs, recommendation systems, and social networks. We also introduce Mappers, a data management mechanism that simplifies the integration of heterogeneous data and is integrated into the query language for further flexibility. Experimental results on real data demonstrate the viability of our approach.

Luiz Gomes Jr., André Santanchè

Dimensional Clustering of Linked Data: Techniques and Applications

Abstract

The plurality and heterogeneity of linked data features require appropriate solutions for accurate matching and clustering. In this paper, we propose a dimensional clustering approach that supports (i) the capability to select the set of features used for data matching and clustering, which are packaged into a so-called thematic dimension, and (ii) the capability to make explicit the cause of similarity that generates each cluster. As a further contribution, we present ensemble techniques for combining different single-dimension cluster sets into a multi-dimensional view of the linked data under consideration. Finally, applications to linked data summarization and exploration are discussed.

Alfio Ferrara, Lorenzo Genta, Stefano Montanelli, Silvana Castano
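A toy Python sketch can make the idea of thematic dimensions and ensembles more concrete. It is only an illustration under our own simplifying assumptions, not the technique of Ferrara et al. Within a dimension, items are grouped when they share identical values on all of that dimension's features, and the shared values become the explicit cause of similarity. The ensemble then intersects the per-dimension clusterings. The items, the dimension names `style`, `period`, and `origin`, and the functions `cluster_dimension` and `ensemble` are hypothetical. A realistic version would use similarity-based matching rather than exact equality.

```python
from collections import defaultdict

# Hypothetical sketch, not the paper's method. Toy items: feature -> value.
ITEMS = {
    "ex:r1": {"genre": "jazz", "decade": "1950s", "country": "US"},
    "ex:r2": {"genre": "jazz", "decade": "1960s", "country": "US"},
    "ex:r3": {"genre": "rock", "decade": "1960s", "country": "UK"},
    "ex:r4": {"genre": "jazz", "decade": "1950s", "country": "FR"},
}

# Hypothetical thematic dimensions: named groups of features.
DIMENSIONS = {"style": ["genre"], "period": ["decade"], "origin": ["country"]}

def cluster_dimension(items, features):
    """Group items that agree on every feature of one dimension.

    The cluster key (the shared feature values) states why members are similar.
    """
    clusters = defaultdict(set)
    for item, desc in items.items():
        key = tuple((f, desc.get(f)) for f in features)
        clusters[key].add(item)
    return dict(clusters)

def ensemble(items, dims):
    """Intersect single-dimension clusterings into multi-dimensional clusters."""
    per_dim = {name: cluster_dimension(items, feats) for name, feats in dims.items()}
    combined = defaultdict(set)
    for item in items:
        # Composite key: the item's cluster key in each selected dimension,
        # so the cause of similarity stays explicit per dimension.
        key = tuple((name, k) for name, clusters in per_dim.items()
                    for k, members in clusters.items() if item in members)
        combined[key].add(item)
    return dict(combined)
```

For example, `ensemble(ITEMS, {"style": ["genre"], "period": ["decade"]})` yields three multi-dimensional clusters. One of them groups `ex:r1` and `ex:r4` as jazz records from the 1950s.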

ProProtect3: An Approach for Protecting User Profile Data from Disclosure, Tampering, and Improper Use in the Context of WebID

Abstract

WebID is a new identification approach from the W3C. It enables managing profile data associated with persons and services at self-defined locations in the cloud. By relying on RDF vocabularies such as FOAF to describe user profile data, WebID contributes to the Semantic Web vision. While access to user profiles can be controlled with existing security mechanisms, these mechanisms are not designed to protect sensitive data within user profiles from unwanted retrieval, malicious manipulation, and improper use. This article analyzes the risks that affect the knowledge stored in WebID-based user profiles. To this end, it describes potential attack scenarios and outlines the challenges a solution must address. To tackle the problem of insufficient protection, we propose ProProtect3. This approach enables identity owners (1) to create customized filters for sensitive data, (2) to verify the integrity of profile data, and (3) to restrict the rights of delegatees. To evaluate the ProProtect3 approach, we integrate it into a WebID identity provider.

Stefan Wild, Fabian Wiedemann, Sebastian Heil, Alexey Tschudnowsky, Martin Gaedke
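Two of the ProProtect3 capabilities named above can be sketched in Python with the standard `hmac` and `hashlib` modules: per-requester filtering of sensitive data and checking profile integrity. This is a hypothetical illustration, not the ProProtect3 design. The filter table, the requester roles `public` and `friend`, and the use of an owner-held HMAC key are our assumptions. The sorted-triple serialization also ignores blank nodes, which a real RDF canonicalization would have to handle. Delegatee rights (3) are not covered.

```python
import hashlib
import hmac

# Hypothetical sketch, not the ProProtect3 design. Toy profile triples.
PROFILE = [
    ("#me", "foaf:name", '"Alice"'),
    ("#me", "foaf:mbox", "<mailto:alice@example.org>"),
    ("#me", "foaf:phone", "<tel:+49-000-0000>"),
    ("#me", "foaf:knows", "<https://bob.example/profile#me>"),
]

# Hypothetical per-requester filters: predicates each requester may see.
FILTERS = {
    "public": {"foaf:name", "foaf:knows"},
    "friend": {"foaf:name", "foaf:knows", "foaf:mbox"},
}

def apply_filter(triples, requester):
    """Return only the triples the requester's filter allows (default: public)."""
    allowed = FILTERS.get(requester, FILTERS["public"])
    return [t for t in triples if t[1] in allowed]

def canonical(triples):
    """Order-independent serialization (ignores blank nodes) so equal sets hash equally."""
    return "\n".join(" ".join(t) + " ." for t in sorted(triples)).encode("utf-8")

def sign(triples, key):
    """Compute an HMAC-SHA256 tag over the canonical serialization."""
    return hmac.new(key, canonical(triples), hashlib.sha256).hexdigest()

def verify(triples, key, tag):
    """Detect tampering by recomputing the tag and comparing in constant time."""
    return hmac.compare_digest(sign(triples, key), tag)
```

Reordering the triples leaves the tag valid, whereas replacing any triple makes `verify` return `False`. If third parties had to verify the profile, public-key signatures would likely fit better than a shared HMAC key.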

Backmatter

Electronic ISBN: 978-3-662-46562-2
Print ISBN: 978-3-662-46561-5
DOI: https://doi.org/10.1007/978-3-662-46562-2