2012 | Book

Transactions on Large-Scale Data- and Knowledge-Centered Systems V

Edited by: Abdelkader Hameurlain, Josef Küng, Roland Wagner

Publisher: Springer Berlin Heidelberg

Book series: Lecture Notes in Computer Science

About this book

The LNCS journal Transactions on Large-Scale Data- and Knowledge-Centered Systems focuses on data management, knowledge discovery, and knowledge processing, which are core and hot topics in computer science. Since the 1990s, the Internet has become the main driving force behind application development in all domains. Growing demand for resource sharing across different sites connected through networks has led data- and knowledge-management systems to evolve from centralized systems to decentralized systems that enable large-scale distributed applications with high scalability. Current decentralized systems still focus on data and knowledge as their main resources. The feasibility of these systems relies essentially on P2P (peer-to-peer) techniques and on the support of agent systems with scaling and decentralized control. Synergy between grids, P2P systems, and agent technologies is the key to data- and knowledge-centered systems in large-scale environments. This, the fifth issue of Transactions on Large-Scale Data- and Knowledge-Centered Systems, contains nine selected full-length papers focusing on query processing, information extraction, management of dataspaces and contents, and mobile applications.

Table of Contents

Frontmatter
Approximate Query Processing for Database Flexible Querying with Aggregates
Abstract
Database flexible querying offers users an alternative to classic querying. The use of Formal Concept Analysis (FCA) makes it possible to return approximate answers beyond those returned by a classic database management system (DBMS). Some applications do not need exact answers. However, flexible database querying can be expensive in response time, and this cost grows when the flexible query requires the calculation of aggregate functions ("Sum", "Avg", "Count", "Var", etc.). Online aggregation enables a user to issue an SQL aggregate query, see results immediately, and adjust the processing as the query runs; the user sees progressively refined estimates of the final aggregation results. Despite the success of this method, it has not yet been integrated into flexible querying systems. In this article, we propose an approach that addresses this problem by using approximate query processing (AQP). The approach allows the user to (i) write flexible queries containing linguistic terms, (ii) observe the progress of their aggregation queries, and (iii) control execution on the fly. We report on an initial implementation of online aggregation in a flexible querying system.
Minyar Sassi, Oussama Tlili, Habib Ounelli
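To make the online-aggregation idea above concrete, here is a minimal, self-contained sketch (not the system described in the paper): the aggregate is estimated from a growing random sample, and the confidence interval tightens as more rows are processed. The salary data, the 95% confidence level, and the stopping policy are all illustrative assumptions.

```python
# Minimal sketch of online aggregation for AVG: a running estimate with a
# shrinking confidence interval, computed while rows stream in random order.
import math
import random

def online_avg(rows, confidence_z=1.96):
    """Yield (estimate, half_width) after each sampled row (95% CI by default)."""
    n, mean, m2 = 0, 0.0, 0.0          # Welford's running mean/variance
    random.shuffle(rows)                # process rows in random order
    for value in rows:
        n += 1
        delta = value - mean
        mean += delta / n
        m2 += delta * (value - mean)
        if n > 1:
            variance = m2 / (n - 1)
            half_width = confidence_z * math.sqrt(variance / n)
        else:
            half_width = float("inf")
        yield mean, half_width          # the user can stop once the CI is tight enough

if __name__ == "__main__":
    salaries = [random.gauss(50_000, 8_000) for _ in range(100_000)]
    for i, (est, hw) in enumerate(online_avg(salaries)):
        if i % 20_000 == 0:
            print(f"after {i + 1:>6} rows: AVG ~ {est:,.0f} +/- {hw:,.0f}")
```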
Metric-Based Similarity Search in Unstructured Peer-to-Peer Systems
Abstract
Peer-to-peer systems constitute a promising solution for deploying novel applications, such as distributed image retrieval. Efficient search over widely distributed multimedia content requires techniques for distributed retrieval based on generic metric distance functions. In this paper, we propose a framework for distributed metric-based similarity search, where each participating peer stores its own data autonomously. In order to establish a scalable and efficient search mechanism, we adopt a super-peer architecture, where super-peers are responsible for query routing. We propose the construction of metric routing indices suitable for distributed similarity search in metric spaces. Furthermore, we present a query routing algorithm that exploits pruning techniques to selectively direct queries to super-peers and peers with relevant data. We study the performance of the proposed framework using both synthetic and real data and demonstrate its scalability over a wide range of experimental setups.
Akrivi Vlachou, Christos Doulkeridis, Yannis Kotidis
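As an illustration of the kind of metric-space pruning the abstract refers to, the sketch below keeps, per peer, a routing entry made of a pivot object and a covering radius, and forwards a range query only to peers that the triangle inequality cannot exclude. The Euclidean metric, the data, and the class names are assumptions for illustration, not the paper's actual routing indices.

```python
# Minimal sketch of routing a metric range query (q, r) via per-peer
# pivot/covering-radius entries held at a super-peer.
import math

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

class RoutingEntry:
    def __init__(self, peer_id, pivot, objects):
        self.peer_id = peer_id
        self.pivot = pivot
        self.radius = max(dist(pivot, o) for o in objects)  # covering radius
        self.objects = objects

def route_range_query(query, r, entries):
    """Return matching objects, contacting only peers that may hold results."""
    results, contacted = [], []
    for entry in entries:
        # prune: every object o of the peer satisfies dist(q, o) >= dist(q, pivot) - radius
        if dist(query, entry.pivot) - entry.radius > r:
            continue
        contacted.append(entry.peer_id)
        results.extend(o for o in entry.objects if dist(query, o) <= r)
    return results, contacted

if __name__ == "__main__":
    peers = [
        RoutingEntry("peer-A", (0.0, 0.0), [(0.1, 0.2), (0.3, -0.1)]),
        RoutingEntry("peer-B", (5.0, 5.0), [(4.8, 5.1), (5.2, 5.3)]),
    ]
    hits, asked = route_range_query((0.0, 0.0), r=1.0, entries=peers)
    print(hits, asked)   # only peer-A is contacted
```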
Adaptive Parallelization of Queries to Data Providing Web Service Operations
Abstract
A data providing web service operation returns a collection of objects for given parameters without any side effects. The Web Service MEDiator (WSMED) system automatically provides relational views of any data providing web service operations by reading their WSDL documents. These views are queried with SQL. In an execution plan, a call to a data providing web service operation may depend on the results from other web service operation calls. In other cases, different web service calls are independent of each other and can be called in any order. To parallelize and speed up both dependent and independent web service operation calls, WSMED has been extended with the adaptive operator PAP. It incrementally parallelizes calls to web service operations until no significant performance improvement is measured. The performance of PAP is evaluated using publicly available web services. The operator substantially improves the query performance without knowing the cost of calling a web service operation and without extensive memory usage.
Manivasakan Sabesan, Tore Risch
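The sketch below illustrates, in deliberately simplified form, the adaptive principle the abstract attributes to the PAP operator: the fan-out of parallel calls is increased as long as the measured runtime still improves by more than a threshold. The fetch() stub, the 5% threshold, and the doubling schedule are assumptions for illustration; this is not WSMED's implementation.

```python
# Minimal sketch: keep widening the parallel fan-out of service calls until
# no significant performance improvement is measured.
import time
from concurrent.futures import ThreadPoolExecutor

def fetch(param):
    time.sleep(0.05)          # stands in for a data providing web service call
    return {"param": param, "rows": [param, param * 2]}

def measure(n_workers, params):
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        list(pool.map(fetch, params))
    return time.perf_counter() - start

def adapt_parallelism(params, max_workers=32, threshold=0.05):
    """Double the fan-out while each step still improves runtime by > threshold."""
    best_workers, best_time = 1, measure(1, params)
    n = 2
    while n <= max_workers:
        t = measure(n, params)
        if (best_time - t) / best_time <= threshold:
            break                      # no significant improvement: stop adapting
        best_workers, best_time = n, t
        n *= 2
    return best_workers

if __name__ == "__main__":
    print("chosen fan-out:", adapt_parallelism(list(range(40))))
```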
A Pattern-Based Approach for Efficient Query Processing over RDF Data
Abstract
The recent prevalence of Linked Data has drawn research interest towards the efficiency of query execution over the web of data. Search and query engines crawl and index triples into a centralized repository, and queries are executed locally. It has been shown repeatedly in the literature that the performance bottleneck of large-scale query execution lies in joins and unions. Based on the observation that a large part of join operations result in a much smaller binding set, which can be precomputed and stored, we propose to augment RDF indexes to store the bindings of complex patterns and to exploit these patterns to enhance performance. In addition to the index, we also introduce two strategies for selecting these patterns: one depends on developed heuristic rules and the other employs query history to optimize the time-space ratio. Our empirical study demonstrates that the proposed pattern index outperforms the traditional triple index by up to three orders of magnitude while keeping the overhead low.
Yuan Tian, Haofen Wang, Wei Jin, Yuan Ni, Yong Yu
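A toy illustration of the pattern-index idea described above: the bindings of a frequent two-triple join are materialised once and read directly at query time instead of being recomputed from the triple index. The RDF data, predicate names, and index layout are made up for the example and are not the paper's system.

```python
# Minimal sketch of a pattern index over RDF triples: precompute the bindings
# of the join  ?p rdf:type Person . ?p worksFor ?org  and reuse them.
triples = [
    ("alice", "rdf:type", "Person"), ("alice", "worksFor", "acme"),
    ("bob",   "rdf:type", "Person"), ("bob",   "worksFor", "initech"),
    ("acme",  "rdf:type", "Company"),
]

def scan(pred):
    """Naive triple-index lookup by predicate."""
    return [(s, o) for s, p, o in triples if p == pred]

# offline: materialise the bindings of the frequent pattern pair
pattern_index = {
    ("rdf:type=Person", "worksFor"): [
        (s1, o2)
        for s1, o1 in scan("rdf:type") if o1 == "Person"
        for s2, o2 in scan("worksFor") if s2 == s1
    ]
}

# online: a query containing the pattern reads the bindings directly
def people_and_employers():
    return pattern_index[("rdf:type=Person", "worksFor")]

if __name__ == "__main__":
    print(people_and_employers())   # [('alice', 'acme'), ('bob', 'initech')]
```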
The HıLεX System for Semantic Information Extraction
Abstract
The explosive growth and popularity of the Web has resulted in a huge amount of digital information sources on the Internet. Unfortunately, such sources only manage data, rather than the knowledge they carry. Recognizing, extracting, and structuring relevant information according to its semantics is a crucial task. Several approaches in the field of Information Extraction (IE) have been proposed to support the translation of semi-structured/unstructured documents into structured data or knowledge. Most of them have high precision but, since they are mainly syntactic, they often have low recall, are dependent on the document format, and ignore the semantics of the information they extract. In this paper, we describe a new approach for semantic information extraction that could represent the basis for automatically extracting highly structured data from unstructured web sources without any undesirable trade-off between precision and recall. In short, the approach (i) is ontology driven, (ii) is based on a unified representation of documents, (iii) integrates existing IE techniques, (iv) implements semantic regular expressions, (v) has been implemented through Answer Set Programming, (vi) is employed in real-world applications, and (vii) has received positive feedback from business customers.
Marco Manna, Ermelinda Oro, Massimo Ruffolo, Mario Alviano, Nicola Leone
DSToolkit: An Architecture for Flexible Dataspace Management
Abstract
The vision of dataspaces is to provide many of the benefits of classical data integration, but with reduced up-front costs. Combining this with opportunities for incremental refinement enables a 'pay-as-you-go' approach to data integration, resulting in simplified integrated access to distributed data. It has been speculated that model management could provide the basis for dataspace management; however, this has not been investigated until now.
Here, we present DSToolkit, the first dataspace management system based on model management. It therefore benefits from the flexibility that this approach provides for managing schemas represented in heterogeneous models; it supports the complete dataspace lifecycle, including automatic initialisation, maintenance and improvement of a dataspace; and it allows users to provide feedback by annotating result tuples returned for the queries they have posed. The gathered user feedback is utilised for improvement by annotating, selecting and refining mappings. Without the need for additional feedback on a new data source, these techniques can also be applied to determine the source's perceived quality with respect to the already gathered feedback and to identify the best mappings over all sources, including the new one.
Cornelia Hedeler, Khalid Belhajjame, Lu Mao, Chenjuan Guo, Ian Arundale, Bernadette Farias Lóscio, Norman W. Paton, Alvaro A. A. Fernandes, Suzanne M. Embury
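The following sketch illustrates one way feedback on result tuples could be turned into mapping annotations, roughly in the spirit described above: each candidate mapping gets precision/recall estimates from tuples the user marked correct, incorrect, or missing, and the best mapping is selected by F-measure. The feedback model and the scoring are assumptions, not DSToolkit's actual algorithms.

```python
# Minimal sketch of feedback-driven mapping annotation and selection.
def annotate(mapping_results, marked_correct, marked_incorrect, expected_missing):
    """Return (precision, recall) estimates for one mapping's result set."""
    tp = len(mapping_results & marked_correct)
    fp = len(mapping_results & marked_incorrect)
    fn = len(expected_missing - mapping_results)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def select_best(mappings, feedback):
    """Pick the mapping with the highest F-measure under the gathered feedback."""
    def f1(m):
        p, r = annotate(mappings[m], *feedback)
        return 2 * p * r / (p + r) if p + r else 0.0
    return max(mappings, key=f1)

if __name__ == "__main__":
    mappings = {"m1": {"a", "b", "c"}, "m2": {"a", "d"}}
    feedback = ({"a", "b"}, {"c", "d"}, {"b"})   # correct, incorrect, expected
    print(select_best(mappings, feedback))        # m1
```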
Temporal Content Management and Website Modeling: Putting Them Together
Abstract
The adoption of high-level models for temporal, data-intensive Web sites is proposed, together with a methodology for their design and development through a content management system (CMS). The process starts with a traditional ER scheme; the subsequent steps lead to a temporal ER scheme, then to a navigation scheme (called N-ER) and finally to a logical scheme (called T-ADM). The logical model allows the definition of page-schemes with temporal aspects (which can be related to the page as a whole or to individual components of it). Each model considers the temporal features that are relevant at the respective level. A content management tool associated with the methodology has been developed: from a typical content management interface it automatically generates both the relational database (with the required temporal features) supporting the site and the actual Web pages, which can be dynamic (JSP) or static (plain HTML or XML), or a combination thereof. The tool also includes other typical content management features, all integrated with the temporal ones.
Paolo Atzeni, Pierluigi Del Nostro, Stefano Paolozzi
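As a small illustration of valid-time storage of the kind such a tool could generate, the sketch below stores page components with [valid_from, valid_to) intervals in SQLite and reconstructs a page "as of" a given date. The schema and the data are illustrative assumptions, not the T-ADM model of the paper.

```python
# Minimal sketch: a valid-time table for page components and an "as of" query.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE page_component (
        page_id     TEXT,
        component   TEXT,
        content     TEXT,
        valid_from  TEXT,   -- ISO dates keep the example simple
        valid_to    TEXT
    )
""")
conn.executemany(
    "INSERT INTO page_component VALUES (?, ?, ?, ?, ?)",
    [
        ("home", "banner", "Winter sale",  "2011-12-01", "2012-01-15"),
        ("home", "banner", "Spring lines", "2012-01-15", "2012-06-01"),
    ],
)

def page_as_of(page_id, date):
    """Return the components of a page valid at the given date."""
    cur = conn.execute(
        "SELECT component, content FROM page_component "
        "WHERE page_id = ? AND valid_from <= ? AND ? < valid_to",
        (page_id, date, date),
    )
    return cur.fetchall()

print(page_as_of("home", "2012-02-01"))   # [('banner', 'Spring lines')]
```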
Homogeneous and Heterogeneous Distributed Classification for Pocket Data Mining
Abstract
Pocket Data Mining (PDM) describes the full process of analysing data streams in mobile ad hoc distributed environments. Advances in mobile devices such as smart phones and tablet computers have made it possible for a wide range of applications to run in such an environment. In this paper, we propose the adoption of data stream classification techniques for PDM. A thorough experimental study shows that running heterogeneous/different or homogeneous/similar data stream classification techniques over vertically partitioned data (data partitioned according to the feature space) results in performance comparable to batch and centralised learning techniques.
Frederic Stahl, Mohamed Medhat Gaber, Paul Aldridge, David May, Han Liu, Max Bramer, Philip S. Yu
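The sketch below illustrates classification over vertically partitioned data in miniature: each node trains its own classifier on a subset of the features, and predictions are combined by majority vote. The use of scikit-learn, the Iris data, and the two-way feature split are assumptions for illustration; they are not the PDM agents or stream classifiers of the paper.

```python
# Minimal sketch: per-node classifiers over vertical feature partitions,
# combined by majority vote.
from collections import Counter

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
feature_partitions = [[0, 1], [2, 3]]          # each node sees two features

# each node trains locally on its own vertical slice of the data
nodes = []
for cols in feature_partitions:
    clf = DecisionTreeClassifier(max_depth=3).fit(X[:, cols], y)
    nodes.append((cols, clf))

def vote(sample):
    """Combine the per-node predictions by majority vote."""
    preds = [clf.predict(sample[cols].reshape(1, -1))[0] for cols, clf in nodes]
    return Counter(preds).most_common(1)[0][0]

if __name__ == "__main__":
    print("predicted class:", vote(X[100]))
```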
Integrated Distributed/Mobile Logistics Management
Abstract
The objective of logistics management and control in a transport enterprise is to plan the cheapest way to fulfil the customers' transport needs and to offer services such as information about where a customer's products are in the transport process at any time. In case of delays in some transports, this type of information may be important if the customers have to fulfil their own obligations. The objective of this paper is to describe an architecture for logistics management systems in which the logistics management systems of different cooperating transport companies or mobile users can be integrated in order to optimize the availability of data that can improve the transport process. In central databases, consistency is normally ensured by using the ACID (Atomicity, Consistency, Isolation and Durability) properties of a DBMS (Database Management System). This is not possible when distributed and/or mobile databases are involved and the availability of data also has to be optimized. Therefore, in this paper we use so-called relaxed ACID properties across different locations. The objective of designing relaxed ACID properties across different database locations is that users can trust the data they use even if the distributed database is temporarily inconsistent. It is also important that disconnected locations can operate in a meaningful way, in so-called disconnected mode.
Lars Frank, Rasmus Ulslev Pedersen
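To illustrate one ingredient of working with relaxed ACID properties, the sketch below lets a disconnected location apply bookings locally while logging a compensating action for each, and undoes the bookings that global reconciliation later rejects. The booking scenario and the accept/reject rule are illustrative assumptions, not the architecture described in the paper.

```python
# Minimal sketch: provisional local updates in disconnected mode, with
# compensating actions applied at reconciliation time.
class Location:
    def __init__(self, name, capacity):
        self.name = name
        self.capacity = capacity
        self.pending = []            # (update, compensation) pairs awaiting reconciliation

    def book_locally(self, shipment, trucks):
        """Apply a provisional booking and remember how to undo it."""
        self.capacity -= trucks
        undo = lambda: setattr(self, "capacity", self.capacity + trucks)
        self.pending.append((shipment, undo))

    def reconcile(self, globally_accepted):
        """On reconnection, keep accepted bookings and compensate the rest."""
        for shipment, undo in self.pending:
            if shipment not in globally_accepted:
                undo()               # compensating (counter) transaction
        self.pending.clear()

if __name__ == "__main__":
    depot = Location("Copenhagen", capacity=10)
    depot.book_locally("S1", trucks=4)
    depot.book_locally("S2", trucks=3)
    depot.reconcile(globally_accepted={"S1"})    # S2 is undone
    print(depot.capacity)                         # 6
```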
Backmatter
Metadata
Title
Transactions on Large-Scale Data- and Knowledge-Centered Systems V
Edited by
Abdelkader Hameurlain
Josef Küng
Roland Wagner
Copyright year
2012
Publisher
Springer Berlin Heidelberg
Electronic ISBN
978-3-642-28148-8
Print ISBN
978-3-642-28147-1
DOI
https://doi.org/10.1007/978-3-642-28148-8