
2011 | Book

New Directions in Web Data Management 1

Editors: Athena Vakali, Lakhmi C. Jain

Publisher: Springer Berlin Heidelberg

Book Series: Studies in Computational Intelligence


About this book

This book addresses the major issues in Web data management related to technologies and infrastructures, methodologies and techniques, as well as applications and implementations. Emphasis is placed on Web engineering and technologies, Web graph management, searching and querying, and the importance of the social Web.

Table of Contents

Frontmatter
Innovations and Trends in Web Data Management
Abstract
The growing influence and resulting importance of Web 2.0 applications have changed daily practice in research, education, finance, entertainment, and an ever wider range of work and personal activities. The evolution of users' roles as navigators, content creators and regulators has had a major impact on the amount and type of data, and on the sources, now circulated and disseminated over the Web, posing new and interesting research questions and problems in Web data management.
Athena Vakali
Massive Graph Management for the Web and Web 2.0
Abstract
The problem of efficiently managing massive datasets has gained increasing attention due to the availability of a plethora of data from various sources, such as the Web. Moreover, Web 2.0 applications seem to be one of the most fruitful sources of information, as they have attracted a large number of users who are eager to contribute to the creation of new data available online. Several Web 2.0 applications incorporate Social Tagging features, allowing users to upload and tag sets of online resources. This activity produces massive amounts of data on a daily basis, which can be represented by a tripartite graph structure that connects users, resources and tags (a minimal illustration follows this entry). The analysis of Social Tagging Systems (STS) emerges as a promising research field, enabling the identification of common patterns in user behavior, of communities of semantically related tags and resources, and much more. The massive size of STS datasets dictates the need for a robust underlying infrastructure for their storage and access.
This chapter contains a survey of existing solutions to the problem of storing and managing massive graph data focusing particularly on the implications that the underlying technologies of such frameworks have on the support/operation of Web 2.0 applications using them as back-end storage solutions, as well as on the efficient execution of web mining tasks. Considering the category of STS as an example of Web 2.0 applications, the requirements that are posed for the management of STS data are thoroughly discussed. On the basis of these requirements three frameworks have been developed, using state-of-the-art technologies as backbones. The results of benchmarks conducted on the developed frameworks are presented and discussed.
Maria Giatsoglou, Symeon Papadopoulos, Athena Vakali
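A minimal sketch of the tripartite user-resource-tag structure mentioned in the abstract above; this is an editorial illustration rather than code from the chapter, and the data and variable names are hypothetical.

```python
from collections import defaultdict

# Hypothetical tag assignments: each triple links a user, a resource and a tag.
assignments = [
    ("alice", "photo1.jpg", "sunset"),
    ("alice", "photo1.jpg", "beach"),
    ("bob",   "photo1.jpg", "sunset"),
    ("bob",   "doc42.pdf",  "databases"),
]

# Adjacency views of the tripartite graph: each partition indexes the others.
resources_by_tag  = defaultdict(set)   # tag      -> resources carrying it
tags_by_user      = defaultdict(set)   # user     -> tags the user has applied
users_by_resource = defaultdict(set)   # resource -> users who tagged it

for user, resource, tag in assignments:
    resources_by_tag[tag].add(resource)
    tags_by_user[user].add(tag)
    users_by_resource[resource].add(user)

# Example analysis step: tags that co-occur with "sunset" on some resource,
# a building block for detecting communities of semantically related tags.
related = {tag for user, resource, tag in assignments
           if resource in resources_by_tag["sunset"] and tag != "sunset"}
print(related)  # {'beach'}
```

At Web scale such index structures would live in a distributed back-end store rather than in memory, which is exactly the kind of infrastructure the chapter surveys.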
Web Engineering and Metrics
Abstract
The objective of this chapter is three-fold. First, it provides an introduction to Web Engineering, and discusses the need for empirical investigations in this area. Second, it defines concepts such as metrics and measurement, and details the types of quantitative metrics that can be gathered when carrying out empirical investigations in Web Engineering. Third, it presents the three main types of empirical investigations – surveys, case studies, and formal experiments.
Emilia Mendes
Modern Web Technologies
Abstract
Nowadays, the World Wide Web is one of the most significant tools that people employ to seek information, locate new sources of knowledge, communicate, share ideas and experiences, or purchase products and make online bookings. This chapter discusses the technologies adopted by modern Web applications. We summarize the most fundamental principles employed by the Web, such as the client-server model and the HTTP protocol, and then present current trends such as asynchronous communication (sketched briefly below), distributed applications, cloud computing and mobile Web applications. Finally, we briefly discuss the future of the Web and the technologies that are expected to play key roles in the deployment of novel applications.
Leonidas Akritidis, Dimitrios Katsaros, Panayiotis Bozanis
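To make the client-server exchange and the asynchronous style mentioned above concrete, here is a minimal sketch using only the Python standard library; the URLs are placeholders and the snippet is an editorial illustration, not taken from the chapter.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

# Placeholder resource URLs; any reachable HTTP endpoints would do.
urls = [
    "https://example.org/",
    "https://example.com/",
]

def fetch(url):
    # One client-server exchange: send an HTTP GET request and read the response.
    with urlopen(url, timeout=10) as response:
        return url, response.status, len(response.read())

# Issue the requests concurrently instead of blocking on each one in turn,
# in the spirit of asynchronous (AJAX-style) communication.
with ThreadPoolExecutor(max_workers=4) as pool:
    for url, status, size in pool.map(fetch, urls):
        print(f"{url} -> HTTP {status}, {size} bytes")
```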
Federated Data Management and Query Optimization for Linked Open Data
Abstract
Linked Open Data provides data on the Web in a machine-readable way, with typed links between related entities. Means of accessing Linked Open Data include crawling, searching, and querying. Search in Linked Open Data allows for more than just keyword-based, document-oriented data retrieval; only complex queries across different data sources can leverage the full potential of Linked Open Data. In this sense, Linked Open Data is more similar to distributed/federated databases, but with less cooperation between the data sources, which are maintained independently and may update their data without notice. Since Linked Open Data is based on standards like the RDF format and the SPARQL query language, it is possible to implement a federation infrastructure without the need for specific data wrappers (a simple federation sketch follows this entry). However, some design issues of the current SPARQL standard limit the efficiency and applicability of query execution strategies. In this chapter we consider some details and implications of these limitations and present an improved query optimization approach based on dynamic programming.
Olaf Görlitz, Steffen Staab
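A very small sketch of the federation idea described above: sub-queries are sent to independent SPARQL endpoints and their results are joined on the client side, with no source-specific wrappers. It assumes the third-party SPARQLWrapper Python library; the endpoint URLs and the affiliation predicate are hypothetical, and this is not the optimizer proposed in the chapter.

```python
from SPARQLWrapper import SPARQLWrapper, JSON

# Hypothetical endpoints; a real federator would pick them using source statistics.
ENDPOINT_A = "http://example.org/dataset-a/sparql"
ENDPOINT_B = "http://example.org/dataset-b/sparql"

def select(endpoint, query):
    """Send a SELECT query to a remote SPARQL endpoint and return its bindings."""
    client = SPARQLWrapper(endpoint)
    client.setQuery(query)
    client.setReturnFormat(JSON)
    return client.query().convert()["results"]["bindings"]

# Sub-query 1: people and their homepages, answered by source A.
people = select(ENDPOINT_A, """
    SELECT ?person ?homepage WHERE {
        ?person <http://xmlns.com/foaf/0.1/homepage> ?homepage .
    } LIMIT 100""")

# Sub-query 2: people and their affiliations, answered by source B.
affiliations = select(ENDPOINT_B, """
    SELECT ?person ?org WHERE {
        ?person <http://example.org/vocab/affiliatedWith> ?org .
    } LIMIT 100""")

# Join the two result sets on the shared ?person URI -- the step a federated
# query processor performs (and tries to keep cheap) on behalf of the user.
orgs = {row["person"]["value"]: row["org"]["value"] for row in affiliations}
for row in people:
    uri = row["person"]["value"]
    if uri in orgs:
        print(uri, row["homepage"]["value"], orgs[uri])
```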
Queries over Web Services
Introduction
Nowadays, technologies such as grid and cloud computing infrastructures and service-oriented architectures have become adequately mature and have been adopted by a large number of enterprises and organizations [2,19,36]. A Web Service (WS) is a software system designed to support interoperable machine-to-machine interaction over a network, implemented using open standards and protocols. WSs have become popular data management entities; among their benefits are interoperability and reusability (a minimal example follows below).
Efthymia Tsamoura, Anastasios Gounaris, Yannis Manolopoulos
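As a concrete, if simplified, picture of a query posed over a Web Service: a RESTful call returns tuples and the selection and projection are evaluated on the client. The endpoint and the JSON fields are purely hypothetical; this is an editorial sketch, not the chapter's framework.

```python
import json
from urllib.request import urlopen

# Hypothetical RESTful Web Service returning a JSON list of records.
SERVICE_URL = "http://example.org/api/products?category=books"

def call_service(url):
    # One interoperable machine-to-machine interaction over HTTP.
    with urlopen(url, timeout=10) as response:
        return json.load(response)

# The service call acts as a query operator: selection and projection are then
# applied to whatever tuples the remote service returned.
records = call_service(SERVICE_URL)
cheap_titles = [r["title"] for r in records if r.get("price", 0) < 20]
print(cheap_titles)
```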
Towards Adaptively Approximated Search in Distributed Architectures
Abstract
Innovative applications over distributed architectures, like the Web, often require the analysis of strongly related, highly heterogeneous data, stored in remote and autonomous data sources, that are either fully available at query processing time (stored data) or become available as a continuous stream (data streams). In these contexts, search efficiency is a key issue. However, classical processing techniques, in which queries are executed exactly, both with respect to the request and with respect to the processing strategy fixed at the beginning of execution, may not ensure adequate performance and adequate quality (in terms of completeness and accuracy) of the returned result. To overcome this problem, approximate and adaptive query processing techniques have been proposed. Adaptive techniques aim at ensuring efficient query processing whenever the a priori information needed to statically select the most efficient processing technique, once at the beginning of processing, is not available. Approximation, by contrast, has been proposed to ensure higher result quality in the presence of data heterogeneity and limited data knowledge. In highly dynamic and heterogeneous environments, these two approaches have usually been considered orthogonal. However, we claim that there are applications that could benefit from a combined approach; an example is Web applications that allow queries over heterogeneous data (streams) retrieved through mash-ups from different sites. Since such data are acquired dynamically, they cannot be statically reconciled before queries are processed. Moreover, adopting a single approximate search strategy, fixed a priori, could penalize system efficiency and/or result quality whenever heterogeneity characterizes only subsets of the input data. The aim of this chapter is to take one step towards the integration of these approaches by introducing Approximate Search with Adaptive Processing (ASAP for short) systems. In ASAP, decisions concerning when, how, and how much to approximate are taken dynamically (a toy illustration follows this entry), with the goal of optimizing both result quality and processing efficiency.
Barbara Catania, Giovanna Guerrini
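A toy illustration of the adaptive "when and how much to approximate" decision described above; it is not the ASAP machinery itself. The stream, the similarity predicate and the thresholds are invented, and a real system would also monitor latency and result quality before relaxing the match.

```python
from difflib import SequenceMatcher

QUERY = "barcelona"

def exact(item):
    return item == QUERY

def approximate(item, threshold):
    # Looser predicate that tolerates heterogeneous spellings of the same entity.
    return SequenceMatcher(None, QUERY, item.lower()).ratio() >= threshold

# Invented heterogeneous stream: the same entity under several spellings.
stream = ["barcelona", "barcellona", "Barcelona ES", "madrid", "barcelna"]

threshold = 1.0          # start exact ("how much" to approximate: not at all)
matches = seen = 0
for item in stream:
    seen += 1
    hit = exact(item) if threshold >= 1.0 else approximate(item, threshold)
    matches += hit
    # Adaptive step ("when" to approximate): if too few items match, assume
    # heterogeneity in the input and loosen the predicate a little.
    if seen >= 2 and matches / seen < 0.5 and threshold > 0.7:
        threshold -= 0.15
    print(f"{item!r:16} -> {'match' if hit else 'no match'} (threshold={threshold:.2f})")
```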
Online Social Networks: Status and Trends
Abstract
The rapid proliferation of Online Social Network (OSN) sites has had a profound impact on the WWW, tending to reshape its structure, design, and utility. Industry experts believe that OSNs create a potentially transformational change in consumer behavior and will have a far-reaching impact on the traditional industries of content, media, and communications. This chapter starts out by presenting the current status of OSNs through a taxonomy that delineates the spectrum of attributes relating to these systems. It also presents an overall reference system architecture that aims at capturing the building blocks of prominent OSNs. Additionally, it provides a state-of-the-art survey of popular OSN systems, examining their architectural designs and business models. Finally, the chapter explores the future trends of OSN systems, presents significant research challenges, and discusses their societal and business impact.
George Pallis, Demetrios Zeinalipour-Yazti, Marios D. Dikaiakos
Enhancing Computer Vision Using the Collective Intelligence of Social Media
Abstract
Teaching the machine has been a great challenge for computer vision scientists since the very first steps of artificial intelligence. Throughout the decades there have been remarkable achievements that drastically enhanced the capabilities of machines, both from the perspective of infrastructure (i.e., computer networks, processing power, storage capabilities) and from the perspective of processing and understanding the data. Nevertheless, computer vision scientists are still confronted with the problem of designing techniques and frameworks that facilitate effortless learning and allow analysis methods to scale easily across many different domains and disciplines. It is true that state-of-the-art approaches cannot produce highly effective models unless there is dedicated, and thus costly, human supervision in the learning process that dictates the relation between the content and its meaning (i.e., annotation). Recently, we have been witnessing the rapid growth of Social Media, which emerged as the result of users’ willingness to communicate, socialize, collaborate and share content. The outcome of this massive activity is a tremendous volume of user-contributed data made available on the Web, usually along with an indication of their meaning (i.e., tags). This has motivated the research objective of investigating whether the Collective Intelligence that emerges from users’ contributions inside a Web 2.0 application can be used to remove the need for dedicated human supervision during learning. In this chapter we deal with a very demanding learning problem in computer vision: detecting and localizing an object within the image content. We present a method that exploits the Collective Intelligence fostered inside an image Social Tagging System to facilitate the automatic generation of training data and, therefore, of object detection models. The experimental results show that although there are still many issues to be addressed, computer vision technology can definitely benefit from Social Media.
Elisavet Chatzilari, Spiros Nikolopoulos, Ioannis Patras, Ioannis Kompatsiaris
From Extensional Data to Intensional Data: AXML for XML
Abstract
As a data representation language, the eXtensible Markup Language (XML) needs to satisfy new requirements arising from the rapid development of Peer-to-Peer architectures and the wide use of Web Services. A solution to this problem is intensional XML data, such as Active XML (AXML). An AXML document consists of normal XML data and embedded Web Service calls. This extension of XML promises many advantages, such as: (1) reusable data, (2) dynamic and fresh data, (3) user-oriented and intensive data, (4) time and bandwidth savings, (5) simplified Web Service invocation, (6) sharing of computing tasks, and (7) support for distributed computing.
This chapter introduces intensional XML data, an extension of XML that integrates XML data and Web Services. Moreover, a current realization of intensional XML, known as Active XML (AXML), is introduced and discussed (a small sketch follows this entry). The advantages of the new XML extension and a comparison with traditional XML are presented through examples.
Viet Binh Phan, Eric Pardede, J. Wenny Rahayu
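A minimal sketch of the intensional idea discussed above: an XML document embeds a service-call element that is replaced by the data the service returns when the document is materialized. The element names, the endpoint and the stubbed invocation are hypothetical and do not follow the official AXML syntax.

```python
import xml.etree.ElementTree as ET

# Ordinary XML plus an embedded service call (the intensional part of the document).
DOC = """
<newspaper>
  <title>Daily Web</title>
  <weather>
    <service-call endpoint="http://example.org/ws/forecast" city="Athens"/>
  </weather>
</newspaper>
"""

def invoke(endpoint, city):
    # Stand-in for a real Web Service invocation (e.g. a SOAP or REST call).
    forecast = ET.Element("forecast", city=city)
    forecast.text = "sunny, 29 degrees"
    return forecast

def materialize(root):
    """Replace each embedded service call with the data it returns."""
    for parent in root.iter():
        for child in list(parent):
            if child.tag == "service-call":
                result = invoke(child.get("endpoint"), child.get("city"))
                parent.remove(child)
                parent.append(result)
    return root

doc = ET.fromstring(DOC)
print(ET.tostring(materialize(doc), encoding="unicode"))
```

Because the call can be materialized lazily, on demand, or delegated to a peer, the document stays fresh and compact, which is where the bandwidth and freshness benefits listed in the abstract come from.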
Migrating Legacy Assets through SOA to Realize Network Enabled Capability
Abstract
Network Enabled Capability (NEC) is the UK Ministry of Defence’s aspiration to enhance the achievement of military effect through the networking of future and existing military capabilities. The NECTISE (NEC Through Innovative Systems Engineering) program responded to this need by investigating the question ‘are you ready for NEC?’ on behalf of equipment and service providers. Research work on this project proposed Service Oriented Architectures (SOA) as an architectural approach to delivering dependable and sustainable military capability. Specifically, the work looked at how loosely coupled services could be used to expose and reuse functions and databases, and how to describe the quality of service for heterogeneous systems and networks. The System of Systems from which NEC will be realized will not be implemented from scratch, but rather migrated from legacy assets over time. These assets will provide both functionality and data/information services; a weather sensor is one example.
The focus of this chapter is to lay out an understanding of the challenges faced and lessons learned in realizing NEC when migrating legacy assets to an SOA-based System of Systems over time in order to reuse their functionality and databases. This work was based around a Software Demonstrator illustrating a situational awareness capability realized by dynamically discovering and aggregating sensor data. The focus is not specifically on sensors; however, the sensor case provides a good example of data integration to realize military capability.
An abstract decision process model for wrapping legacy components was proposed to guide how existing system components can be selected for integration into the System of Systems from which NEC will be realized. This model can be used to assist in the integration of system components when migrating to or between execution architectures. The process model provides decision support for trade-offs between the costs of reimplementation, the costs of system wrapping, and the costs incurred as a consequence of System of Systems complexity and ongoing maintenance (a toy cost comparison follows below).
David Webster, Lu Liu, Duncan Russell, Colin Venters, Zongyang Luo, Jie Xu
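The wrap-versus-reimplement trade-off mentioned above can be pictured with a toy cost comparison; the additive model and the figures are invented for illustration and are not the NECTISE decision process model.

```python
# Toy cost model: up-front effort plus integration complexity plus maintenance
# over the planning horizon (all figures are hypothetical cost units).
def total_cost(upfront, integration, annual_maintenance, years):
    return upfront + integration + annual_maintenance * years

options = {
    "wrap legacy asset as SOA service": total_cost(50, 30, 12, years=10),
    "reimplement natively":             total_cost(200, 5, 4, years=10),
}

for option, cost in sorted(options.items(), key=lambda kv: kv[1]):
    print(f"{option}: {cost}")
```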
Backmatter
Metadata
Title
New Directions in Web Data Management 1
Editors
Athena Vakali
Lakhmi C. Jain
Copyright Year
2011
Publisher
Springer Berlin Heidelberg
Electronic ISBN
978-3-642-17551-0
Print ISBN
978-3-642-17550-3
DOI
https://doi.org/10.1007/978-3-642-17551-0
