
2009 | Book

Web Information Systems Engineering - WISE 2009

10th International Conference, Poznań, Poland, October 5-7, 2009. Proceedings

Editors: Gottfried Vossen, Darrell D. E. Long, Jeffrey Xu Yu

Publisher: Springer Berlin Heidelberg

Book Series: Lecture Notes in Computer Science


About this book

Welcome to the tenth anniversary of the International Conference on Web Information Systems Engineering, WISE 2009. This year the WISE conference continued the tradition that has evolved from the inaugural conference held in 2000 in Hong Kong and has since made its journey around the world: 2001 Kyoto (Japan), 2002 Singapore, 2003 Rome (Italy), 2004 Brisbane (Australia), 2005 New York (USA), 2006 Wuhan (China), 2007 Nancy (France), and 2008 Auckland (New Zealand).

This year we were happy to hold the event in Poznań, a city of 600,000 inhabitants in western Poland. Poznań is the capital of the most affluent province of the country, Wielkopolska, which means "Greater Poland". For more than 1,000 years, Poznań's geographical location has predestined the city to be a significant scientific, cultural and economic center with more than just regional influence. The city is situated at the strategic crossroads from Paris and Berlin in the west to Warsaw and Moscow in the east, and from Scandinavia through the Baltic Sea in the north to the Balkans in the south. Poznań is a great research and university center with dynamic potential. In all, 140,000 students are enrolled in 26 state-run and private institutions of higher education here, among which the Poznań University of Economics, with its 12,000 students, is one of the biggest.

The WISE 2009 conference provided a forum for engineers and scientists to present their latest findings in Web-related technologies and solutions.

Table of Contents

Frontmatter

Keynotes

Blighted Virtual Neighborhoods and Other Threats to Online Social Experiences

The rapid expansion of web presence into many new kinds of social networks has by far outpaced our ability to manage (or even understand) the community, economic, demographic and moral forces that shape user experiences. Online ticket queues, communities of online gamers, online retail malls and checkout sites, Facebook or MySpace communities, web-based town hall discussions, and Second Life destinations are just a few examples of places that users have come to regard as neighborhoods. They are virtual neighborhoods. They begin as attractive destinations and attract both visitors and inhabitants. Some users spend money, and some put down roots in the community. But like many real neighborhoods, virtual neighborhoods all too often turn into frightening, crime-ridden, disease- (or malware-)infested eyesores. Most users are driven away, real commerce is replaced by questionable transactions and billions of dollars of value is destroyed in the process. In blighted inner city neighborhoods you can find a familiar array of bad actors: loan sharks, vagrants, drug dealers, vandals and scam artists. Online neighborhoods fall prey to virtual blight: (1) Bot Blight, where the bad actors use bots and other non-human agents to overwhelm systems that are designed for human beings, (2) Human Blight, where individuals ranging from hackers to sociopaths and organized groups deliberately degrade a virtual neighborhood, (3) Entropy Blight, where abandoned property accumulates dead-end traffic of various kinds. The simple first-generation tools that were deployed to protect online properties have failed – the collapse of Geocities and the recent apparent defeat of Captcha, a technology to let only humans enter the neighborhood, are evidence of that failure. There is a growing realization of how easily bad actors can create the virtual version of urban blight and how ineffective existing approaches to identity, trust and security will be in battling it.

Richard A. DeMillo
Cloud Computing

Cloud computing is an emerging computing and business model where users can gain access to their applications from anywhere through their connected devices. The proliferation of intelligent mobile devices, high speed wireless connectivity, and rich browser-based Web 2.0 interfaces have made this shared network-based cloud computing model possible. Cloud Computing is very much driven by the increasingly unmanageable IT complexity. We will describe the main technology developments that have made this IT simplification possible: namely Virtualization, SOA, and Service/Systems Management. We believe that the Cloud will grow rapidly at the top SaaS layer (including Application services, Business services and People services). The Cloud model of services can improve the way we manage health care, finance, mobile information, and more; and will help us realize the vision of Intelligent Living. The speaker will also share his view on Cloud Computing opportunities for Taiwan.

George Wang
Beyond Search: Web-Scale Business Analytics

We discuss the novel problem of supporting analytical business intelligence queries over Web-based textual content, e.g., BI-style reports based on hundreds of thousands of documents from an ad-hoc Web search result. Neither conventional search engines nor conventional Business Intelligence and ETL tools address this problem, which lies at the intersection of their capabilities. This application is an exciting challenge that should appeal to and benefit from several research communities, most notably the database, text analytics and distributed systems communities. For example, to provide fast answers for such queries, cloud computing techniques need to be combined with text analytics, data cleansing, query processing and query refinement methods. However, the envisioned path for OLAP-style query processing over textual Web data may take a long time to mature. Two recent developments have the potential to become key components of such an ad-hoc analysis platform: significant improvements in cloud computing query languages and advances in self-supervised information extraction techniques. In this talk, I will give an informative and practical look at the underlying research challenges in supporting "Web-Scale Business Analytics" applications, with a focus on key components, and will highlight recent projects.

Alexander Löser

Tutorials

Storyboarding – High-Level Engineering of Web Information Systems

A Web Information System (WIS) is an information system that can be accessed via the World Wide Web. We will describe the various aspects of WISs such as purpose, usage, content, functionality, context and presentation. Further, we will present three major blocks dealing with strategic modelling of WISs, usage modelling of WISs by means of storyboarding, and the semantics and pragmatics of storyboarding. Strategic modelling lays out the plan for the whole WIS without drifting into technical details. It specifies the purpose(s) of the system and the criteria by which the success of the WIS can be judged. Usage modelling emphasises storyboarding, which consists of three interconnected parts: the modelling of the story space, the modelling of the actors, i.e. classes of users, and the modelling of tasks. We will first present the modelling language of storyboarding. Then, we will briefly discuss semantic aspects of storyboarding, focusing on an algebraic approach to reason about storyboards, emphasising personalisation with respect to preference rules and the satisfiability of deontic constraints. Finally, we will address the pragmatics of storyboarding, the necessary complement devoted to what the storyboard means for its users. Pragmatics is concerned with usage analysis by means of life cases, user models and contexts that permit a deeper understanding of what users actually expect the system to be used for.

Hui Ma, Klaus-Dieter Schewe, Bernhard Thalheim
Web Queries: From a Web of Data to a Semantic Web

One significant effort towards combining the virtues of Web search, viz. being accessible to untrained users and able to cope with vastly heterogeneous data, with those of database-style Web queries is the development of keyword-based Web query languages. These languages operate essentially in the same setting as XQuery or SPARQL but with an interface for untrained users.

Keyword-based query languages trade some of the precision that languages like XQuery enable, by allowing users to formulate exactly what data to select and how to process it, for an easier interface accessible to untrained users. The yardstick for these languages becomes an easily accessible interface that does not sacrifice the essential premise of database-style Web queries: that selection and construction are precise enough to fully automate data processing tasks. To ground the discussion of keyword-based query languages, we give a summary of what we perceive as the main contributions of research and development on Web query languages in the past decade. This summary focuses specifically on what sets Web query languages apart from their predecessors for databases.

Further, this tutorial (1) gives an overview of keyword-based query languages for XML and RDF, (2) discusses where the existing approaches succeed and what, in our opinion, are the most glaring open issues, and (3) outlines where, beyond keyword-based query languages, we see the need, the challenges and the opportunities for combining the ease of use of Web search with the virtues of Web queries.

François Bry, Tim Furche, Klara Weiand
Toward a Unified View of Data and Services

Research on data integration and service discovery has, from the beginning, involved different (and not always overlapping) communities. Therefore, data and services are described with different models, and different techniques have been developed to retrieve them. Nevertheless, from a user perspective, the border between data and services is often not so definite, since data and services provide complementary visions of the available resources.

In NeP4B (Networked Peers for Business), a project funded by the Italian Ministry of University and Research, we developed a semantic approach for providing a uniform representation of data and services, thus allowing users to obtain sets of data and lists of web-services as query results. The NeP4B idea relies on the creation of a Peer Virtual View (PVV) representing sets of data sources and web services, i.e. an ontological representation of data sources which is mapped to an ontological representation of web services. The PVV is exploited for solving user queries: 1) data results are selected by adopting a GAV approach; 2) services are retrieved by an information retrieval approach applied on service descriptions and by exploiting the mappings on the PVV.

In the tutorial, we introduce: 1) the state of the art of semantic-based data integration and web service discovering systems; 2) the NeP4B architecture.

Sonia Bergamaschi, Andrea Maurino

Web Computing

Maximizing Utility for Content Delivery Clouds

A content delivery cloud, such as MetaCDN, is an integrated overlay that utilizes cloud computing to provide content delivery services to Internet end-users. While it ensures satisfactory user perceived performance, it also aims to improve the traffic activities in its world-wide distributed network and uplift the usefulness of its replicas. To realize this objective, in this paper, we measure the utility of content delivery via MetaCDN, capturing the system-specific perceived benefits. We use this utility measure to devise a request-redirection policy that ensures high performance content delivery. We also quantify a content provider's benefits from using MetaCDN based on its user perceived performance. We conduct a proof-of-concept testbed experiment for MetaCDN to demonstrate the performance of our approach and reveal our observations on the MetaCDN utility and a content provider's benefits from using MetaCDN.

Mukaddim Pathan, James Broberg, Rajkumar Buyya
Aggregation of Document Frequencies in Unstructured P2P Networks

Peer-to-peer (P2P) systems have been recently proposed for providing search and information retrieval facilities over distributed data sources, including web data. Terms and their document frequencies are the main building blocks of retrieval and as such need to be computed, aggregated, and distributed throughout the system. This is a tedious task, as the local view of each peer may not reflect the global document collection, due to skewed document distributions. Moreover, central assembly of the total information is not feasible, due to the prohibitive cost of storage and maintenance, and also because of issues related to digital rights management. In this paper, we propose an efficient approach for aggregating the document frequencies of carefully selected terms based on a hierarchical overlay network. To this end, we examine unsupervised feature selection techniques at the individual peer level, in order to identify only a limited set of the most important terms for aggregation. We provide a theoretical analysis to compute the cost of our approach, and we conduct experiments on two document collections, in order to measure the quality of the aggregated document frequencies.
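The core aggregation step the abstract describes, combining per-peer local document frequencies into global counts while keeping only a limited set of selected terms, can be sketched as follows. This is a minimal illustration, not the paper's hierarchical overlay protocol; the function name, the flat list of peers, and the `top_k` stand-in for feature selection are all assumptions.

```python
from collections import Counter

def aggregate_document_frequencies(peer_dfs, top_k=None):
    """Merge per-peer local document frequencies into global counts.

    peer_dfs: iterable of dicts mapping term -> local document frequency.
    top_k: if given, keep only the k globally most frequent terms,
    a crude stand-in for the paper's unsupervised feature selection.
    """
    global_df = Counter()
    for local_df in peer_dfs:
        global_df.update(local_df)  # adds counts term by term
    if top_k is not None:
        return dict(global_df.most_common(top_k))
    return dict(global_df)

# Example: three peers with skewed local collections, so no single
# peer's local view reflects the global document frequencies.
peers = [
    {"web": 10, "p2p": 3},
    {"web": 2, "retrieval": 7},
    {"p2p": 5, "retrieval": 1, "web": 1},
]
print(aggregate_document_frequencies(peers, top_k=2))  # {'web': 13, 'p2p': 8}
```

In a real hierarchical overlay, each intermediate node would run the same merge over the partial counts of its children, so only selected terms travel up the tree.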

Robert Neumayer, Christos Doulkeridis, Kjetil Nørvåg
Processes Are Data: A Programming Model for Distributed Applications

Many modern distributed applications employ protocols based on XML messages. Typical architectures for these applications follow an approach where messages are organized in queues, state is stored in DBMS, and application code is written in imperative languages. As a result, much developer productivity and system performance is wasted on handling conversions between the various data models (XML messages, objects, relations), and reliably managing persistent state for application instances. This overhead turns application servers into data management servers instead of process servers. We show how this model can be greatly improved by changing two aspects. Firstly, by using a declarative rule language to describe the processing logic. Secondly, by providing a single, unified data model based on XML messages that covers all kinds of data encountered, including process state. We discuss the resulting design choices for compile-time and run-time systems, and show and experimentally evaluate the performance improvements made possible.

Alexander Böhm, Carl-Christian Kanne

Industrial Session I

OfficeObjects Service Broker – An Intelligent Service Integration Platform

We present a brief description of a service integration platform developed by Rodan Systems as the result of the eGov-Bus project with the use of the proprietary OfficeObjects® information management tool set.

Michał Gajewski, Witold Staniszkis, Jakub Strychowski
PSILOC World Traveler: Overcoming the Challenges of Adopting Wide–Scale Mobile Solutions

The mobile software marketplace today faces several challenges, and companies that aspire to global success in this market must face them holistically or not at all. While many companies are able to market products that prove successful in one or two isolated areas, their failure to adopt wide-scale solutions means that, however innovative their products might be, their success is limited by their lack of integration within a simple user experience. PSILOC has shown how innovatively created services may be integrated within an open-platform architecture so that mobile software companies can address all the needs of their increasingly demanding mobile phone customers within a sleek, easy-to-use interface.

Michał Sieczko
ETSI Industry Specification Group on Autonomic Network Engineering for the Self-managing Future Internet (ETSI ISG AFI)

The area of Autonomic/Self-Managing Networks still faces the problem of a lack of harmonized steps and efforts towards the establishment of common specifications of the architectures and functionalities for self-management within future networks such as the envisaged Future Internet. Ideally, this harmonization can now be achieved through a newly established and well-focused Special Working Group in ETSI, a world-renowned telecommunications standardization body. The Special Working Group is an Industry Specification Group (ISG) called "Autonomic network engineering for the self-managing Future Internet" (AFI) [4]. The AFI aims to serve as a focal point for the development of common specifications and engineering frameworks that guarantee interoperability of nodes/devices for self-managing future networks.

Ranganai Chaparadza, Laurent Ciavaglia, Michał Wódczak, Chin-Chou Chen, Brian A. Lee, Athanassios Liakopoulos, Anastasios Zafeiropoulos, Estelle Mancini, Ultan Mulligan, Alan Davy, Kevin Quinn, Benoit Radier, Nancy Alonistioti, Apostolos Kousaridas, Panagiotis Demestichas, Kostas Tsagkaris, Martin Vigoureux, Laurent Vreck, Mick Wilson, Latif Ladid

Tagging

Facing Tagging Data Scattering

Web 2.0 has brought tagging to the forefront of user practices for organizing and locating resources. Unfortunately, these tagging efforts suffer from a main drawback: lack of interoperability. This situation hinders tag sharing (e.g., tags introduced at del.icio.us being available at Flickr) and, in practice, leads to tagging data being locked into tagging sites. This work argues that for tagging to reach its full potential, tag management systems should be provided that account for a common way to handle tags no matter which tagging site (e.g., del.icio.us, Flickr) frontended the tagging. This paper introduces TAGMAS (TAG MAnagement System), which offers a global view of your tagging data no matter where it is located. By capitalizing on TAGMAS, tagging applications can be built in a quicker and more robust way. Using measurements and one use case, we demonstrate the practicality and performance of TAGMAS.
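The global view the abstract argues for, one record per resource regardless of which site frontended the tagging, can be sketched in a few lines. This is not the TAGMAS design, only an illustration of the merge such a tag management system must perform; the export format and function name are assumptions.

```python
def unified_tag_view(site_exports):
    """Merge tagging data exported from several sites into one view.

    site_exports: dict site_name -> list of (resource_url, tag) pairs.
    Returns dict resource_url -> {normalized_tag: set of sites that applied it},
    so the same resource tagged on two sites yields a single entry.
    """
    view = {}
    for site, assignments in site_exports.items():
        for url, tag in assignments:
            # Lowercasing is a toy normalization; a real system would
            # need a much richer tag-reconciliation policy.
            view.setdefault(url, {}).setdefault(tag.lower(), set()).add(site)
    return view

exports = {
    "del.icio.us": [("http://example.org/a", "WebDev"),
                    ("http://example.org/a", "css")],
    "Flickr": [("http://example.org/a", "webdev")],
}
view = unified_tag_view(exports)
print(sorted(view["http://example.org/a"]["webdev"]))  # ['Flickr', 'del.icio.us']
```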

Oscar Díaz, Jon Iturrioz, Cristóbal Arellano
Clustering of Social Tagging System Users: A Topic and Time Based Approach

Under social tagging systems, a typical Web 2.0 application, users label digital data sources using freely chosen textual descriptions (tags). Mining tag information reveals the topic domains of users' interests and contributes significantly to the profile construction process. In this paper we propose a clustering framework which groups users according to their preferred topics and the time locality of their tagging activity. Experimental results demonstrate the efficiency of the proposed approach, which results in richer time-aware user profiles.

Vassiliki Koutsonikola, Athena Vakali, Eirini Giannakidou, Ioannis Kompatsiaris
Spectral Clustering in Social-Tagging Systems

Social tagging is an increasingly popular phenomenon with substantial impact on the way we perceive and understand the Web. For the many Web resources that are not self-descriptive, such as images, tagging is the sole way of associating them with concepts explicitly expressed in text. Consequently, users are encouraged to assign tags to Web resources, and tag recommenders are being developed to stimulate the re-use of existing tags in a consistent way. However, a tag still and inevitably expresses the personal perspective of each user upon the tagged resource. This personal perspective should be taken into account when assessing the similarity of resources with help of tags. In this paper, we focus on similarity-based clustering of tagged items, which can support several applications in social-tagging systems, like information retrieval, providing recommendations, or the establishment of user profiles and the discovery of topics. We show that it is necessary to capture and exploit the multiple values of similarity reflected in the tags assigned to the same item by different users. We model the items, the tags on them and the users who assigned the tags in a multigraph structure. To discover clusters of similar items, we extend spectral clustering, an approach successfully used for the clustering of complex data, into a method that captures multiple values of similarity between any two items. Our experiments with two real social-tagging data sets show that our new method is superior to conventional spectral clustering that ignores the existence of multiple values of similarity among the items.
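The first ingredient of any such approach is a tag-based similarity between items. The sketch below computes cosine similarity over per-item tag frequency vectors and then groups items by a simple similarity threshold (connected components); the spectral step and the per-user multigraph of the paper are deliberately omitted, and all names and the 0.5 threshold are illustrative.

```python
from itertools import combinations
from math import sqrt

def tag_similarity(tags_a, tags_b):
    """Cosine similarity between two items' tag-frequency dicts."""
    common = set(tags_a) & set(tags_b)
    dot = sum(tags_a[t] * tags_b[t] for t in common)
    norm = (sqrt(sum(v * v for v in tags_a.values()))
            * sqrt(sum(v * v for v in tags_b.values())))
    return dot / norm if norm else 0.0

def threshold_clusters(item_tags, threshold=0.5):
    """Group items whose tag similarity exceeds a threshold, via
    union-find connected components (a crude stand-in for the
    spectral clustering step described in the paper)."""
    items = list(item_tags)
    parent = {i: i for i in items}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(items, 2):
        if tag_similarity(item_tags[a], item_tags[b]) >= threshold:
            parent[find(a)] = find(b)  # union the two components

    clusters = {}
    for i in items:
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())

items = {
    "img1": {"sunset": 3, "beach": 2},
    "img2": {"sunset": 1, "beach": 4},
    "img3": {"code": 5, "python": 2},
}
print(threshold_clusters(items))  # img1 and img2 cluster together
```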

Alexandros Nanopoulos, Hans-Henning Gabriel, Myra Spiliopoulou

Semantics

Semantic Weaving for Context-Aware Web Service Composition

An Aspect-oriented Programming (AOP) based approach is proposed to perform context-aware service composition on the fly. It realises context-aware composition by semantically weaving context into static Web service composition. A context weaver is implemented based on the proposed approach. The proposed semantic weaving allows services to be composed in a systematic way with changing context.

Li Li, Dongxi Liu, Athman Bouguettaya
Multi-synchronous Collaborative Semantic Wikis

Semantic wikis have opened an interesting way to mix Web 2.0 advantages with the Semantic Web approach. However, compared to other collaborative tools, wikis do not support all collaborative editing modes, such as offline work or multi-synchronous editing. The lack of multi-synchronous support in wikis is problematic, especially when working with semantic wikis. In these systems, it is often important to change multiple pages simultaneously in order to refactor the semantic wiki structure. In this paper, we present a new model of semantic wiki called Multi-Synchronous Semantic Wiki (MS2W). This model extends semantic wikis with multi-synchronous support, allowing the creation of a P2P network of semantic wikis. Semantic wiki pages can be replicated on several semantic servers. MS2W ensures CCI consistency on these pages by relying on the Logoot algorithm.

Charbel Rahhal, Hala Skaf-Molli, Pascal Molli, Stéphane Weiss
Facing the Technological Challenges of Web 2.0: A RIA Model-Driven Engineering Approach

One of the main reasons for the success of Web 2.0 is the improvement in user experience. This improvement is a consequence of the evolution from HTML User Interfaces (UI) to more usable and richer UI. The most popular Web 2.0 applications have selected the Rich Internet Application (RIA) paradigm to achieve this goal. However, the current Web Engineering methods do not provide the expressivity required to produce RIA interfaces. This work presents a RIA Metamodel to deal with the new technological challenges that have arisen with Web 2.0 development. This metamodel supports two main perspectives: 1) the definition of the UI as a combination of widgets from a selected RIA technology; and 2) the specification of the UI interaction as a consequence of the events produced by the user. In order to illustrate how this RIA Metamodel can be used in a Model-driven Engineering (MDE) method, this work also presents the integration of the RIA Metamodel with the OOWS method.

Francisco Valverde, Oscar Pastor

Search I

Intent-Based Categorization of Search Results Using Questions from Web Q&A Corpus

User intent is defined as a user's information need. Detecting intent in Web search helps users obtain relevant content, thus improving their satisfaction. We propose a novel approach to instantiating intent by using adaptive categorization to produce predicted intent probabilities. For this, we attempt to detect the factors by which intent is formed, called intent features, by using a Web Q&A corpus. Our approach was motivated by the observation that questions related to queries are effective for finding intent features. We automatically extract a set of categories and their intent features by analyzing questions within a Web Q&A corpus, and categorize search results using these features. The advantages of our intent-based categorization are twofold: (1) presenting the most probable intent categories to help users clarify and choose starting points for Web searches, and (2) adapting sets of intent categories to each query. Experimental results show that the distilled intent features can efficiently describe intent categories, and search results can be efficiently categorized without any human supervision.

Soungwoong Yoon, Adam Jatowt, Katsumi Tanaka
TermCloud for Enhancing Web Search

We previously proposed a reranking system for Web searches based on user interaction. The system encouraged users to interact with terms in search results or with frequent terms displayed in the tagcloud visualization style. Over 20,000 interaction logs of users were analyzed, and the results showed that more than 70% of users had interacted with the terms in the tagcloud area. Therefore, this visualization style is thought to have great potential in supporting users in their Web searches. This visualization style is referred to as TermCloud in this paper. We describe how TermCloud can increase the effectiveness of users' Web searches, and we propose a technique to generate a more useful TermCloud than the frequency-based TermCloud. Then, we show the usefulness of our method based on an experimental test.

Takehiro Yamamoto, Satoshi Nakamura, Katsumi Tanaka

Visualization

Seeing Past Rivals: Visualizing Evolution of Coordinate Terms over Time

In this paper, we describe an approach for the detection and visualization of coordinate term relationships and their evolution over time, using temporal data available on the Web. Coordinate terms are terms with the same hypernym, and they often represent rival or peer relationships between the underlying objects. We have built a system that portrays the evolution of coordinate terms in an easy and intuitive way based on data in an online news archive collection spanning more than 100 years. With the proposed method, it is possible to see the changes in the peer relationships between objects over time, together with the context of these relationships. The experimental results showed quite high precision of our method and indicated its high usefulness for particular knowledge discovery tasks.

Hiroaki Ohshima, Adam Jatowt, Satoshi Oyama, Katsumi Tanaka
A Novel Visualization Method for Distinction of Web News Sentiment

Recently, an increasing number of news websites have come to provide various featured services. However, effective analysis and presentation for distinguishing viewpoints among different news sources are limited. We focus on the sentiment aspect of news reporters' viewpoints and propose a system called the Sentiment Map for distinguishing the sentiment of news articles and visualizing it on a geographical map based on map zoom control. The proposed system provides more detailed sentiments than conventional sentiment analysis, which only considers positive and negative emotions. When a user enters one or more query keywords, the Sentiment Map not only retrieves news articles related to the concerned topic, but also summarizes sentiment tendencies of Web news based on specific geographical scales. Sentiments can be automatically aggregated at different levels corresponding to changes of the map scale. Furthermore, we take into account the aspect of time, and show the variation in sentiment over time. Experimental evaluations conducted by a total of 100 individuals show that the sentiment extraction accuracy and the visualization effect of the proposed system are good.

Jianwei Zhang, Yukiko Kawai, Tadahiko Kumamoto, Katsumi Tanaka
Visually Lossless HTML Compression

The verbosity of the Hypertext Markup Language (HTML) remains one of its main weaknesses. This problem can be addressed with the aid of HTML-specialized compression algorithms. In this work, we describe a visually lossless HTML transform that, combined with generally used compression algorithms, makes it possible to attain high compression ratios. Its core transform substitutes words in an HTML document using a static English dictionary and effectively encodes dictionary indexes, numbers, and specific patterns.

Visually lossless compression means that the HTML document layout will be modified, but the document displayed in a browser will provide the exact fidelity with the original. The experimental results show that the proposed transform improves the HTML compression efficiency of general purpose compressors on average by 21% in the case of gzip, achieving comparable processing speed. Moreover, we show that the compression ratio of gzip can be improved by up to 32% for the price of higher memory requirements and much slower processing.
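The dictionary-substitution idea at the core of such a transform can be sketched with a toy five-word dictionary and `zlib` as the back-end compressor. This is an illustration only, not the paper's transform: the real system uses a large static English dictionary, compact index encodings, and number/pattern handling, and the escape byte 0x01 here assumes the input contains no such byte.

```python
import zlib

# Toy static dictionary of frequent HTML words; indexes become short codes.
DICTIONARY = [b"class", b"width", b"table", b"style", b"align"]

def transform(html: bytes) -> bytes:
    """Replace each dictionary word with a 2-byte code (0x01 + index).

    Assumes the input contains no raw 0x01 bytes, so the codes
    never collide with original content and the transform is reversible.
    """
    for i, word in enumerate(DICTIONARY):
        html = html.replace(word, bytes([1, i]))
    return html

def inverse(data: bytes) -> bytes:
    """Undo transform(), restoring the original document bytes."""
    for i, word in enumerate(DICTIONARY):
        data = data.replace(bytes([1, i]), word)
    return data

page = b'<table style="width: 80%" class="main"><td align="left"></td></table>' * 50
assert inverse(transform(page)) == page  # lossless round trip

# Compare plain compression against transform-then-compress.
print(len(zlib.compress(page)), len(zlib.compress(transform(page))))
```

On real, less repetitive HTML, replacing long frequent words with 2-byte codes shortens the input the general-purpose compressor sees, which is where the reported gains over plain gzip come from.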

Przemysław Skibiński

Search II

Enhancing Web Search by Aggregating Results of Related Web Queries

Currently, commercial search engines have implemented methods to suggest alternative Web queries to users, which helps them specify alternative related queries in pursuit of finding needed Web pages. In this paper, we address the Web search problem on related queries to improve retrieval quality by devising a novel search rank aggregation mechanism. Given an initial query and the suggested related queries, our search system concurrently processes their search result lists from an existing search engine and then forms a single list aggregated from all the retrieved lists. In particular, we propose a generic rank aggregation framework which considers not only the number of wins an item achieved in a competition, but also the quality of its competitor items when calculating the ranking of Web items. The framework combines traditional and random walk based rank aggregation methods to produce a more reasonable list for users. Experimental results show that the proposed approach can clearly improve the retrieval quality in a parallel manner over the traditional search strategy that serially returns result lists. Moreover, we also empirically investigate how different rank aggregation methods affect the retrieval performance.
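One of the traditional rank aggregation methods such a framework builds on is Borda counting, where an item earns points in each list according to its position. The sketch below is that baseline only, not the paper's win-quality or random walk method; the example domains and function name are illustrative.

```python
def borda_aggregate(ranked_lists):
    """Aggregate several ranked result lists with Borda counting.

    An item at position p in a list of length n earns n - p points;
    items absent from a list earn nothing there. Final ranking sorts
    by total points, breaking ties alphabetically for determinism.
    """
    scores = {}
    for results in ranked_lists:
        n = len(results)
        for pos, item in enumerate(results):
            scores[item] = scores.get(item, 0) + (n - pos)
    return sorted(scores, key=lambda it: (-scores[it], it))

lists = [
    ["a.com", "b.com", "c.com"],  # results of the initial query
    ["b.com", "a.com", "d.com"],  # results of a related query
    ["b.com", "c.com", "a.com"],  # results of another related query
]
print(borda_aggregate(lists))  # ['b.com', 'a.com', 'c.com', 'd.com']
```

Here b.com wins overall (8 points) even though it was not first in the initial query's list, which is exactly the effect of aggregating over related queries.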

Lin Li, Guandong Xu, Yanchun Zhang, Masaru Kitsuregawa
Extracting Structured Data from Web Pages with Maximum Entropy Segmental Markov Model

Automated techniques can help to extract information from the Web. A new semi-automatic approach based on the maximum entropy segmental Markov model, therefore, is proposed to extract structured data from Web pages. It is motivated by two ideas: modeling sequences embedding structured data instead of their context to reduce the number of training Web pages and preventing the generation of too specific or too general models from the training data. The experimental results show that this approach has better performance than Stalker when only one training Web page is provided.

Susan Mengel, Yaoquin Jing
Blog Ranking Based on Bloggers’ Knowledge Level for Providing Credible Information

With the huge increase of recently popular user-generated content on the Web, searching for credible information has become progressively difficult. In this paper, we focus on blogs, one kind of user-generated content, and propose a credibility-focused blog ranking method based on bloggers’ knowledge level. This method calculates knowledge scores for bloggers and ranks blog entries based on bloggers’ knowledge level. Bloggers’ knowledge level is evaluated based on their usage of domain-specific words in their past blog entries. A blogger is given multiple scores with respect to various topic areas. In our method, blog entries written by knowledgeable bloggers have higher rankings than those written by common bloggers. Additionally, our system can present multiple ranking lists of blog entries from the perspectives of different bloggers’ groups. This allows users to estimate the trustworthiness of blog contents from multiple aspects. We built a prototype of the proposed system, and our experimental evaluation showed that our method could effectively rank bloggers and blog entries.
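The scoring idea, rating a blogger per topic area by their usage of domain-specific words in past entries, can be sketched as a simple frequency ratio. This is a toy stand-in for the paper's scoring method; the function name, the word-set representation of a topic area, and the ratio formula are all assumptions.

```python
def knowledge_scores(blogger_posts, domain_terms):
    """Score each blogger by the fraction of domain-specific words
    appearing across their past entries for one topic area.

    blogger_posts: dict blogger -> list of entry texts.
    domain_terms: set of lowercase domain-specific words.
    """
    scores = {}
    for blogger, posts in blogger_posts.items():
        words = [w.lower() for post in posts for w in post.split()]
        if not words:
            scores[blogger] = 0.0
            continue
        hits = sum(1 for w in words if w in domain_terms)
        scores[blogger] = hits / len(words)
    return scores

posts = {
    "alice": ["espresso extraction requires consistent grind size"],
    "bob": ["i drank coffee today it was nice"],
}
coffee_terms = {"espresso", "extraction", "grind", "roast"}
s = knowledge_scores(posts, coffee_terms)
print(s["alice"] > s["bob"])  # True: alice uses more domain vocabulary
```

Running this per topic area gives each blogger the multiple per-topic scores the abstract describes, which can then rank entries within that topic.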

Shinsuke Nakajima, Jianwei Zhang, Yoichi Inagaki, Tomoaki Kusano, Reyn Nakamoto

Web Services

Multi-source Remote Sensing Images Data Integration and Sharing Using Agent Service

Remote sensing images are widely used in disaster detection and other domains. However, many researchers cannot find and access the remote sensing images they need. In this paper, we propose an effective approach to integrating and sharing image resources over the Web. First, the image metadata are exposed based on Grid services and the standard metadata specification; second, an Agent service is introduced to discover and invoke the metadata services dynamically; third, researchers can query, browse, or locate the remote sensing images through the service interface. We have developed a service-oriented remote sensing image integration platform that supports the parallel query and browsing of multi-source remote sensing images. Moreover, it provides better availability and extensibility.

Binge Cui, Xin Chen, Pingjian Song, Rongjie Liu
An Integrated Declarative Approach to Web Services Composition and Monitoring

In this paper we propose a constraint-based declarative approach to the Web services composition and monitoring problem. Our approach allows the user to build the abstract composition by identifying the participating entities and by providing a set of constraints that mark the boundary of the solution. Different types of constraints have been proposed to handle composition modeling and monitoring. The abstract composition is then used for instantiating the concrete composition, which both finds and executes an instantiation respecting the constraints and also handles run-time process monitoring. Compared to traditional approaches, our approach is declarative and allows the same set of constraints to be used for both composition modeling and monitoring, and thus allows the abstract composition to be refined in response to run-time violations, such as service failures or response time delays.

Ehtesham Zahoor, Olivier Perrin, Claude Godart
Formal Identification of Right-Grained Services for Service-Oriented Modeling

Identifying right-grained services is important for successful service orientation because it has a direct impact on two major goals: the composability of loosely coupled services, and the reusability of individual services in different contexts. Although the concept of service orientation has been intensively debated in recent years, a unified methodical approach for identifying services has not yet been reached. In this paper, we suggest a formal approach to identifying services at the right level of granularity from the business process model. Our approach uses the concept of graph clustering and provides a systematic approach by defining a cost metric as a measure of interaction costs. To effectively extract service information from the business model, we take activities as the smallest units in service identification and cluster activities with high interaction cost into a task through a hierarchical clustering algorithm, so as to reduce the coupling of remote tasks and to increase local task cohesion.

Yukyong Kim, Kyung-Goo Doh

Trust and Uncertainty

Start Trusting Strangers? Bootstrapping and Prediction of Trust

Web-based environments typically span interactions between humans and software services. The management and automatic calculation of trust are among the key challenges of the future service-oriented Web. Trust management systems in large-scale systems, for example social networks or service-oriented environments, determine trust between actors by either collecting manual feedback ratings or by mining their interactions. However, most systems do not support bootstrapping of trust. In this paper we propose techniques and algorithms enabling the prediction of trust even when few or no ratings have been collected or interactions captured. We introduce the concepts of mirroring and teleportation of trust, which facilitate the evolution of cooperation between various actors. We assume a user-centric environment, where actors express their opinions, interests and expertise by selecting and tagging resources. We take this information to construct tagging profiles, whose similarities are utilized to predict potential trust relations. Most existing similarity approaches split the three-dimensional relations between users, resources, and tags to create and compare general tagging profiles directly. Instead, our algorithms consider (i) the understanding and interests of actors in tailored subsets of resources and (ii) the similarity of resources from a certain actor group’s point of view.

Florian Skopik, Daniel Schall, Schahram Dustdar
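The tagging-profile similarity underlying this approach can be illustrated with a minimal sketch (all names and data below are illustrative, not taken from the paper): each actor’s profile is a tag-frequency vector built from the resources the actor has tagged, and the cosine similarity between two profiles serves as a proxy for a potential trust relation.

```python
from collections import Counter
from math import sqrt

def tagging_profile(tag_assignments):
    """Build a tag-frequency profile from (resource, tag) pairs."""
    return Counter(tag for _resource, tag in tag_assignments)

def cosine_similarity(p, q):
    """Cosine similarity between two tag-frequency profiles."""
    dot = sum(p[t] * q[t] for t in p if t in q)
    norm = sqrt(sum(v * v for v in p.values())) * sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0

# Two hypothetical actors tagging a shared subset of resources.
alice = tagging_profile([("r1", "java"), ("r2", "soa"), ("r3", "soa")])
bob = tagging_profile([("r1", "java"), ("r4", "soa")])
print(round(cosine_similarity(alice, bob), 3))  # high overlap in tag usage
```

A real system would, as the abstract notes, restrict such profiles to tailored subsets of resources rather than comparing global profiles directly.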
Finding Comparative Facts and Aspects for Judging the Credibility of Uncertain Facts

Users often encounter unreliable information on the Web, but there is no system to check its credibility easily and efficiently. In this paper, we propose a system to search for useful information for checking the credibility of uncertain facts. The objective of our system is to help users judge credibility efficiently by comparing other facts related to the input uncertain fact, without checking a large number of Web pages for comparison. For this purpose, the system collects comparative facts for the input fact and important aspects for comparing them from the Web and estimates the validity of each fact.

Yusuke Yamamoto, Katsumi Tanaka
Query Evaluation on Probabilistic RDF Databases

Over the last few years, RDF has been used as a knowledge representation model in a wide variety of domains. Some domains are full of uncertainty, so it is desirable to process and manage probabilistic RDF data. The core operation of queries on a probabilistic RDF database is computing the probability of the result of a query. In this paper, we describe a general framework for supporting SPARQL queries on probabilistic RDF databases. In particular, we consider transitive inference capability for RDF instance data. We show that the find operation for an atomic query with the transitive property can be formalized as the problem of computing path expressions on the transitive relation graph, and we propose an approximate algorithm for computing path expressions efficiently. Finally, we implement and experimentally evaluate our approach.

Hai Huang, Chengfei Liu

Recommendation and Quality of Service

Recommending Improvements to Web Applications Using Quality-Driven Heuristic Search

Planning out maintenance tasks to increase the quality of Web applications can be difficult for a manager. First, it is hard to evaluate the precise effect of a task on quality. Second, quality improvement is generally the result of applying a combination of available tasks, and identifying the best combination can be complicated. We present a general approach to recommending improvements to Web applications. The approach uses a meta-heuristic algorithm to find the best sequence of changes given a quality model responsible for evaluating the fitness of candidate sequences. This approach was tested using a navigability model on 15 different Web pages. The meta-heuristic recommended the best possible sequence for every tested configuration, while being much more efficient than an exhaustive search with respect to execution time.

Stephane Vaucher, Samuel Boclinville, Houari Sahraoui, Naji Habra
A Web Recommender System for Recommending, Predicting and Personalizing Music Playlists

In this paper, we present a Web recommender system for recommending, predicting and personalizing music playlists based on a user model. We have developed a hybrid similarity matching method that combines collaborative filtering with ontology-based semantic distance measurements. We dynamically generate a personalized music playlist, from a selection of recommended playlists, which comprises the tracks most relevant to the user. Our Web recommender system features three functionalities: (1) predicting the likability of a specific music playlist for a user, (2) recommending a set of music playlists, and (3) composing a new personalized music playlist. Our experimental results show the efficacy of our hybrid similarity matching approach and the information personalization method.

Zeina Chedrawy, Syed Sibte Raza Abidi
Verification of Composite Services with Temporal Consistency Checking and Temporal Satisfaction Estimation

This paper aims to address the issue of consistency and satisfaction of composite services in the presence of temporal constraints. These constraints may cause conflicts between services and affect the estimation of composition requirements. Existing verification approaches have not adequately addressed this issue. Therefore, this paper contributes a verification method with temporal consistency checking and temporal satisfaction estimation. A set of checking rules and estimation formulae are presented according to workflow patterns and temporal dependencies. The method leads to three major outcomes: consistent with a satisfactory combination, consistent with an unsatisfactory combination, and inconsistent with an unsatisfactory combination.

Azlan Ismail, Jun Yan, Jun Shen

User Interfaces

Adaptive Rich User Interfaces for Human Interaction in Business Processes

In recent years, business process research has primarily focussed on optimization by automation, resulting in modeling and service orchestration concepts implying machine-to-machine communication. New standards for the integration of human participants into such processes have only recently been proposed [1,2]. However, they do not cover user interface development and deployment. There is a lack of concepts for rich business process UIs supporting flexibility, reusability and context-awareness. We address this issue with a concept for building human task presentations from service-oriented UIs. These User Interface Services provide reusable, rich UI components and are selected, configured and exchanged with respect to the current context.

Stefan Pietschmann, Martin Voigt, Klaus Meißner
Personalizing the Interface in Rich Internet Applications

Recently, existing design methodologies targeting traditional Web applications have been extended to support Rich Internet Application modeling. These extended methodologies currently cover the traditionally well-established design concerns, i.e. data and navigation design, and provide additional focus on user interaction and presentation capabilities. However, there is still a lack of design support for the more advanced functionality now typically offered in state-of-the-art Web applications. One as yet unsupported design concern is the personalization of content and presentation to the specific user and his/her context, making use of the extra presentational possibilities offered by RIAs. This article addresses this concern and presents an extension of the RIA design approach OOH4RIA to include presentation personalization support. We show how to extend the RIA development process to model the required personalization at the correct level of abstraction, and how these specifications can be realized using present RIA technology.

Irene Garrigós, Santiago Meliá, Sven Casteleyn
Towards Improving Web Search: A Large-Scale Exploratory Study of Selected Aspects of User Search Behavior

Recently, the Web has made a dramatic impact on our lives, becoming a main information source for many people. We believe that the continuous study of user needs and search behavior is a necessary key factor for the technology to keep pace with societal changes. In this paper we report the results of a large-scale online questionnaire conducted to investigate the ways in which users search the Web and the kinds of needs they have. We have analyzed the results based on respondents’ attributes such as age and gender. The findings should be considered as hypotheses for further systematic studies.

Hiroaki Ohshima, Adam Jatowt, Satoshi Oyama, Satoshi Nakamura, Katsumi Tanaka

Web Understanding

An Architecture for Open Cross-Media Annotation Services

The emergence of new media technologies in combination with enhanced information sharing functionality offered by the Web provides new possibilities for cross-media annotations. This in turn raises new challenges in terms of how a true integration across different types of media can be achieved and how we can develop annotation services that are sufficiently flexible and extensible to cater for new document formats as they emerge. We present a general model for cross-media annotation services and describe how it was used to define an architecture that supports extensibility at the data level as well as within authoring and visualisation tools.

Beat Signer, Moira C. Norrie
Video Search by Impression Extracted from Social Annotation

This paper proposes a novel indexing and ranking method for video clips on video sharing Web sites that overcomes some of the problems with conventional systems. These include the difficulty of finding target video clips by the emotional impression they make, such as their level of happiness or sadness, because text summaries of video clips on video sharing Web sites usually do not contain such information. Our system extracts this type of information from comments on the video clips and generates an impression index for searching and ranking. In this work, we present analytical studies of video sharing Web sites. We then propose an impression ranking method and show its usefulness in an experimental test. In addition, we describe the future direction of this work.

Satoshi Nakamura, Katsumi Tanaka
Single Pattern Generating Heuristics for Pixel Advertisements

Pixel advertisement represents the presentation of small advertisements on a banner. With the Web becoming more important for marketing purposes, pixel advertisement is an interesting development. In this paper, we present a comparison of three heuristic algorithms for generating allocation patterns for pixel advertisements. The algorithms used are the orthogonal algorithm, the left justified algorithm, and the GRASP constructive algorithm. We present the results of an extensive simulation in which we have experimented with the sorting of advertisements and different banner and advertisement sizes. The purpose is to find a pattern generating algorithm that maximizes the revenue of the allocated pixel advertisements on a banner. Results show that the best algorithm for our goal is the orthogonal algorithm. We also present a Web application in which the most suitable algorithm is implemented. This Web application returns an allocation pattern for a set of advertisements provided by the user.

Alex Knoops, Victor Boskamp, Adam Wojciechowski, Flavius Frasincar

Industrial Session II

Generation of Specifications Forms through Statistical Learning for a Universal Services Marketplace

In a few business sectors, there exist marketplace sites that provide the consumer with specifications forms, which the consumer can fill out to learn and compare the service terms of multiple service providers. At HP Labs, we are working towards building a universal marketplace site, i.e., a marketplace site that covers thousands of sectors and hundreds to thousands of providers per sector. We automatically generate the specifications forms for the sectors through a statistical clustering algorithm that utilizes both business directories and web forms from service provider sites.

Kivanc Ozonat
An Urban Planning Web Viewer Based on AJAX

The Program for Promoting the Urbanism Network is a Spanish project promoted by red.es. The main goal of this project is the systematization of the urban planning of all the municipalities of the country using a single conceptual model that will eliminate regional differences. The program has two main goals: first, building a Transactional Planning Management System based on a service-oriented architecture (SOA) of spatial services for processing the urban information, and second, providing an environment for publishing the urban planning information based on the standards for the creation of Spatial Data Infrastructures (SDIs). In a first phase, a collection of databases for the urban planning of different areas was generated, as well as a collection of services for the exploitation of the databases and, finally, different versions of viewers for the urban planning information. The services isolate the designers of urban data systems from the internal complexities of the data model, whereas the viewers allow end users to access the information in an easy and fast way. In this paper we present the Urban Planning Web Viewer for the Spanish municipality of Abegondo. The main features of the application are its modular architecture based on standard services, and its exclusive use of AJAX (Asynchronous JavaScript and XML) and DHTML (Dynamic HTML) technologies to provide an extensible and very useful application with a high level of accessibility.

Miguel R. Luaces, David Trillo Pérez, J. Ignacio Lamas Fonte, Ana Cerdeira-Pena
Data Center Hosting Services Governance Portal and Google Map-Based Collaborations

In the IT services business, a multi-year enterprise application hosting contract often carries a price tag that is an order of magnitude larger than that of the solution development. For hosting services providers to compete over the revenue stream, the ability to provide rapid application deployment is a critical consideration on top of the price differences. In fact, a data center is tested repeatedly in its responsiveness, as application hosting requires iterations of deployment adjustments due to business conditions, IT optimization, security, and compliance reasons. In this paper, we report on an enterprise application deployment governance portal, which coordinates service delivery roles, integrates system management tools, and, above all, keeps the clients involved or at least informed. For data center operations such as early engagement, requirement modeling, solution deployment design, service delivery, steady-state management, and close-out, this paper illustrates how the Google Map technology can be used to represent both the target deployment architecture and the delivery process. The Google Map model can then be used in delivery process execution and collaboration. The resulting governance portal has been fully implemented and is in active use for the data center business transformation in IBM.

Jih-Shyr Yih, Yew-Huey Liu

Exploiting Structured Information on the Web

Focused Search in Digital Archives

We present a system description for an archival information system with three different approaches to gain online access to digital archives created in the metadata standard Encoded Archival Description (EAD). We show that an aggregation-based system can be developed on archival data using XML Information Retrieval (XML IR). We describe the different stages and components, such as the indexing of the digital finding aids in an XML database, the subsequent querying and retrieval of information from that database, and the eventual delivery of that information to the users in a contextual interface.

Junte Zhang, Jaap Kamps
Automated Ontology-Driven Metasearch Generation with Metamorph

We present Metamorph, a system and framework for generating vertical deep Web search engines in a knowledge-based way. The approach enables the separation between the roles of a higher-skilled ontology engineer and a less skilled service engineer, who adds new web sources in an intuitive, semi-automatic manner using the proven Lixto suite. One part of the framework is the understanding process for complex web search forms, and the generation of an ontological representation of each form and its intrinsic run-time dependencies. Based on these representations, a unified meta form and matchings from the meta form to the individual search forms and vice versa are created, taking into account different form element types, contents and labels. We discuss several aspects of the Metamorph ontology, which focuses especially on the interaction semantics of web forms, and give a short account of our semi-automatic tagging system.

Wolfgang Holzinger, Bernhard Krüpl, Robert Baumgartner
Integrated Environment for Visual Data-Level Mashup Development

The visual creation tools in mashup frameworks are supposed to be simple and accessible, yet at the same time there is a need to extend the capabilities and complexity of mashups. Therefore, in practice, the frameworks become increasingly hard to learn for a casual user. In relation to the emerging mashup creation techniques in the Semantic Web area, in our work we propose to split the development of mashups into two stages, data-level and service-level, each managed by a separate, although well-integrated, environment.

Adam Westerski

Systems

Concept of Competency Examination System in Virtual Laboratory Environment

In this article the authors consider applying the concept of a virtual laboratory to creating intelligent systems for competency examination. Competences form the basis for building qualifications through the transfer of theoretical and procedural knowledge. Three types of virtual laboratories are distinguished according to their purpose. Additionally, for a virtual laboratory working at the level of competences, a procedure for competency examination is proposed.

Przemysław Różewski, Emma Kusztina
Integrating a Usability Model into Model-Driven Web Development Processes

Usability evaluations should start early in the Web development process and occur repeatedly throughout all stages to ensure the quality of the Web application, not just when the product is completed. This paper presents a Web Usability Model, which is aligned with the SQuaRE standard, to evaluate usability at several stages of a Web development process that follows a Model-Driven Development (MDD) approach. The Web Usability Model is generic and must be operationalized into a concrete MDD method by specifying the relationships between the usability attributes of the Usability Model and the modeling primitives of the specific Web development method. To illustrate the feasibility of the approach, we present a case study where the Usability Model has been applied in the evaluation of the models that are produced during the Web application development process.

Adrian Fernandez, Emilio Insfran, Silvia Abrahão
Entry Pairing in Inverted File

This paper proposes to exploit content and usage information to rearrange an inverted index for a full-text IR system. The idea is to merge the entries of two frequently co-occurring terms, either in the collection or in the answered queries, to form a single, paired entry. Since postings common to paired terms are not replicated, the resulting index is more compact. In addition, queries containing terms that have been paired are answered faster, since we can exploit the pre-computed posting intersection. In order to choose which terms have to be paired, we formulate the term pairing problem as a maximum-weight graph matching problem, and we evaluate the efficiency and efficacy of both an exact and a heuristic solution in our scenario. We apply our technique (i) to compact a compressed inverted file built on an actual Web collection of documents, and (ii) to increase the capacity of an in-memory posting list. Experiments showed that in the first case our approach can improve the compression ratio by up to 7.7%, while in the second we measured a saving of 12% to 18% in the size of the posting cache.

Hoang Thanh Lam, Raffaele Perego, Nguyen Thoi Minh Quan, Fabrizio Silvestri
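The heuristic side of the term-pairing step can be illustrated with a minimal greedy sketch (a generic stand-in, not the authors’ algorithm; all names and weights are illustrative): candidate term pairs are ranked by co-occurrence weight and matched greedily, so that each term joins at most one paired entry.

```python
def greedy_term_pairing(pair_weights):
    """Greedily match terms into pairs, highest co-occurrence weight first.

    pair_weights: dict mapping frozenset({term_a, term_b}) -> weight.
    Returns a list of (term_a, term_b) pairs; each term appears at most once.
    """
    matched = set()
    pairs = []
    for pair, _w in sorted(pair_weights.items(), key=lambda kv: -kv[1]):
        a, b = sorted(pair)
        if a not in matched and b not in matched:
            matched.update((a, b))
            pairs.append((a, b))
    return pairs

# Hypothetical co-occurrence weights between index terms.
weights = {
    frozenset({"new", "york"}): 90,
    frozenset({"york", "times"}): 70,
    frozenset({"new", "times"}): 10,
}
print(greedy_term_pairing(weights))  # "new"/"york" are paired; "times" stays single
```

An exact solution would instead solve the maximum-weight matching over the full co-occurrence graph, which the abstract compares against such a heuristic.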

Data Mining and Querying

STC+ and NM-STC: Two Novel Online Results Clustering Methods for Web Searching

Results clustering in Web searching is useful for providing users with overviews of the results, thus allowing them to restrict their focus to the desired parts. However, the task of deriving single-word or multiple-word names for the clusters (usually referred to as cluster labeling) is difficult, because the names have to be syntactically correct and predictive. Moreover, efficiency is an important requirement, since results clustering is an online task. Suffix Tree Clustering (STC) is a clustering technique in which search results (mainly snippets) can be clustered fast (in linear time) and incrementally, and each cluster is labeled with a phrase. In this paper we introduce (a) a variation of STC, called STC+, with a scoring formula that favors phrases occurring in document titles and a different way of merging base clusters, and (b) a novel non-merging algorithm, called NM-STC, that results in hierarchically organized clusters. A comparative user evaluation showed that both STC+ and NM-STC are significantly preferred over STC, and that NM-STC is about two times faster than STC and STC+.

Stella Kopidaki, Panagiotis Papadakos, Yannis Tzitzikas
Spatio-Temporal-Thematic Analysis of Citizen Sensor Data: Challenges and Experiences

We present work in the spatio-temporal-thematic analysis of citizen-sensor observations pertaining to real-world events. Using Twitter as a platform for obtaining crowd-sourced observations, we explore the interplay between the three dimensions in extracting insightful summaries of observations. We present our experiences in building a web mashup application, Twitris [1], that also facilitates the spatio-temporal-thematic exploration of social signals underlying events.

Meenakshi Nagarajan, Karthik Gomadam, Amit P. Sheth, Ajith Ranabahu, Raghava Mutharaju, Ashutosh Jadhav
Visual Mining of Web Logs with DataTube2

We present in this paper a new method for the visual and interactive exploration of Web site logs. Web usage data is mapped onto a 3D tube whose axis represents time and where each facet corresponds to the hits of a given page in a given time interval. A rearrangement clustering algorithm is used to create groups among pages. Several interactions have been implemented within this visualization, such as the possibility to add annotations or to use virtual reality equipment. We present results for two Web sites (1148 pages over 491 days, and 107 pages over 625 days). We highlight the current limits of our system (9463 pages over 153 days) and show that it outperforms similar existing approaches.

Florian Sureau, Frederic Plantard, Fatma Bouali, Gilles Venturini

Querying and Workflow

Keys in XML: Capturing Identification and Uniqueness

In this article a new type of key constraint in XML, called an XKey, is proposed. The motivation for an XKey is based on the observation that existing approaches do not always capture the fundamental properties of a key, namely identification and uniqueness, and it is shown that an XKey always has these properties. It is also shown that an XKey has the desirable property of extending the notion of a relational key.

Michael Karlinger, Millist Vincent, Michael Schrefl
Query Expansion Based on Query Log and Small World Characteristic

Automatic query expansion is an effective way to address the word mismatch and short query problems. This paper presents a novel approach to Expanding Queries Based on User logs and the Small world characteristic of documents (QEBUS). When a query is submitted, the synonymic concept of the query is obtained by searching a synonymic concept dictionary. Then the query log is explored and key words are extracted from the user-clicked documents based on the small world network (SWN) characteristic. By analyzing the semantic network of the document based on the SWN and exploring the correlations between the key words and the queries based on mutual information, high-quality expansion terms can be obtained. The experimental results show that our technique significantly outperforms some traditional query expansion methods.

Yujuan Cao, Xueping Peng, Zhao Kun, Zhendong Niu, Gx Xu, Weiqiang Wang
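The mutual-information step for ranking expansion terms can be sketched with a generic pointwise mutual information (PMI) computation over document counts (illustrative only; the paper’s exact formulation may differ, and all counts below are hypothetical): candidate terms that co-occur with the query in clicked documents more often than chance receive higher scores.

```python
from math import log2

def pmi(co_occur, term_count, query_count, total_docs):
    """Pointwise mutual information between a candidate term and the query,
    estimated from document counts."""
    p_joint = co_occur / total_docs
    p_term = term_count / total_docs
    p_query = query_count / total_docs
    return log2(p_joint / (p_term * p_query)) if p_joint else float("-inf")

# Hypothetical counts over 1000 logged documents.
total = 1000
query_docs = 50  # documents matching the query
candidates = {"laptop": (40, 100), "weather": (5, 100)}  # (co-occurrence, doc freq)
ranked = sorted(candidates,
                key=lambda t: pmi(candidates[t][0], candidates[t][1], query_docs, total),
                reverse=True)
print(ranked)  # terms more strongly associated with the query come first
```

A term appearing in many documents overall but rarely alongside the query scores low, which is what makes PMI a useful filter for expansion candidates.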
E-Biology Workflows with Calvin

Web portals enable the sharing, execution and monitoring of scientific workflows, but they usually depend on external development systems whose notations strive to support general workflows yet are still too complex for everyday use by biologists. The distinction between web-based and non-web-based tools is likely to further irritate users. We extend our work on collaborative workflow design by introducing a web-based scientific workflow system that enables easy-to-use semantic service composition with a domain-specific workflow notation.

Markus Held, Wolfgang Blochinger, Moritz Werning

Architecture

Security Policy Definition Framework for SOA-Based Systems

This paper presents an extended architecture of a policy definition framework fine-tuned for service-oriented environments conforming to the SOA distributed processing paradigm. We establish key requirements for such a framework, and use these to confront existing distributed policy frameworks. We also define a policy language destined to fulfill all recognized requirements and give a brief overview of its syntax.

Bartosz Brodecki, Piotr Sasak, Michał Szychowiak
Engineering Accessibility in Web Content Management System Environments

Law in most countries around the world enforces accessibility requirements on websites, mainly those related to public administration. Evaluating accessibility is a long and laborious process that requires manual evaluation. In Web 2.0 environments, the great amount of data generated by users makes further effort necessary in order to validate web content accessibility. This paper introduces an accessibility evaluation methodology based on web content accessibility analysis and the study of web content management by users in Web Content Management System (CMS) environments. The main aim of the proposed approach is to optimize the accessibility evaluation process by minimizing the effort it takes to achieve a certain accessibility level. The proposed methodological approach is used as the basis for a generic framework, which is intended to engineer accessibility in all kinds of CMS environments. The proposed framework also suggests corrective accessibility maintenance activities for webmasters interested in improving accessibility. The paper shows how a prototype developed following the framework works in a concrete CMS environment.

Juan Miguel López, Afra Pascual, Antoni Granollers
Backmatter
Metadata
Title
Web Information Systems Engineering - WISE 2009
Editors
Gottfried Vossen
Darrell D. E. Long
Jeffrey Xu Yu
Copyright Year
2009
Publisher
Springer Berlin Heidelberg
Electronic ISBN
978-3-642-04409-0
Print ISBN
978-3-642-04408-3
DOI
https://doi.org/10.1007/978-3-642-04409-0