
2010 | Book

Web Information Systems Engineering – WISE 2010

11th International Conference, Hong Kong, China, December 12-14, 2010. Proceedings

Editors: Lei Chen, Peter Triantafillou, Torsten Suel

Publisher: Springer Berlin Heidelberg

Book Series: Lecture Notes in Computer Science


About this book

Welcome to the Proceedings of WISE 2010 — the 11th International Conference on Web Information Systems Engineering. This year, WISE returned to the place where the inaugural conference was held in 2000, Hong Kong. WISE has also been held in: 2001 Kyoto (Japan), 2002 Singapore, 2003 Rome (Italy), 2004 Brisbane (Australia), 2005 New York (USA), 2006 Wuhan (China), 2007 Nancy (France), 2008 Auckland (New Zealand), and 2009 Poznan (Poland). Continuing its tradition, this year’s WISE provided a forum for engineers and scientists to present their latest findings in Web-related technologies and solutions. The submitted contributions address challenging issues in Web services, search, modeling, recommendation and data mining, as well as keyword search, social network analysis, query languages, and information retrieval and extraction. This year, WISE received 170 submissions from 25 countries, including Argentina, Australia, Austria, Belgium, Canada, China, Czech Republic, France, Germany, Hong Kong, Greece, Iran, Ireland, Italy, Japan, The Netherlands, Norway, Singapore, South Korea, Spain, Sweden, Switzerland, Taiwan, the UK, and the USA. After a thorough reviewing process, 32 papers were selected for presentation as full papers – an acceptance rate of 18.8%. In addition, 19 papers were selected for presentation as short papers, yielding an overall acceptance rate of 30%.

Table of Contents

Frontmatter

Keynotes

Providing Scalable Database Services on the Cloud

The Cloud is fast gaining popularity as a platform for deploying Software as a Service (SaaS) applications. In principle, the Cloud provides unlimited compute resources, enabling deployed services to scale seamlessly. Moreover, the pay-as-you-go model in the Cloud reduces the maintenance overhead of the applications. Given the advantages of the Cloud, it is attractive to migrate existing software to this new platform. However, challenges remain as most software applications need to be redesigned to embrace the Cloud.

In this paper, we present an overview of our ongoing work in developing epiC – an elastic and efficient power-aware data-intensive Cloud system. We discuss the design issues and the implementation of epiC’s storage system and processing engine. The storage system and the processing engine are loosely coupled, and have been designed to handle two types of workload simultaneously, namely data-intensive analytical jobs and online transactions (commonly referred to as OLAP and OLTP, respectively). The processing of large-scale analytical jobs in epiC adopts a phase-based processing strategy, which provides fine-grained fault tolerance, while the processing of queries adopts indexing and filter-and-refine strategies.

Chun Chen, Gang Chen, Dawei Jiang, Beng Chin Ooi, Hoang Tam Vo, Sai Wu, Quanqing Xu
Search and Social Integration

Search and Social have been widely considered to be two separate applications. Indeed, most people use search engines and visit social sites to conduct vastly different activities. This talk presents opportunities where search and social can be integrated synergistically. For instance, on the one hand, a search engine can mine search history data to help users engage in social activities. On the other hand, user activities at social sites can provide information for search engines to improve personalized targeting. This talk uses Confucius, a Q&A system developed by Google and launched in more than 60 countries, to illustrate how computer algorithms can assist the synergistic integration of search and social. Algorithmic issues in data mining, information ranking, and system scalability are discussed.

Edward Y. Chang
Elements of a Spatial Web

Driven by factors such as the increasingly mobile use of the web and the proliferation of geo-positioning technologies, the web is rapidly acquiring a spatial aspect. Specifically, content and users are being geo-tagged, and services are being developed that exploit these tags. The research community is hard at work inventing means of efficiently supporting new spatial query functionality.

Points of interest with a web presence, called spatial web objects, have a location as well as a textual description. Spatio-textual queries return such objects that are near a location argument and are relevant to a text argument. An important element in enabling such queries is to be able to rank spatial web objects. Another is to be able to determine the relevance of an object to a query. Yet another is to enable the efficient processing of such queries. The talk covers recent results on spatial web object ranking and spatio-textual querying obtained by the speaker and his colleagues.

Christian S. Jensen
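The ranking of spatial web objects the talk describes combines spatial proximity with textual relevance. A minimal sketch of such a scoring function follows (a generic linear combination with illustrative field names and an assumed alpha weight, not the speaker's specific ranking method):

```python
import math

def spatio_textual_score(obj, query_loc, query_terms, alpha=0.5):
    # spatial proximity: closer objects score higher (monotone in distance)
    dist = math.hypot(obj["lat"] - query_loc[0], obj["lon"] - query_loc[1])
    spatial = 1.0 / (1.0 + dist)
    # textual relevance: fraction of query terms found in the object's description
    words = set(obj["text"].lower().split())
    textual = sum(t in words for t in query_terms) / len(query_terms)
    # alpha balances spatial proximity against textual relevance
    return alpha * spatial + (1 - alpha) * textual
```

A nearby object whose description matches the query terms thus outranks a distant, non-matching one; tuning alpha shifts the balance between the two criteria.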
The Ubiquitous DBMS

Recent widespread use of mobile technologies and advances in computing power have prompted a strong need for database systems that can be used in small devices such as sensors, cellular phones, PDAs, ultra PCs, and navigators. We call database systems that are customizable from small-scale applications for small devices to large-scale applications such as large-scale search engines ubiquitous database management systems (UDBMSs). In this talk, we first review the requirements of UDBMSs. The requirements we identified include selective convergence (or “devicetization”), flash-optimized storage systems, data synchronization, support for unstructured/semi-structured data, and complex database operations. We then review existing systems and research prototypes. We first review the functionality of UDBMSs, including footprint size, support of standard SQL, supported data types, transactions, concurrency control, indexing, and recovery. We then review how the surveyed UDBMSs support these requirements. We highlight ubiquitous features of the family of Odysseus systems that has been under development at KAIST for over 20 years. The functionality of Odysseus can be “devicetized” or customized depending on the device types and applications, as in Odysseus/Mobile for small devices, Odysseus/XML for unstructured/semi-structured data, Odysseus/GIS for map data, and Odysseus/IR for large-scale search engines. We finally present research topics related to UDBMSs.

Kyu-Young Whang

Web Service

Building Web Services Middleware with Predictable Service Execution

This paper presents a set of guidelines, algorithms and techniques that enable web services middleware to achieve predictable execution times. Existing web services middleware execute requests in a best-effort manner. While this allows them to achieve a higher throughput, it results in highly unpredictable execution times, rendering them unsuitable for applications that require predictability in execution. The guidelines, algorithms and techniques presented are generic in nature and can be used to enhance existing SOAP engines and application servers, or applied when building new ones. The proposed algorithms schedule requests for execution explicitly based on their deadlines and select requests for execution based on laxity. This ensures a high variance in the laxities of the selected requests, enabling requests to be scheduled together by phasing their execution. These techniques need to be supported by specialised development platforms and operating systems that enable increased control over the execution of threads and high-precision operations. Real-life implementations of these techniques on a single server and on a cluster hosting web services are presented as a case study; with the resultant predictability of execution, they meet more than 90% of deadlines, compared to less than 10% without these enhancements.

Vidura Gamini Abhaya, Zahir Tari, Peter Bertok
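The deadline- and laxity-based selection the abstract describes can be sketched roughly as follows (the field names and the simple admission policy are illustrative assumptions, not the paper's actual algorithm):

```python
def laxity(req, now):
    # laxity = time remaining until the deadline minus the execution time;
    # a request with negative laxity can no longer meet its deadline
    return (req["deadline"] - now) - req["exec_time"]

def select_for_execution(pending, now):
    """Order requests by deadline (earliest first) and admit each one
    only if it still has non-negative laxity after the requests already
    admitted have run; requests that would miss their deadlines are
    rejected up front rather than executed best-effort."""
    selected, busy_until = [], now
    for req in sorted(pending, key=lambda r: r["deadline"]):
        if laxity(req, busy_until) >= 0:
            selected.append(req)
            busy_until += req["exec_time"]
    return selected
```

Admission control of this kind is what makes execution times predictable: instead of accepting everything and letting queueing delays grow unboundedly, infeasible requests are identified from their laxity before they start.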
Event Driven Monitoring for Service Composition Infrastructures

We present an event-based monitoring approach for service composition infrastructures. While existing approaches mostly monitor these infrastructures in isolation, we provide a holistic monitoring approach by leveraging Complex Event Processing (CEP) techniques. The goal is to avoid fragmentation of monitoring data across different subsystems in large enterprise environments by connecting various event producers, which provide monitoring data that might be relevant for composite service monitoring. Event queries over this monitoring data make it possible to correlate different monitoring data and achieve greater expressiveness. The proposed system has been implemented for a WS-BPEL composition infrastructure, and the evaluation demonstrates the low overhead and feasibility of the system.

Oliver Moser, Florian Rosenberg, Schahram Dustdar
On Identifying and Reducing Irrelevant Information in Service Composition and Execution

The increasing availability of massive information on the Web creates the need for information aggregation by filtering and ranking according to the user’s goals. In recent years, both industrial and academic researchers have investigated how the quality of services can be described, matched, composed and monitored for service selection and composition. However, very few of them have considered the problem of evaluating and certifying the quality of the provided service information in order to reduce irrelevant information for service consumers, which is crucial for improving the efficiency and correctness of service composition and execution. This paper discusses several problems due to the lack of an appropriate way to manage quality and context in service composition and execution, and proposes a research roadmap for reducing irrelevant service information based on context and quality aspects. We present a novel solution for dealing with irrelevant information about Web services by developing information quality metrics and by discussing experimental evaluations.

Hong-Linh Truong, Marco Comerio, Andrea Maurino, Schahram Dustdar, Flavio De Paoli, Luca Panziera
Propagation of Data Protection Requirements in Multi-stakeholder Web Services Systems

Protecting data in multi-stakeholder Web Services systems is a challenge. Using current Web Services security standards, data might be transmitted under-protected or blindly over-protected when the receivers of the data are not known directly to the sender. This paper presents an approach to aiding collaborative partner services to properly protect each other’s data. Each partner derives an adequate protection mechanism for each message it sends based on those of the relevant messages it receives. Our approach improves the message handling mechanisms of Web Services engines to allow for dynamic data protection requirements derivation. A prototype is used to validate the approach and to show the performance gain.

Tan Phan, Jun Han, Garth Heward, Steve Versteeg

Social Networks

Refining Graph Partitioning for Social Network Clustering

Graph partitioning is a traditional problem with many applications, and a number of high-quality algorithms have been developed. Recently, demand for social network analysis has sparked new research interest in graph clustering. Social networks differ from conventional graphs in that they exhibit some key properties which are largely neglected in popular partitioning algorithms. In this paper, we propose a novel framework for finding clusters in real social networks. The framework consists of several key features. Firstly, we define a new metric which measures the small-world strength between two vertices. Secondly, we design a strategy that uses this metric to greedily, yet effectively, refine existing partitioning algorithms for common objective functions. We conduct an extensive performance study. The empirical results clearly show that the proposed framework significantly improves the results of state-of-the-art methods.

Tieyun Qian, Yang Yang, Shuo Wang
Fast Detection of Size-Constrained Communities in Large Networks

Community detection in networks is a prominent task in graph data mining, owing to the rapid emergence of graph data such as information networks and social networks. In this paper, we propose a new algorithm for detecting communities in networks. Our approach differs from others in its ability to constrain the size of the communities being generated, a property important for a class of applications. In addition, the algorithm is greedy in nature and belongs to a small family of community detection algorithms with pseudo-linear time complexity, making it applicable to large networks. The algorithm is able to detect small-sized clusters independently of the network size. It can be viewed as a complementary approach to methods optimizing modularity, which tend to increase the size of generated communities as the network size increases. Extensive evaluation of the algorithm on synthetic benchmark graphs for community detection shows that the proposed approach is very competitive with state-of-the-art methods, outperforming other approaches in some of the settings.

Marek Ciglan, Kjetil Nørvåg
Evolutionary Taxonomy Construction from Dynamic Tag Space

Collaborative tagging allows users to tag online resources. We refer to the large database of tags and their relationships as a tag space. In a tag space, the popularity of and correlation amongst tags capture current social interests, and a taxonomy is a useful way to organize these tags. As tags change over time, it is imperative to incorporate this temporal tag evolution into the taxonomies. In this paper, we formalize the problem of evolutionary taxonomy generation over a large database of tags. The proposed evolutionary taxonomy framework consists of two key features. Firstly, we develop a novel context-aware edge selection algorithm for taxonomy extraction. Secondly, we propose several algorithms for evolutionary taxonomy fusion. We conduct an extensive performance study using a very large real-life dataset (i.e., del.icio.us). The empirical results clearly show that our approach is effective and efficient.

Bin Cui, Junjie Yao, Gao Cong, Yuxin Huang
Co-clustering for Weblogs in Semantic Space

Web clustering is an approach for aggregating web objects into various groups according to the underlying relationships among them. Finding co-clusters of web objects in semantic space is an interesting topic in the context of web usage mining, as it is able to capture the underlying user navigational interest and content preference simultaneously. In this paper we present a novel web co-clustering algorithm named Co-Clustering in Semantic space (COCS) to simultaneously partition web users and pages via a latent semantic analysis approach. In COCS, we first train the latent semantic space of weblog data using a Probabilistic Latent Semantic Analysis (PLSA) model; we then project all weblog data objects into this semantic space with probability distributions to capture the relationships among web pages and web users; finally, we propose a clustering algorithm that generates the co-cluster corresponding to each semantic factor in the latent semantic space via probability inference. The proposed approach is evaluated by experiments performed on real datasets in terms of precision and recall metrics. Experimental results demonstrate that the proposed method can effectively reveal co-aggregates of web users and pages which are closely related.

Yu Zong, Guandong Xu, Peter Dolog, Yanchun Zhang, Renjin Liu

Web Data Mining

A Linear-Chain CRF-Based Learning Approach for Web Opinion Mining

The task of opinion mining from product reviews is to extract the product entities and determine whether the opinions on the entities are positive, negative or neutral. Reasonable performance on this task has been achieved by employing rule-based or statistical approaches, or generative learning models such as hidden Markov models (HMMs). In this paper, we propose a discriminative model using linear-chain Conditional Random Fields (CRFs) for opinion mining. CRFs can naturally incorporate arbitrary, non-independent features of the input without making conditional independence assumptions among the features. This can be particularly important for opinion mining on product reviews. We evaluated our approach based on three criteria: recall, precision and F-score for extracted entities, opinions and their polarities. Compared to other methods, our approach proved more effective for accomplishing opinion mining tasks.

Luole Qi, Li Chen
An Unsupervised Sentiment Classifier on Summarized or Full Reviews

These days, web users searching for opinions expressed by others on a particular product or service PS can turn to review repositories, such as Epinions.com or Imdb.com. While these repositories often provide a high quantity of reviews on PS, browsing through archived reviews to locate different opinions expressed on PS is a time-consuming and tedious task, and in most cases, a very labor-intensive process. To simplify the task of identifying reviews expressing positive, negative, and neutral opinions on PS, we introduce a simple, yet effective sentiment classifier, denoted SentiClass, which categorizes reviews on PS using the semantic, syntactic, and sentiment content of the reviews. To speed up the classification process, SentiClass summarizes each review to be classified using eSummar, a single-document, extractive, sentiment summarizer proposed in this paper, based on various sentence scores and anaphora resolution. SentiClass (eSummar, respectively) is domain and structure independent and does not require any training for performing the classification (summarization, respectively) task. Empirical studies conducted on two widely-used datasets, Movie Reviews and Game Reviews, in addition to a collection of Epinions.com reviews, show that SentiClass (i) is highly accurate in classifying summarized or full reviews and (ii) outperforms well-known classifiers in categorizing reviews.

Maria Soledad Pera, Rani Qumsiyeh, Yiu-Kai Ng
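To illustrate the kind of training-free, lexicon-based classification an unsupervised sentiment classifier performs, here is a minimal sketch (the word lists and the one-token negation handling are invented for illustration and are far simpler than SentiClass's semantic and syntactic analysis):

```python
# Toy sentiment lexicons (illustrative, not from the paper)
POSITIVE = {"good", "great", "excellent", "enjoyable", "love"}
NEGATIVE = {"bad", "poor", "boring", "hate", "terrible"}
NEGATORS = {"not", "never", "no"}

def classify_review(text):
    """Score a review by counting lexicon hits, flipping the polarity
    of a sentiment word when it directly follows a negator."""
    score, negate = 0, False
    for word in text.lower().split():
        w = word.strip(".,!?")
        if w in NEGATORS:
            negate = True
            continue
        if w in POSITIVE:
            score += -1 if negate else 1
        elif w in NEGATIVE:
            score += 1 if negate else -1
        negate = False
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"
```

Because no model is trained, the same function applies unchanged to full reviews or to their extractive summaries, which is the property that lets a summarizer be used purely as a speed-up.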
Neighborhood-Restricted Mining and Weighted Application of Association Rules for Recommenders

Association rule mining algorithms such as Apriori were originally developed to automatically detect patterns in sales transactions and were later on also successfully applied to build collaborative filtering recommender systems (RS). Such rule mining-based RS not only share the advantages of other model-based systems such as scalability or robustness against different attack models, but also have the advantages that their recommendations are based on a set of comprehensible rules. In recent years, several improvements to the original Apriori rule mining scheme have been proposed that, for example, address the problem of finding rules for rare items. In this paper, we first evaluate the accuracy of predictions when using the recent IMSApriori algorithm that relies on multiple minimum-support values instead of one global threshold. In addition, we propose a new recommendation method that determines personalized rule sets for each user based on his neighborhood using IMSApriori and at recommendation time combines these personalized rule sets with the neighbors’ rule sets to generate item proposals. The evaluation of the new method on common collaborative filtering data sets shows that our method outperforms both the IMSApriori recommender as well as a nearest-neighbor baseline method. The observed improvements in predictive accuracy are particularly strong for sparse data sets.

Fatih Gedikli, Dietmar Jannach
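The flavor of rule mining-based recommendation can be conveyed with a minimal, pairs-only Apriori pass (a toy version with a single global support threshold; IMSApriori's multiple minimum supports and the neighborhood-based combination of rule sets are not reproduced here):

```python
from collections import defaultdict
from itertools import combinations

def mine_rules(transactions, min_support, min_conf):
    """Mine size-2 association rules a -> b from a list of item sets."""
    item_count, pair_count = defaultdict(int), defaultdict(int)
    for t in transactions:
        for i in t:
            item_count[i] += 1
        for a, b in combinations(sorted(t), 2):
            pair_count[(a, b)] += 1
    n = len(transactions)
    rules = {}
    for (a, b), c in pair_count.items():
        if c / n >= min_support:
            # confidence = support(a, b) / support(antecedent)
            if c / item_count[a] >= min_conf:
                rules[(a, b)] = c / item_count[a]
            if c / item_count[b] >= min_conf:
                rules[(b, a)] = c / item_count[b]
    return rules

def recommend(user_items, rules):
    # score unseen items by the best confidence of any rule the user fires
    scores = defaultdict(float)
    for (a, b), conf in rules.items():
        if a in user_items and b not in user_items:
            scores[b] = max(scores[b], conf)
    return sorted(scores, key=scores.get, reverse=True)
```

The comprehensibility advantage mentioned in the abstract is visible here: every recommendation can be traced back to an explicit rule and its confidence.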
Semantically Enriched Event Based Model for Web Usage Mining

With the increasing use of dynamic page generation, asynchronous page loading (AJAX) and rich user interaction on the Web, it is possible to capture more information for web usage analysis. While these advances seem a great opportunity to collect more information about web users, the complexity of the usage data also increases. As a result, traditional page-view based web usage mining methods have become insufficient to fully understand web usage behavior. In order to solve the problems with current approaches, our framework incorporates semantic knowledge in the usage mining process and produces semantic event patterns from web usage logs. To model web usage behavior at a more abstract level, we define the concepts of semantic events, event-based sessions and frequent event patterns.

Enis Söztutar, Ismail H. Toroslu, Murat Ali Bayir

Keyword Search

Effective and Efficient Keyword Query Interpretation Using a Hybrid Graph

Empowering users to access RDF data using keywords can relieve them from the steep learning curve of mastering a structured query language and understanding complex and possibly fast-evolving data schemas. In recent years, translating keywords into SPARQL queries has been widely studied. Approaches relying on the original RDF graph (instance-based approaches) usually generate precise query interpretations at the cost of a long processing time, while those relying on a summary graph extracted from the RDF data (schema-based approaches) significantly speed up query interpretation at the cost of accuracy. In this paper, we propose a novel approach based on a hybrid graph that trades off interpretation accuracy against efficiency. The hybrid graph can preserve most of the connectivity information of the corresponding instance graph in a small size. We conduct experiments on three widely-used data sets of different sizes. The results show that our approach can achieve significant efficiency improvements with a limited accuracy drop compared with instance-based approaches, and meanwhile can achieve promising accuracy gains at an affordable time cost compared with schema-based approaches.

Junquan Chen, Kaifeng Xu, Haofen Wang, Wei Jin, Yong Yu
From Keywords to Queries: Discovering the User’s Intended Meaning

Regarding web searches, users have become used to keyword-based search interfaces due to their ease of use. However, this implies a semantic gap between the user’s information need and the input of search engines, as keywords are a simplification of the real user query. Thus, the same set of keywords can be used to search different information. Besides, retrieval approaches based only on syntactic matches with user keywords are not accurate enough when users look for information not so popular on the Web. So, there is a growing interest in developing semantic search engines that overcome these limitations.

In this paper, we focus on the front-end of semantic search systems and propose an approach to translate a list of user keywords into an unambiguous query, expressed in a formal language, that represents the exact semantics intended by the user. We aim at not sacrificing any possible interpretation while avoiding generating semantically equivalent queries. To do so, we apply several semantic techniques that consider the properties of the operators and the semantics behind the keywords. Moreover, our approach also allows us to present the queries to the user in a compact representation. Experimental results show the feasibility of our approach and its effectiveness in helping users express their intended query.

Carlos Bobed, Raquel Trillo, Eduardo Mena, Sergio Ilarri
Efficient Interactive Smart Keyword Search

Traditional information systems usually return few answers if a user submits an incomplete query. Users often feel “left in the dark” when they have limited knowledge about the underlying data, and have to use a try-and-see approach to modify queries and find answers. In this paper we propose a novel approach to keyword search which predicts keywords of the underlying data as the user types in the first few characters. We study the research challenges this framework poses for large amounts of data. Since each keystroke from the user can invoke a query on the backend, we need efficient algorithms to process each query within milliseconds. We develop an incremental-search algorithm that reuses previously computed and cached results in order to achieve interactive speed. Experiments have been conducted to demonstrate the practicality of this new computing paradigm.

Shijun Li, Wei Yu, Xiaoyan Gu, Huifu Jiang, Chuanyun Fang
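The incremental, cache-reusing idea can be sketched as a prefix filter that answers each keystroke from the previous keystroke's result instead of rescanning the whole dictionary (a simplified in-memory sketch; the paper targets far larger datasets and more sophisticated index structures):

```python
class PrefixSearcher:
    """Incremental keyword completion: each keystroke extends the previous
    prefix, so we re-filter the cached result of the longest known shorter
    prefix rather than scanning all keywords again."""

    def __init__(self, keywords):
        self.keywords = sorted(keywords)
        self.cache = {}  # prefix -> list of matching keywords

    def search(self, prefix):
        if prefix in self.cache:
            return self.cache[prefix]
        # fall back to the longest cached ancestor prefix, if any
        base = self.keywords
        for cut in range(len(prefix) - 1, 0, -1):
            if prefix[:cut] in self.cache:
                base = self.cache[prefix[:cut]]
                break
        result = [k for k in base if k.startswith(prefix)]
        self.cache[prefix] = result
        return result
```

After `search("se")`, the call `search("sea")` only filters the handful of "se" matches, which is what keeps per-keystroke latency in the milliseconds as the data grows.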
Relevant Answers for XML Keyword Search: A Skyline Approach

Identifying relevant results is a key task in XML keyword search (XKS). Although many approaches have been proposed for this task, effectively identifying results for XKS is still an open problem. In this paper, we propose a novel approach for identifying relevant results for XKS by adopting the concept of Mutual Information and skyline semantics. Specifically, we introduce a measurement that effectively quantifies the relevance of a candidate using the concept of Mutual Information, and provide an effective mechanism to identify the most relevant results amongst a large number of candidates using skyline semantics. Extensive experimental studies show that, overall, our approach is more effective than existing approaches and can identify relevant results and top-k results at acceptable computational cost.

Khanh Nguyen, Jinli Cao

Web Search I

A Children-Oriented Re-ranking Method for Web Search Engines

Due to the explosive growth of Internet technology, children commonly use Web search engines to find information for their homework and to satisfy their curiosity. However, few Web search engines consider children’s inherent characteristics; e.g., children prefer to view images on a Web page rather than difficult text. Therefore, general search results are neither friendly nor satisfactory to children. In this paper, to support children in obtaining information suitable for them, we propose a method to re-rank a general search engine’s ranking according to a children-friendly score. Our method determines the score based on the structure of a Web page and its text. We conducted an experiment to verify that the re-ranked results match children’s preferences. As ground truth, we chose 300 Web pages and asked 34 elementary school students whether these Web pages are preferable for them. The result shows that our method ranks children-friendly pages highly.

Mayu Iwata, Yuki Arase, Takahiro Hara, Shojiro Nishio
TURank: Twitter User Ranking Based on User-Tweet Graph Analysis

In this paper, we address the problem of finding authoritative users in Twitter, one of the most popular micro-blogging services [1]. Twitter has been gaining public attention as a new type of information resource, because an enormous number of users transmit diverse information in real time. In particular, authoritative users who frequently submit useful information are considered to play an important role, because useful information is disseminated quickly and widely. To identify authoritative users, it is important to consider the actual information flow in Twitter. However, existing approaches only deal with relationships among users. In this paper, we propose TURank (Twitter User Rank), an algorithm for evaluating users’ authority scores in Twitter based on link analysis. In TURank, users and tweets are represented in a user-tweet graph which models information flow, and ObjectRank is applied to evaluate users’ authority scores. Experimental results show that the proposed algorithm outperforms existing algorithms.

Yuto Yamaguchi, Tsubasa Takahashi, Toshiyuki Amagasa, Hiroyuki Kitagawa
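A PageRank-style score propagation over a user-tweet graph, in the spirit of (but much simpler than) the ObjectRank computation TURank applies, might look like this (uniform edge weights are an assumption; real ObjectRank weights edges by their type, e.g. post, retweet, or follow):

```python
def authority_scores(edges, nodes, damping=0.85, iters=50):
    """Iteratively propagate authority along directed edges of a mixed
    user/tweet graph; dangling nodes redistribute their mass uniformly
    so the total score always sums to 1."""
    out = {n: [] for n in nodes}
    for src, dst in edges:
        out[src].append(dst)
    score = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        nxt = {n: (1 - damping) / len(nodes) for n in nodes}
        for src, dsts in out.items():
            if dsts:
                share = damping * score[src] / len(dsts)
                for d in dsts:
                    nxt[d] += share
            else:
                for n in nodes:
                    nxt[n] += damping * score[src] / len(nodes)
        score = nxt
    return score
```

Representing tweets as first-class nodes is the key modeling step: a user gains authority not from raw follower counts but from the flow of scores through the tweets they produced.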
Identifying and Ranking Possible Semantic and Common Usage Categories of Search Engine Queries

In this paper, we propose a method for identifying and ranking possible categories of any user query based on the meanings and common usages of the terms and phrases within the query. Our solution utilizes WordNet and Wikipedia to recognize phrases and to determine the basic meanings and usages of each term or phrase in a query. The categories are ranked based on their likelihood of capturing the query’s intention. Experimental results show that our method can achieve high accuracy.

Reza Taghizadeh Hemayati, Weiyi Meng, Clement Yu
Best-Effort Refresh Strategies for Content-Based RSS Feed Aggregation

During the past several years, RSS-based content syndication has become a standard technique for the efficient and timely dissemination of information on the web. From a data processing perspective, RSS feeds are standard XML resources which are periodically refreshed by feed aggregators to generate continuous streams of items. In this article, we study the problem of information loss in the context of a content-based feed aggregation system, and we propose a new best-effort refresh strategy for RSS feeds under limited bandwidth. This strategy is evaluated experimentally and compared to other state-of-the-art crawling strategies for web pages.

Roxana Horincar, Bernd Amann, Thierry Artières
Mashup-Aware Corporate Portals

Unlike other Web applications, corporate portals are reckoned to provide an integration space for corporate services. Mashups contribute to this goal by providing a relevant customization technique whereby portal users can supplement portal services with their own data needs. The challenge is to find a balance between portal reliability and mashup freedom. Our approach is to split responsibilities between service providers and portal users. Providers decide how services can be mashuped, portal users determine the supplemented content, and finally, the portal engine mediates between the two. This permits portal services to be reliably customized through user mashups. The approach is realized with Liferay as the portal engine, portlets as the realization of portal services, and XBL as the integration technology.

Sandy Pérez, Oscar Díaz

Web Data Modeling

When Conceptual Model Meets Grammar: A Formal Approach to Semi-structured Data Modeling

Currently, XML is a standard for information exchange. An important task in XML management is designing particular XML formats suitable for particular kinds of information exchange. There exist two kinds of approaches to this problem. Firstly, there exist XML schema languages and their formalization – regular tree grammars. Secondly, there are approaches based on conceptual modeling and automatic derivation of an XML schema from a conceptual schema.

In this paper, we provide a unified formalism for both kinds of approaches. It is based on a formal specification of XML schemas, conceptual schemas, and mappings between the two kinds of schemas. The formalism gives necessary conditions on the mappings. The mappings may then be applied in practice not only for a unified process of designing XML schemas at both levels, i.e. conceptual and grammatical, but also for the integration and evolution of XML schemas.

Martin Nečaský, Irena Mlýnková
Crowdsourced Web Augmentation: A Security Model

Web augmentation alters the rendering of existing Web applications behind these applications' backs. Changing the layout, adding/removing content, or providing additional hyperlinks/widgets are examples of Web augmentation that account for a more personalized user experience. Crowdsourced Web augmentation considers end users not only as the beneficiaries but also as the contributors of augmentation scripts. The fundamental problem with such augmented Web applications is that code from numerous and possibly untrusted users is placed into the same security domain, raising security and integrity concerns. Current solutions either coexist with the danger (e.g. Greasemonkey, where scripts work in the same security domain as the hosting application) or limit augmentation possibilities (e.g. virtual iframes in Google's Caja, where the widget is prevented from accessing the application space). This work introduces Modding Interfaces: application-specific interfaces that regulate inflow and outflow communication between the hosting code and the user code. The paper shows how the combined use of sandboxed iframes and "modding-interface" HTML5 channels ensures application integrity while permitting controlled augmentation of the hosting application.
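The core idea of a modding interface, an explicit application-specific contract that filters what untrusted user code may ask of the host, can be sketched in plain JavaScript. This is an illustrative sketch, not the paper's API: the `createModdingInterface` helper and the two operations are hypothetical, and in a browser the gateway would sit on the host end of an HTML5 MessageChannel connected to a sandboxed iframe.

```javascript
// Hypothetical sketch of a "modding interface": the host defines an
// explicit contract and rejects any message from the sandboxed user
// script that falls outside it. In a browser, this gateway would
// handle messages arriving over a MessageChannel from the sandbox.
function createModdingInterface(contract) {
  return {
    // Inflow: user code -> host. Only contract operations pass through.
    handle(message) {
      const op = contract[message.op];
      if (!op) {
        return { ok: false, error: `operation "${message.op}" not allowed` };
      }
      return { ok: true, result: op(message.args) };
    },
  };
}

// Host application state the user script must never touch directly.
const page = { sidebarWidgets: [] };

// The contract exposes two narrow operations instead of the whole DOM.
const iface = createModdingInterface({
  addWidget: (args) => {
    page.sidebarWidgets.push(String(args.title));
    return page.sidebarWidgets.length;
  },
  listWidgets: () => [...page.sidebarWidgets],
});

// Messages as they would arrive via postMessage from the sandbox:
console.log(iface.handle({ op: 'addWidget', args: { title: 'Weather' } }));
console.log(iface.handle({ op: 'stealCookies', args: {} })); // rejected
```

Because the user code only ever sees the message channel, not the host's objects, the host's integrity depends solely on how narrow the contract operations are.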

Cristóbal Arellano, Oscar Díaz, Jon Iturrioz
Design of Negotiation Agents Based on Behavior Models

Despite the widespread adoption of e-commerce and online purchasing by consumers over the last two decades, automated software agents that can negotiate the issues of an e-commerce transaction with consumers still do not exist. A major challenge in designing automated agents lies in the ability to predict the consumer’s behavior adequately throughout the negotiation process. We employ switching linear dynamical systems (SLDS) within a minimum description length framework to predict the consumer’s behavior. Based on the SLDS prediction model, we design software agents that negotiate e-commerce transactions with consumers on behalf of online merchants. We evaluate the agents through simulations of typical negotiation behavior models discussed in the negotiation literature and actual buyer behavior from an agent-human negotiation experiment.

Kivanc Ozonat, Sharad Singhal
High Availability Data Model for P2P Storage Network

To provide high data availability, data replicas are distributed based on a threshold scheme: data service is guaranteed to be available as long as any k out of n peers are online. The key distribution algorithm of the model, as well as its scalability, management, and other availability-related factors, are presented and analyzed.
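Under the usual assumption that peers go online independently, each with probability p, the availability of such a k-out-of-n scheme is a binomial tail sum. A minimal sketch (the function names are illustrative, not taken from the paper):

```javascript
// Binomial coefficient C(n, i), computed iteratively.
function binomial(n, i) {
  let c = 1;
  for (let j = 0; j < i; j++) c = (c * (n - j)) / (j + 1);
  return c;
}

// Availability of a k-out-of-n replica scheme, assuming each peer is
// online independently with probability p: the data is reachable when
// at least k of the n replica holders are online.
function availability(k, n, p) {
  let a = 0;
  for (let i = k; i <= n; i++) {
    a += binomial(n, i) * Math.pow(p, i) * Math.pow(1 - p, n - i);
  }
  return a;
}

// With unreliable peers (p = 0.5), requiring any 2 of 3 replicas
// gives only 50% availability; requiring any 3 of 8 gives ~85.5%.
console.log(availability(2, 3, 0.5)); // 0.5
console.log(availability(3, 8, 0.5)); // 0.85546875
```

The sketch makes the trade-off visible: spreading more replicas across peers raises availability, at the cost of more storage and maintenance traffic.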

BangYu Wu, Chi-Hung Chi, Cong Liu, ZhiHeng Xie, Chen Ding

Recommender Systems

Modeling Multiple Users’ Purchase over a Single Account for Collaborative Filtering

We propose a probabilistic topic model for enhancing recommender systems to handle multiple users who share a single account. In several web services, since multiple individuals may share one account (e.g. a family), individual preferences cannot be estimated from a simple perusal of the account's purchase history, and thus it is difficult to accurately recommend items to those who share an account. We tackle this problem by assuming latent users sharing an account, and establish a model by extending Probabilistic Latent Semantic Analysis (PLSA). Experiments on real log datasets from online movie services, and on artificial datasets created from these real datasets by combining the purchase histories of two accounts, demonstrate high accuracy in predicting users and higher recommendation accuracy than conventional methods.

Yutaka Kabutoya, Tomoharu Iwata, Ko Fujimura
Interaction-Based Collaborative Filtering Methods for Recommendation in Online Dating

We consider the problem of developing a recommender system for suggesting suitable matches in an online dating web site. The main problem to be solved is that matches must be highly personalized. Moreover, in contrast to typical product recommender systems, it is unhelpful to recommend popular items: matches must be extremely specific to the tastes and interests of the user, but it is difficult to generate such matches because of the two-way nature of the interactions (user-initiated contacts may be rejected by the recipient). In this paper, we show that collaborative filtering based on interactions between users is a viable approach in this domain. We propose a number of new methods and metrics to measure and predict potential improvement in user interaction success, which may lead to increased user satisfaction with the dating site. We use these metrics to rigorously evaluate the proposed methods on historical data collected from a commercial online dating web site. The evaluation showed that, had users been able to follow the top 20 recommendations of our best method, their success rate would have improved by a factor of around 2.3.

Alfred Krzywicki, Wayne Wobcke, Xiongcai Cai, Ashesh Mahidadia, Michael Bain, Paul Compton, Yang Sok Kim
Developing Trust Networks Based on User Tagging Information for Recommendation Making

Recommender systems are one of the recent inventions for dealing with ever-growing information overload. Collaborative filtering seems to be the most popular technique in recommender systems. With sufficient background information on item ratings, its performance is promising enough, but research shows that it performs very poorly in a cold-start situation where previous rating data is sparse. As an alternative, trust can be used for neighbor formation to generate automated recommendations. User-assigned explicit trust ratings, such as how much users trust each other, are used for this purpose. However, reliable explicit trust data is not always available. In this paper we propose a new method of developing trust networks based on users' interest similarity in the absence of explicit trust data. To identify interest similarity, we use users' personalized tagging information. This trust network can be used to find neighbors for making automated recommendations. Our experimental results show that the proposed trust-based method outperforms the traditional collaborative filtering approach, which uses users' rating data. Its performance improves even further when we utilize trust propagation techniques to broaden the range of the neighborhood.

Touhid Bhuiyan, Yue Xu, Audun Jøsang, Huizhi Liang, Clive Cox
Towards Three-Stage Recommender Support for Online Consumers: Implications from a User Study

In this paper, a three-stage recommender support framework is derived from a user study. The purpose of the user study was to understand how to best utilize different types of social information (e.g., product popularity, user reviews) to facilitate online consumers' decision-making process in the e-commerce environment. Through both in-depth tracking of users' objective behavior and qualitative interviews about their reflective thoughts, we have not only refined the traditional two-stage decision process into a more precise three-stage process, but also identified what information users are inclined to seek at each stage. Based on the study's results, suggestions are made for related recommender systems about their practical roles in the three-stage framework and how they can more effectively support users' information needs.

Li Chen

RDF and Web Data Processing

Query Relaxation for Star Queries on RDF

Query relaxation is an important problem for querying RDF data flexibly. Previous work mainly uses ontology information for relaxing user queries. The ranking models proposed, however, are either non-quantifiable or imprecise. Furthermore, the recommended relaxed queries may return no results. In this paper, we aim to solve these problems by proposing a new ranking model. The model ranks the relaxed queries according to their similarities to the original user query. The similarity of a relaxed query to the original query is measured based on the difference of their estimated results. To compute similarity values for star queries efficiently and precisely, Bayesian networks are employed to estimate the result numbers of relaxed queries. An algorithm is also proposed for answering top-k queries. Finally, experiments validate the effectiveness of our method.

Hai Huang, Chengfei Liu
Efficient and Adaptable Query Workload-Aware Management for RDF Data

This paper presents a flexible and adaptable approach for achieving efficient and scalable management of RDF using relational databases. The main motivation behind our approach is that several benchmarking studies have shown that each RDF dataset requires a tailored table schema in order to achieve efficient performance during query processing. We present a two-phase approach for designing an efficient, tailored, but flexible storage solution for RDF data based on its query workload, namely: 1) a workload-aware vertical partitioning phase, and 2) an automated adjustment phase that reacts to changes in the characteristics of the continuous stream of query workloads. We perform comprehensive experiments on two real-world RDF datasets to demonstrate that our approach is superior to the state-of-the-art techniques in this domain.

Hooran MahmoudiNasab, Sherif Sakr
RaUL: RDFa User Interface Language – A Data Processing Model for Web Applications

In this paper we introduce RaUL, the RDFa User Interface Language, a user interface markup ontology that is used to describe the structure of a web form as RDF statements. RaUL separates the markup of the control elements on a web form, the form model, from the data model that the form controls operate on. Form controls and the data model are connected via a data binding mechanism. The form elements include references to an RDF graph defining the data model. For the rendering of the instances of a RaUL model on the client-side we propose ActiveRaUL, a processor that generates XHTML+RDFa elements for displaying the model on the client.

Armin Haller, Jürgen Umbrich, Michael Hausenblas
Synchronising Personal Data with Web 2.0 Data Sources

Web 2.0 users may publish a rich variety of personal data to a number of sites by uploading personal desktop data or actually creating it on the Web 2.0 site. We present a framework and tools that address the resulting problems of information fragmentation and fragility by providing users with fine grain control over the processes of publishing and importing Web 2.0 data.

Stefania Leone, Michael Grossniklaus, Alexandre de Spindler, Moira C. Norrie
An Artifact-Centric Approach to Generating Web-Based Business Process Driven User Interfaces

Workflow-based web applications are important in workflow management systems, as they interact with the users of business processes. With the model-driven approach, user interfaces (UIs) of these applications can be partially generated based on functional and data requirements obtained from the underlying process models. In traditional activity-centric modelling approaches, data models and the relationships between tasks and data are not clearly defined in the process model; thus, it is left to UI modellers to manually identify data requirements in generated UIs. We observed that artifact-centric approaches can be applied to address the above problems. However, this brings challenges for automatically generating UIs due to the declarative manner in which the processes are described. In this paper, we propose a model-based automatic UI generation framework with related algorithms for deriving UIs from process models.

Sira Yongchareon, Chengfei Liu, Xiaohui Zhao, Jiajie Xu

XML and Query Languages

A Pattern-Based Temporal XML Query Language

The need to store large amounts of temporal data in XML documents makes querying temporal XML documents an interesting and practical challenge. Researchers have proposed various temporal XML query languages with specific data models; however, these languages just extend XPath or XQuery with simple temporal operations, thus lacking both declarativeness and consistency in terms of usability and reasonability. In this paper we introduce TempXTQ, a pattern-based temporal XML query language, with a Set-based Temporal XML (STX) data model which uses hierarchically-grouped data sets to uniformly represent both temporal information and common XML data. TempXTQ deploys various patterns equipped with a pattern restructuring mechanism to present requests for extracting and constructing temporal XML data. These patterns are hierarchically composed with operators like logic connectives, which enables TempXTQ to specify temporal queries consistently with the STX model and to declaratively present various kinds of data manipulation requests. We further demonstrate that TempXTQ can present complicated temporal XML queries clearly and efficiently.

Xuhui Li, Mengchi Liu, Arif Ghafoor, Philip C-Y. Sheu
A Data Mining Approach to XML Dissemination

Currently, users' interests are expressed by XPath or XQuery queries in XML dissemination applications. These queries require a good knowledge of the structure and contents of the documents that will arrive, as well as knowledge of XQuery, which few consumers have. In some cases, where distinguishing relevant from irrelevant documents requires the consideration of a large number of features, formulating such a query may be impossible. This paper introduces a data mining approach to XML dissemination that uses a given document collection of the user to automatically learn a classifier modelling his/her information needs. Also discussed are the corresponding optimization methods that allow a dissemination server to execute a massive number of classifiers simultaneously. The experimental evaluation on several real XML document sets demonstrates the accuracy and efficiency of the proposed approach.

Xiaoling Wang, Martin Ester, Weining Qian, Aoying Zhou
Semantic Transformation Approach with Schema Constraints for XPath Query Axes

XPath queries are essentially composed of a succession of axes defining the navigation from a current context node. Among the XPath query axes, child, descendant, and parent can be optionally specified using the commonly used path notations {/, //, ..}. Axes such as following-sibling and preceding-sibling have unique functionality, providing required information that cannot be obtained with the other axes. However, XPath query optimization using schema constraints does not yet consider this family of axes.

The performance of queries denoting the same result by means of different axes may differ significantly. Some axes unnecessarily degrade performance, and this can be avoided. In this paper, we propose a semantic transformation typology and algorithms that transform XPath queries using axes with no optional path operators into semantically equivalent XPath queries in the presence of an XML schema. The goal of the transformation is to replace, whenever possible, the axes that unnecessarily impact performance. We show how, by using our semantic transformation, accessing the database with such queries can be avoided in order to boost performance. We implement the proposed algorithms and empirically evaluate their efficiency and effectiveness as semantic query optimization devices.
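To give a flavor of the kind of equivalence involved (this is an illustration of the general idea, not the paper's typology or algorithms, and the schema model is hypothetical): if a schema declares that element order has the content model (customer, item+), then from a customer context node the step following-sibling::item denotes the same nodes as the plain path ../item, so the more expensive sibling axis can be rewritten away.

```javascript
// Hypothetical schema model: parent element name -> ordered child names.
const schema = { order: ['customer', 'item'] };

// Rewrite `following-sibling::target` (from a `context` child of
// `parent`) into `../target` when the schema guarantees that every
// `target` sibling occurs after `context` in the declared sequence,
// making the sibling filter redundant. Returns null when the rewrite
// is not known to be safe, so the original axis must be kept.
function rewriteFollowingSibling(context, target, parent) {
  const seq = schema[parent];
  if (!seq) return null; // no schema constraint available
  const ci = seq.indexOf(context);
  const ti = seq.indexOf(target);
  if (ci >= 0 && ti > ci) return `../${target}`;
  return null;
}

console.log(rewriteFollowingSibling('customer', 'item', 'order')); // "../item"
console.log(rewriteFollowingSibling('item', 'customer', 'order')); // null
```

A real transformation engine would of course reason over full regular-expression content models rather than a flat name sequence, but the safety condition has the same shape: the schema must prove the axis adds no filtering power.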

Dung Xuan Thi Le, Stephane Bressan, Eric Pardede, Wenny Rahayu, David Taniar
Domain-Specific Language for Context-Aware Web Applications

Context-awareness is a requirement in many modern web applications. While most model-driven web engineering approaches have been extended with support for adaptivity, state-of-the-art development platforms generally provide only limited means for the specification of adaptation and often completely lack a notion of context. We propose a domain-specific language for context-aware web applications that builds on a simple context model and powerful context matching expressions.

Michael Nebeling, Michael Grossniklaus, Stefania Leone, Moira C. Norrie

Web Search II

Enishi: Searching Knowledge about Relations by Complementarily Utilizing Wikipedia and the Web

How do global warming and agriculture mutually influence each other?

It is possible to answer this question by searching for knowledge about the relation between global warming and agriculture. As exemplified by this question, strong demand exists for searching relations between objects. However, methods and systems for searching relations are not well studied. In this paper, we propose a relation search system named "Enishi." Enishi supplies a wealth of diverse multimedia information for deep understanding of relations between two objects by complementarily utilizing knowledge from Wikipedia and the Web. Enishi first mines elucidatory objects constituting relations between two objects from Wikipedia. We then propose new approaches for Enishi to search for more multimedia information about relations on the Web using elucidatory objects. Finally, we confirm through experiments that our new methods can retrieve useful information from the Web for deep understanding of relations.

Xinpeng Zhang, Yasuhito Asano, Masatoshi Yoshikawa
Potential Role Based Entity Matching for Dataspaces Search

The explosion in the amount of personal information has made dataspace search a hot topic. However, current search engines for dataspaces are becoming increasingly inadequate for query tasks involving users' diverse preferences. In this paper, we present a potential-role-based entity matching model called POEM. We propose strategies for homologous entity matching and heterogeneous entity matching, respectively, which can better utilize both entity features and entity relationships to match and organize entities. By combining homologous entity matching and heterogeneous entity matching, the query results can be adapted to the diverse needs of different users. The experiments demonstrate the feasibility and effectiveness of the key techniques of POEM.

Yue Kou, Derong Shen, Tiezheng Nie, Ge Yu
Personalized Resource Search by Tag-Based User Profile and Resource Profile

With the growth of media-sharing web sites such as YouTube and Flickr, there are more and more shared multimedia resources on the Web. Multimedia search becomes more important and challenging as users demand higher retrieval quality. To achieve this goal, multimedia search needs to take users' personalized information into consideration. Collaborative tagging systems allow users to annotate resources with their own tags, which provides a simple but powerful way of organizing, retrieving and sharing different types of social media. The user profiles obtained from collaborative tagging systems should be very useful for resource retrieval. In this paper, we propose a new method to model user profiles and resource profiles from wider perspectives and apply them to personalized resource search in a collaborative tagging environment. We implement a prototype system named FMRS. Experiments with FMRS show that our proposed method outperforms baseline methods.

Yi Cai, Qing Li, Haoran Xie, Lijuan Yu
Incremental Structured Web Database Crawling via History Versions

Web database crawling is one of the major design choices for Deep Web data integration. To the best of our knowledge, existing works have focused only on crawling all records in a web database at one time. Due to the highly dynamic nature of web databases, it is not practical to always crawl the whole database in order to harvest a small proportion of new records. To this end, this paper studies the problem of incremental web database crawling, which aims to crawl as many new records from a web database as possible while minimizing communication costs. In our approach, based on a new graph model, an incremental crawling task is transformed into a graph traversal process. Based on this graph, appropriate queries are generated for crawling by analyzing the history versions of the web database. Extensive experimental evaluations over real Web databases validate the effectiveness of our techniques and provide insights for future efforts in this direction.

Wei Liu, Jianguo Xiao

Web Information Systems

An Architectural Style for Process-Intensive Web Information Systems

REpresentational State Transfer (REST) is the architectural style behind the World Wide Web (WWW), allowing for many desirable quality attributes such as adaptability and interoperability. However, as many process-intensive Web information systems do not make use of REST, they often do not achieve these qualities. This paper addresses this issue by proposing RESTful Business Processes (RESTfulBP), an architectural style that adapts REST principles to Web-based business processes. RESTfulBP views processes and activities as transferrable resources by representing them as process fragments associated with a set of standard operations. Distributed process fragments interoperate by adhering to these operations and exchanging process information. The process information contains basic workflow patterns that are used for dynamic process coordination at runtime. We validate our approach through an industry case study.

Xiwei Xu, Liming Zhu, Udo Kannengiesser, Yan Liu
Model-Driven Development of Adaptive Service-Based Systems with Aspects and Rules

Service-oriented computing (SOC) has become a dominant paradigm for developing distributed Web-based software systems. Besides benefits such as the interoperability and flexibility brought by SOC, modern service-based software systems are frequently required to be highly adaptable in order to cope with rapid changes and the evolution of business goals, requirements, and physical context in a dynamic business environment. Unfortunately, adaptive systems are still difficult to build due to their high complexity. In this paper, we propose a novel approach called MoDAR to support the development of dynamically adaptive service-based systems (DASS). In this approach, we first model the functionality of a system in two constituent parts: i) a stable part called the base model, described using business processes, and ii) a volatile part called the variable model, described using business rules. This model reflects the fact that business processes and rules are two significant and complementary aspects of business requirements, and that business rules are usually much more volatile than business processes. We then use an aspect-oriented approach to weave the base model and the variable model together so that they can evolve independently without interfering with each other. A model-driven platform has been implemented to support the development lifecycle of a DASS from specification and design to deployment and execution. Systems developed with the MoDAR platform run on top of a BPEL process engine and a Drools rule engine. Experimentation shows that our approach brings high adaptability and maintainability to service-based systems with reasonable performance overhead.

Jian Yu, Quan Z. Sheng, Joshua K. Y. Swee
An Incremental Approach for Building Accessible and Usable Web Applications

Building accessible Web applications is difficult, especially considering that they are constantly evolving. To make matters more critical, an application which conforms to the well-known W3C accessibility standards is not necessarily usable for handicapped persons. In fact, the user experience when accessing a complex Web application, using, for example, screen readers, tends to be far from friendly. In this paper we present an approach to safely transform Web applications into usable and accessible ones. The approach is based on an adaptation of the well-known software refactoring technique. We show how to apply accessibility refactorings to improve usability in accessible applications, and how to make the process of obtaining this "new" application cost-effective, by adapting an agile development process.

Nuria Medina Medina, Juan Burella, Gustavo Rossi, Julián Grigera, Esteban Robles Luna
CPH-VoD: A Novel CDN–P2P-Hybrid Architecture Based VoD Scheme

Taking advantage of both CDN and P2P networks has been considered a feasible solution for large-scale video stream delivery systems. Recent research has shown great interest in CDN-P2P-hybrid architectures and ISP-friendly P2P content delivery. In this paper, we propose a novel VoD scheme based on a CDN-P2P-hybrid architecture. First, we design a multi-layer CDN-P2P-hybrid overlay network architecture. Second, in order to provide seamless connections between the different layers, we propose a super-node mechanism, with super-nodes serving as connecting points. Third, as part of the experimental work, we discuss a CPH-VoD implementation scheme based on a peer-node local RTSP server mechanism, including the peer download/upload modules and their working processes. The experimental results show that VoD based on the CDN-P2P-hybrid architecture is superior to either a pure CDN approach or a pure P2P approach.

Zhihui Lu, Jie Wu, Lijiang Chen, Sijia Huang, Yi Huang

Information Retrieval and Extraction

A Combined Semi-pipelined Query Processing Architecture for Distributed Full-Text Retrieval

Term-partitioning is an efficient way to distribute a large inverted index. Two fundamentally different query processing approaches are pipelined and non-pipelined. While the pipelined approach provides higher query throughput, the non-pipelined approach provides shorter query latency. In this work we propose a third alternative, combining non-pipelined inverted index access, a heuristic decision between pipelined and non-pipelined query execution, and an improved query routing strategy. Our results show that the method combines the advantages of both approaches, providing high throughput and short query latency. Our method increases throughput by up to 26% compared to the non-pipelined approach and reduces latency by up to 32% compared to the pipelined approach.

Simon Jonassen, Svein Erik Bratsberg
Towards Flexible Mashup of Web Applications Based on Information Extraction and Transfer

Mashup combines information or functionality from two or more existing Web sources to create a new Web page or application. The Web sources that are used to build mashup applications mainly include Web applications and Web services. The traditional way of building mashup applications is to use Web services by writing a script or a program that invokes them. To help users without programming experience build flexible mashup applications, we propose a mashup approach for Web applications in this paper. Our approach allows users to build mashup applications from existing Web applications without programming. In addition, with our approach users can transfer information between Web applications to implement consecutive query mashup applications. This approach is based on information extraction, information transfer, and functionality emulation methods. Our implementation shows that general Web applications can also be used to build mashup applications easily, without programming.

Junxia Guo, Hao Han, Takehiro Tokuda
On Maximal Contained Rewriting of Tree Pattern Queries Using Views

The problem of rewriting tree pattern queries using views has attracted much attention in recent years. Previous works have proposed algorithms for finding the maximal contained rewriting using views when the query and the view are limited to some special cases, e.g., tree patterns not having the wildcard *. In the general case, i.e., when both //-edges and * are present, the previous methods may fail to find the maximal contained rewriting. In this paper, we propose a method to find the maximal contained rewriting for the general case, as well as an extension of the previous method to more special cases.

Junhu Wang, Jeffrey Xu Yu
Implementing Automatic Error Recovery Support for Rich Web Clients

The way developers usually implement recoverability in object-oriented applications is by delegating the backward error recovery logic to the ever-present database transactions, discarding the in-memory object graph when something goes wrong and reconstructing its previous version from the repository. This is not elegant from the point of view of design, but it is a cheap and efficient way to recover the system from an error. In some architectures, like RIA, the domain logic is managed in the client without that resource, and the error-prone and complex recoverability logic must be implemented manually, leading to tangled and obfuscated code. We adapt an automatic recovery mechanism to that architecture by means of a JavaScript implementation. We developed several benchmarks representing common scenarios to measure the benefits and costs of this approach, evidencing the feasibility of the automatic recovery logic but also an unexpected overhead in the chosen implementation of AOP for JavaScript.

Manuel Quintela-Pumares, Daniel Fernández-Lanvin, Raúl Izquierdo, Alberto-Manuel Fernández-Álvarez
Backmatter
Metadata
Title
Web Information Systems Engineering – WISE 2010
Editors
Lei Chen
Peter Triantafillou
Torsten Suel
Copyright Year
2010
Publisher
Springer Berlin Heidelberg
Electronic ISBN
978-3-642-17616-6
Print ISBN
978-3-642-17615-9
DOI
https://doi.org/10.1007/978-3-642-17616-6
