2009 | Book

Weaving Services and People on the World Wide Web

Edited by: Irwin King, Ricardo Baeza-Yates

Publisher: Springer Berlin Heidelberg

Included in: Springer Professional "Wirtschaft+Technik", Springer Professional "Technik", Springer Professional "Wirtschaft"

About this book

Ever since its inception, the Web has changed the landscape of human experience: how we interact with one another and with data through service infrastructures via various computing devices. This interweaving environment is now becoming ever more embedded into devices and systems that integrate seamlessly into how we live, in both our working and leisure time. For this volume, King and Baeza-Yates selected some pioneering and cutting-edge research work that points to the future of the Web. Based on the Workshop Track of the 17th International World Wide Web Conference (WWW2008) in Beijing, they selected the top contributions and asked the authors to resubmit their work with at least one third additional material beyond their original workshop manuscripts to be considered for this volume. After a second round of reviews and selection, 16 contributions were finally accepted. The work within this volume represents the tip of the iceberg of the many exciting advancements on the WWW. It covers topics like semantic web services, location-based and mobile applications, personalized and context-dependent user interfaces, social networks, and folksonomies. The presentations are aimed at researchers in academia and industry and showcase the latest research findings. Overall they deliver an excellent picture of the current state of the art, serve as a basis for ongoing research discussions, and point to new directions.

Table of Contents

Frontmatter

Web Services

Frontmatter

Classification of Automated Search Traffic

Abstract

As web search providers seek to improve both relevance and response times, they are challenged by the ever-increasing tax of automated search query traffic. Third party systems interact with search engines for a variety of reasons, such as monitoring a web site’s rank, augmenting online games, or possibly to maliciously alter click-through rates. In this paper, we investigate automated traffic (sometimes referred to as bot traffic) in the query stream of a large search engine provider. We define automated traffic as any search query not generated by a human in real time. We first provide examples of different categories of query logs generated by automated means. We then develop many different features that distinguish between queries generated by people searching for information, and those generated by automated processes. We categorize these features into two classes, either an interpretation of the physical model of human interactions, or as behavioral patterns of automated interactions. Using these detection features, we next classify the query stream using multiple binary classifiers. In addition, a multiclass classifier is then developed to identify subclasses of both normal and automated traffic. An active learning algorithm is used to suggest which user sessions to label to improve the accuracy of the multiclass classifier, while also seeking to discover new classes of automated traffic. Performance analyses are then provided. Finally, the multiclass classifier is used to predict the subclass distribution for the search query stream.

Greg Buehrer, Jack W. Stokes, Kumar Chellapilla, John C. Platt

Semantic Services for Wikipedia

Abstract

Wikipedia, a killer application in Web 2.0, has embraced the power of collaborative editing to harness collective intelligence. It features many attractive characteristics, like entity-based link graph, abundant categorization and semi-structured layout, and can serve as an ideal data source to extract high quality and well-structured data. In this chapter, we first propose several solutions to extract knowledge from Wikipedia. We do not only consider information from the relational summaries of articles (infoboxes) but also semi-automatically extract it from the article text using the structured content available. Due to differences with information extraction from the Web, it is necessary to tackle new problems, like the lack of redundancy in Wikipedia that is dealt with by extending traditional machine learning algorithms to work with few labeled data. Furthermore, we also exploit the widespread categories as a complementary way to discover additional knowledge. Benefiting from both structured and textual information, we additionally provide a suggestion service for Wikipedia authoring. With the aim to facilitate semantic reuse, our proposal provides users with facilities such as link, category and infobox content suggestions. The proposed enhancements can be applied to attract more contributors and lighten the burden of professional editors. Finally, we developed an enhanced search system, which can ease the process of exploiting Wikipedia. To provide a user-friendly interface, it extends the faceted search interface with relation navigation and lets users easily express their complex information needs in an interactive way. In order to achieve efficient query answering, it extends scalable IR engines to index and search both the textual and structured information with an integrated ranking support.

Haofen Wang, Thomas Penin, Linyun Fu, Qiaoling Liu, Guirong Xue, Yong Yu

Context-based Semantic Mediation in Web Service Communities

Abstract

Communities gather Web services that provide a common functionality, acting as an intermediate layer between end users and Web services. On the one hand, they provide a single endpoint that handles user requests and transparently selects and invokes Web services, thus abstracting the selection task and leveraging the provided quality of service level. On the other hand, they maximize the visibility and use rate of Web services. However, data exchanges that take place between Web services and the community endpoint raise several issues, in particular due to semantic heterogeneities of data. Specific mediation mechanisms are required to adapt data operated by Web services to those of the community. Hence, mediation facilitates interoperability and reduces the level of difficulty for Web services to join and interact with communities. In this chapter, we propose a mediation approach that builds on (1) context-based semantic representation for Web services and the community; and (2) mediation mechanisms to resolve the semantic heterogeneities occurring during data exchanges. We validate our solution through some experiments as part of the WSMO framework over a test community and show the limitations of our approach.

Michael Mrissa, Stefan Dietze, Philippe Thiran, Chirine Ghedira, Djamal Benslimane, Zakaria Maamar

An Effective Aggregation Policy for RSS Services

Abstract

RSS is the XML-based format for syndication of Web contents and users aggregate RSS feeds with RSS feed aggregators. As the usage of RSS service has been diffused, it is crucial to have a good aggregation policy that enables users to efficiently aggregate postings that are generated. Aggregation policies may determine not only the number of aggregations for each RSS feed, but also schedule when aggregations take place. In this paper, we first propose the algorithms of minimum missing aggregation policy which reduces the number of missing postings during aggregations. Second, we compare and analyze the experimental results of ours with the existing minimum delay aggregation policy. Our analysis shows that the minimum missing aggregation policy can reduce approximately 29% of the posts the existing minimum delay aggregation policy would miss.

Jae Hwi Kim, Sang Ho Lee, Young Geun Han

Evolution of the Mobile Web

Abstract

The Mobile Web refers to accessing the World Wide Web from a mobile device. This chapter discusses the evolution of the mobile web from a perspective of the mobile web research community that has published at the WorldWideWeb conference satellite workshop series – MobEA. This workshop has been developed over a period of six years covering a wide range of topics. We summarize some of our findings and report on how the mobile Web has evolved over the past six years.

Rittwik Jana, Daniel Appelquist

Personalized Service Creation and Provision for the Mobile Web

Abstract

The convergence of telecom networks and the Internet is fostering the emergence of environments where Web services are available to mobile users. The variability in computing resources, display terminals, and communication channels requires intelligent support for personalized delivery of relevant data and services to mobile users. Personalized service provisioning presents several research challenges on context information management, service creation, and inherent limitations of mobile devices. In this chapter, we describe a novel framework that supports weaving context information and services for personalized service creation and execution. By leveraging technologies on Web services, agents, and publish/subscribe systems, our framework enables an effective, user-centric access of integrated services over the mobile Web environments. This chapter overviews the design, architecture, and implementation of the framework.

Quan Z. Sheng, Jian Yu, José M. del Álamo, Paolo Falcarin

Selecting the Best Mobile Information Service with Natural Language User Input

Abstract

Information services accessed via mobile phones provide information directly relevant to subscribers’ daily lives and are an area of dynamic market growth worldwide. Although many information services are currently offered by mobile operators, many of the existing solutions require a unique gateway for each service, and it is inconvenient for users to have to remember a large number of such gateways. Furthermore, the Short Message Service (SMS) is very popular in China and Chinese users would prefer to access these services in natural language via SMS. This chapter describes a Natural Language Based Service Selection System (NL3S) for use with a large number of mobile information services. The system can accept user queries in natural language and route them to the required service. Since it is difficult for existing methods to achieve high accuracy and high coverage and anticipate which other services a user might want to query, the NL3S is developed based on a Multi-service Ontology (MO) and Multi-service Query Language (MQL). The MO and MQL provide semantic and linguistic knowledge, respectively, to facilitate service selection for a user query and to provide adaptive service recommendations. Experiments show that the NL3S can achieve 75–95% accuracy and 85–95% user satisfaction when processing various styles of natural language queries. A trial involving navigation of 30 different mobile services shows that the NL3S can provide a viable commercial solution for mobile operators.

Qiangze Feng, Hongwei Qi, Toshikazu Fukushima

Location Concepts for the Web

Abstract

The concept of location has become very popular in many applications on the Web, in particular for those which aim at connecting the real world with resources on the Web. However, the Web as it is today has no overall location concept, which means that applications have to introduce their own location concepts and have done so in incompatible ways. On the other hand, there are a number of interfaces and techniques that make location information available to networked devices. By turning the Web into a location-aware Web, location-oriented applications get better support for their location concepts on the Web, and the Web becomes an information system where location-related information can be more easily shared across different applications and application areas. This chapter describes a location concept for the Web supporting different location types and its embedding into some of the Web’s core technologies.

Martin Kofahl, Erik Wilde

Ad Hoc Determination of Geographic Regions for Concept@Location Queries

Abstract

Textual geographic queries to search engines usually consist of the desired concept and also of one or more terms describing a location, which is often the name of a city, which in turn can usually be grounded with the help of a gazetteer. On other occasions, though, the location refers to a (vague) geographic region and may also be a vernacular expression for that region, so that this location specification cannot be found in a gazetteer.

In this chapter we describe an approach to determine the boundaries for such locations and how to integrate this approach into the query process. The key features of our approach are that a geographic search engine is able to handle any textual description of a geographic region at query time and that this computation can be done completely automatically. In our approach we derive a representation for a region from the toponyms found in the top web documents resulting from a query using the terms describing the location.

In addition to that, we introduce two other uses of this approach: first, this method can be used for answering where-is queries (where only a query location, but no query concept is given), and second, we can determine geographic representations for arbitrary terms that are not genuine geographic regions. In that case, the geographic representation provides a visual impression of the geographic correlation of those terms.

Andreas Henrich, Volker Lüdecke

Acquisition of Vernacular Place Names from Web Sources

Abstract

Vernacular place names are names that are commonly in use to refer to geographical places. For purposes of effective information retrieval, the spatial extent associated with these names should reflect people’s perception of the place, even though this may differ sometimes from the administrative definition of the same place name. Due to their informal nature, vernacular place names are hard to capture, but methods to acquire and define vernacular place names are of great benefit to search engines and all kinds of information services that deal with geographic data. This paper discusses the acquisition of vernacular use of place names from web sources and their representation as surface models derived by kernel density estimators. We show that various web sources containing user created geographic information and business data can be used to represent neighbourhoods in Cardiff, UK. The resulting representations can differ in their spatial extent from administrative definitions. The chapter closes with an outlook on future research questions.

Florian A. Twaroch, Christopher B. Jones, Alia I. Abdelmoty
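
The abstract above mentions surface models derived by kernel density estimators. As a rough illustration only, the following sketch evaluates a two-dimensional Gaussian kernel density estimate on a grid. The function name `kde_surface`, the Gaussian kernel, the single fixed bandwidth, and the use of planar coordinates are all assumptions made for this example; the abstract does not say which kernel or bandwidth the chapter uses.

```python
from math import exp, pi


def kde_surface(points, xs, ys, bandwidth):
    """Evaluate a 2D Gaussian kernel density estimate on a grid.

    points    -- list of (x, y) locations where a place name was observed
    xs, ys    -- grid coordinates along each axis
    bandwidth -- kernel standard deviation (hypothetical, fixed for all points)
    Returns a list of rows, one row per value in ys.
    """
    if not points:
        raise ValueError("at least one point is required")
    h2 = bandwidth * bandwidth
    # Normalisation so the surface integrates to 1 over the plane.
    norm = 1.0 / (2.0 * pi * h2 * len(points))
    surface = []
    for y in ys:
        row = []
        for x in xs:
            total = sum(
                exp(-((x - px) ** 2 + (y - py) ** 2) / (2.0 * h2))
                for px, py in points
            )
            row.append(norm * total)
        surface.append(row)
    return surface
```

Thresholding such a surface at a chosen density level would give one possible spatial extent for a name; how that level is picked is left open here.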

Social Computing

Frontmatter

Social Web and Knowledge Management

Abstract

Knowledge Management is the study and practice of representing, communicating, organizing, and applying knowledge in organizations. Moreover, being used by organizations, it is inherently social. The Web, as a medium, enables new forms of communications and interactions and requires new ways to represent knowledge assets. It is therefore obvious that the Web will influence and change Knowledge Management, but it is very unclear what the impact of these changes will be. This chapter raises questions and discusses visions in the area that connects the Social Web and Knowledge Management – an area of research that is only just emerging. The World Wide Web conference 2008 in Beijing hosted a workshop on that question, bringing together researchers and practitioners to gain first insights toward answering questions of that area.

Peter Dolog, Markus Krötzsch, Sebastian Schaffert, Denny Vrandečić

Setting Access Permission through Transitive Relationship in Web-based Social Networks

Abstract

The rising popularity of various social networking websites has created a huge problem for Internet privacy. Although it is easy to post photos, comments, opinions on some events, etc. on the Web, some of these data (such as a person’s location at a particular time, criticisms of a politician, etc.) are private and should not be accessed by unauthorized users. Although social networks facilitate sharing, the fear of sending sensitive data to a third party without knowledge or permission of the data owners discourages people from taking full advantage of some social networking applications. We exploit the existing relationships on social networks and build a “trust network” with transitive relationship to allow controlled data sharing so that the privacy and preferences of data owners are respected. The trust network linking private data owners, private data requesters, and intermediary users is a directed weighted graph. The permission value for each private data requester can be automatically assigned in this network based on the transitive relationship. Experiments were conducted to confirm the feasibility of constructing the trust network from existing social networks, and to assess the validity of permission value assignments in the query process. Since the data owners only need to define the access rights of their closest contacts once, this privacy scheme can make private data sharing easily manageable by social network participants.

Dan Hong, Vincent Y. Shen
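
The abstract above describes a directed weighted trust network in which a requester's permission value follows from transitive relationships. The sketch below shows one way such a value could be derived. It assumes, purely for illustration, that edge weights lie in [0, 1], that trust along a path is the product of its edge weights, and that the best path wins; the abstract does not state the chapter's actual propagation rule, and `permission_value` is a hypothetical name.

```python
import heapq


def permission_value(trust, owner, requester):
    """Return the highest product of edge weights on any path owner -> requester.

    trust -- dict mapping a user to {contact: weight}, weights assumed in [0, 1]
    Returns 0.0 when the requester cannot be reached from the owner.
    """
    best = {owner: 1.0}
    # Max-heap via negated values; with weights <= 1 products never increase,
    # so the first time a node is popped its value is final (Dijkstra-style).
    heap = [(-1.0, owner)]
    while heap:
        neg_value, node = heapq.heappop(heap)
        value = -neg_value
        if node == requester:
            return value
        if value < best.get(node, 0.0):
            continue  # stale heap entry
        for contact, weight in trust.get(node, {}).items():
            candidate = value * weight
            if candidate > best.get(contact, 0.0):
                best[contact] = candidate
                heapq.heappush(heap, (-candidate, contact))
    return 0.0
```

In this sketch an owner only assigns weights to direct contacts, which mirrors the abstract's point that access rights need to be defined for the closest contacts only.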

Multiple Interests of Users in Collaborative Tagging Systems

Abstract

Performance of recommender systems depends on whether the user profiles contain accurate information about the interests of the users, and this in turn relies on whether enough information about their interests can be collected. Collaborative tagging systems allow users to use their own words to describe their favourite resources, resulting in some user-generated categorisation schemes commonly known as folksonomies. Folksonomies thus contain rich information about the interests of the users, which can be used to support various recommender systems. Our analysis of the folksonomy in Delicious reveals that the interests of a single user can be very diverse. Traditional methods for representing interests of users are usually not able to reflect such diversity. We propose a method to construct user profiles of multiple interests from folksonomies based on a network clustering technique. Our evaluation shows that the proposed method is able to generate user profiles which reflect the diversity of user interests and can be used as a basis of providing more focused recommendation to the users.

Ching-man Au Yeung, Nicholas Gibbins, Nigel Shadbolt

On the Effect of Group Structures on Ranking Strategies in Folksonomies

Abstract

Folksonomies have shown interesting potential for improving information discovery and exploration. Recent folksonomy systems explore the use of tag assignments, which combine Web resources with annotations (tags), and the users that have created the annotations. This article investigates the effect of grouping resources in folksonomies, i.e. creating sets of resources, and using this additional structure for the tasks of search & ranking, and for tag recommendations. We propose several group-sensitive extensions of graph-based search and recommendation algorithms, and compare them with non group-sensitive versions. Our experiments show that the quality of search result ranking can be significantly improved by introducing and exploiting the grouping of resources (one-tailed t-Test, level of significance α=0.05). Furthermore, tag recommendations profit from the group context, and it is possible to make very good recommendations even for untagged resources, which currently known tag recommendation algorithms cannot do.

Fabian Abel, Nicola Henze, Daniel Krause, Matthias Kriesell

Resolving Person Names in Web People Search

Abstract

Disambiguating person names in a set of documents (such as a set of web pages returned in response to a person name) is a key task for the presentation of results and the automatic profiling of experts. With largely unstructured documents and an unknown number of people with the same name the problem presents many difficulties and challenges. This chapter treats the task of person name disambiguation as a document clustering problem, where it is assumed that the documents represent particular people. This leads to the person cluster hypothesis, which states that similar documents tend to represent the same person. Single Pass Clustering, k-Means Clustering, Agglomerative Clustering and Probabilistic Latent Semantic Analysis are employed and empirically evaluated in this context. On the SemEval 2007 Web People Search it is shown that the person cluster hypothesis holds reasonably well and that the Single Pass Clustering and Agglomerative Clustering methods provide the best performance.

Krisztian Balog, Leif Azzopardi, Maarten de Rijke
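
Single Pass Clustering, one of the methods named in the abstract above, can be sketched in a few lines: each document is compared with the existing clusters once and either joins the most similar one or starts a new one. The bag-of-words vectors, cosine similarity, summed centroids, and the `threshold` default below are assumptions chosen for illustration, not the configuration evaluated in the chapter.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def single_pass_cluster(docs, threshold=0.5):
    """Assign each document to the most similar cluster or open a new one.

    docs      -- list of strings, processed once in the given order
    threshold -- minimum similarity needed to join a cluster (hypothetical value)
    Returns a list of clusters, each a list of document indices.
    """
    clusters = []  # each entry: {"centroid": Counter, "members": [indices]}
    for index, text in enumerate(docs):
        vector = Counter(text.lower().split())
        best, best_sim = None, 0.0
        for cluster in clusters:
            sim = cosine(vector, cluster["centroid"])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            best["members"].append(index)
            best["centroid"].update(vector)  # centroid = summed term counts
        else:
            clusters.append({"centroid": Counter(vector), "members": [index]})
    return [cluster["members"] for cluster in clusters]
```

Under the person cluster hypothesis, each returned cluster would stand for one individual sharing the queried name. The result depends on document order and on the threshold, which is a known property of single-pass methods.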

Studies on Editing Patterns in Large-scale Wikis

Abstract

Wiki systems have developed over the past years as lightweight, community-editable, web-based hypertext systems. With the emergence of Semantic Wikis, these collections of interlinked documents have also gained a dual role as ad-hoc RDF graphs. However, their roots lie in the limited hypertext capabilities of the World Wide Web: embedded links, without support for composite objects or transclusion. In this chapter, we present experimental evidence that hyperstructure changes, as opposed to content changes, form a substantial proportion of editing effort on a large-scale wiki. We then follow this with a detailed experiment, studying how individual editors work to edit articles on the wiki. These experiments are set in the wider context of a study of how the technologies developed during decades of hypertext research may be applied to improve management of wiki document structure and, with semantic wikis, knowledge structure.

Philip Boulain, Nigel Shadbolt, Nicholas Gibbins

Backmatter

Title: Weaving Services and People on the World Wide Web
Edited by: Irwin King, Ricardo Baeza-Yates
Publisher: Springer Berlin Heidelberg
Electronic ISBN: 978-3-642-00570-1
Print ISBN: 978-3-642-00569-5
DOI: https://doi.org/10.1007/978-3-642-00570-1
