2006 | Book

Federation over the Web

International Workshop, Dagstuhl Castle, Germany, May 1-6, 2005. Revised Selected Papers

Editors: Klaus P. Jantke, Aran Lunzer, Nicolas Spyratos, Yuzuru Tanaka

Publisher: Springer Berlin Heidelberg

Book Series: Lecture Notes in Computer Science

About this book

The lives of people all around the world, especially in industrialized nations, continue to be changed by the presence and growth of the Internet. Its influence is felt at scales ranging from private lifestyles to national economies, boosting the pace at which modern information and communication technologies influence personal choices along with business processes and scientific endeavors. In addition to its billions of HTML pages, the Web can now be seen as an open repository of computing resources. These resources provide access to computational services as well as data repositories, through a rapidly growing variety of Web applications and Web services. However, people’s usage of all these resources barely scratches the surface of the possibilities that such richness should offer. One simple reason is that, given the variety of information available and the rate at which it is being extended, it is difficult to keep up with the range of resources relevant to one’s interests. Another reason is that resources are offered in a bewildering variety of formats and styles, so that many resources effectively stand in isolation. This is reminiscent of the challenge of enterprise application integration, familiar to every large organization, be it in commerce, academia, or government. The challenge arises because of the accumulation of information and communication systems over decades, typically without the technical provision or political will to make them work together. Thus the exchange of data among those systems is difficult and expensive, and the potential synergetic effects of combining them are never realized.

Table of Contents

Frontmatter

Knowledge Look-Up and Matching

Text Mining Using Markov Chains of Variable Length
Abstract
When dealing with knowledge federation over text documents, one has to determine whether or not documents are related by context. A new approach is proposed to solve this problem.
This leads to the design of a new search engine for literature research and related problems. The idea is that one already has some documents of interest. These documents are taken as input. Then all documents known to a classical search engine are ranked according to their relevance. To achieve this goal, we use Markov chains of variable length.
The algorithms developed have been implemented and tested on the Reuters-21578 data set.
Björn Hoffmeister, Thomas Zeugmann
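The ranking idea above can be illustrated with a much-simplified sketch: a word-level Markov model whose context length varies, so that each word is predicted from the longest preceding context (up to `max_order` words) seen in the documents of interest, and candidate documents are ranked by their average log-probability. The class and function names, the additive smoothing constant, and the longest-context back-off rule are assumptions made for illustration; this is not the authors' algorithm.

```python
import math
from collections import Counter, defaultdict


class VariableOrderMarkov:
    """Word-level model predicting each word from the longest seen context
    of at most max_order preceding words (a simplified illustration)."""

    def __init__(self, max_order=3, alpha=1.0):
        self.max_order = max_order
        self.alpha = alpha                   # additive smoothing constant (assumed)
        self.counts = defaultdict(Counter)   # context tuple -> next-word counts
        self.vocab = set()

    def fit(self, documents):
        for doc in documents:
            words = doc.lower().split()
            self.vocab.update(words)
            for i, word in enumerate(words):
                # record the word under every context length 0..max_order
                for k in range(min(i, self.max_order) + 1):
                    self.counts[tuple(words[i - k:i])][word] += 1
        return self

    def _longest_context(self, history):
        # back off from the longest allowed context to the empty one
        for k in range(min(self.max_order, len(history)), -1, -1):
            ctx = tuple(history[len(history) - k:])
            if ctx in self.counts:
                return ctx
        return ()

    def score(self, doc):
        """Average log-probability per word; higher means more similar."""
        words = doc.lower().split()
        if not words:
            return float("-inf")
        v = len(self.vocab) + 1              # one extra slot for unseen words
        total = 0.0
        for i, word in enumerate(words):
            c = self.counts.get(self._longest_context(words[:i]), Counter())
            total += math.log((c[word] + self.alpha) / (sum(c.values()) + self.alpha * v))
        return total / len(words)


def rank(documents_of_interest, candidates, max_order=3):
    """Rank candidate documents by relevance to the documents of interest."""
    model = VariableOrderMarkov(max_order).fit(documents_of_interest)
    return sorted(candidates, key=model.score, reverse=True)
```

Normalizing by document length keeps long and short candidates comparable; a real engine would also need tokenization and a principled smoothing scheme.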
Faster Pattern Matching Algorithm for Arc-Annotated Sequences
Abstract
We present an improved pattern matching algorithm for arc-annotated sequences. Arc-annotated sequences are used to represent structural information, e.g., of RNA and protein sequences in molecular biology. Given two sequences with arcs, a text of length n and a pattern of length m, the problem is to determine whether the pattern is an arc-preserving subsequence of the text. Although the problem is NP-complete in the general case, an O(mn) algorithm has been proposed for the case where the given sequences have no crossing arcs. Our contribution is to revise that algorithm into a simpler one. We also present experimental results on its running time.
Takuya Kida
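To make the problem statement concrete, here is an exponential-time checker that tests the definition directly; it is not the O(mn) algorithm discussed in the paper. It assumes the usual definition, in which pattern positions are mapped to text positions in increasing order, matched characters must agree, and two pattern positions are joined by an arc exactly when their images in the text are. Arcs are given as index pairs, and the function name `is_aps` is hypothetical.

```python
from itertools import combinations


def is_aps(text, text_arcs, pattern, pattern_arcs):
    """Brute-force test of whether pattern is an arc-preserving subsequence
    of text; only usable for very short sequences."""
    t_arcs = {tuple(sorted(a)) for a in text_arcs}
    p_arcs = {tuple(sorted(a)) for a in pattern_arcs}
    m = len(pattern)
    # combinations() yields strictly increasing position tuples
    for pos in combinations(range(len(text)), m):
        if any(text[pos[k]] != pattern[k] for k in range(m)):
            continue
        # an arc between two pattern positions must map to an arc in the text,
        # and a non-arc must map to a non-arc
        if all(((k1, k2) in p_arcs) == ((pos[k1], pos[k2]) in t_arcs)
               for k1, k2 in combinations(range(m), 2)):
            return True
    return False
```

For the text `ACGU` with an arc joining positions 0 and 3, the pattern `AU` matches only if it carries the arc as well; without it, the embedding would drop an arc and the check fails.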
VSOP (Valued-Sum-of-Products) Calculator for Knowledge Processing Based on Zero-Suppressed BDDs
Abstract
Binary Decision Diagrams (BDDs) have recently come into wide use for efficiently manipulating large-scale Boolean function data. BDDs are also applied to handling combinatorial item set data. Zero-suppressed BDDs (ZBDDs) are a special type of BDD suited to implicitly handling large-scale combinatorial item set data. In this paper, we present the VSOP program, developed for calculating combinatorial item set data specified by symbolic expressions based on ZBDD techniques. Our program supports not only combinatorial set operations but also numerical arithmetic operations based on Valued-Sum-of-Products algebra, such as addition, subtraction, multiplication, division, and numerical comparison. We discuss the data structures and algorithms in our program and show some typical applications. The VSOP calculator will be useful for solving many problems in computer science. We show one promising application: finding hidden groups of mutually related data within the huge Web space. Our method should facilitate knowledge federation over the Web and is also useful for many other applications in computer science.
Shin-ichi Minato
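The core ZBDD ideas that VSOP builds on can be sketched in a few dozen lines: a node table that shares identical subgraphs, the zero-suppression rule (a node whose 1-edge points to the empty family is skipped), and recursive operations on set families. This is a minimal illustration, not VSOP's implementation, and it covers only plain item-set families, not the valued (numerical) arithmetic. Items are integers, with smaller integers placed nearer the root; that ordering and all method names are assumptions.

```python
class ZBDD:
    """Minimal zero-suppressed BDD manager for families of item sets.
    Node ids 0 and 1 are terminals: 0 is the empty family, 1 is {∅}."""
    EMPTY, BASE = 0, 1

    def __init__(self):
        self.nodes = [None, None]  # node id -> (item, lo, hi); terminals have no triple
        self.unique = {}           # (item, lo, hi) -> node id, so equal families share one node
        self.memo = {}             # cache of union / intersection results

    def top(self, p):
        return float("inf") if p <= 1 else self.nodes[p][0]  # terminals sort last

    def getnode(self, item, lo, hi):
        if hi == self.EMPTY:       # zero-suppression rule: skip nodes whose 1-edge is empty
            return lo
        key = (item, lo, hi)
        if key not in self.unique:
            self.unique[key] = len(self.nodes)
            self.nodes.append(key)
        return self.unique[key]

    def change(self, p, item):
        """Toggle membership of item in every set of family p."""
        t = self.top(p)
        if t > item:
            return self.getnode(item, self.EMPTY, p)
        _, lo, hi = self.nodes[p]
        if t == item:
            return self.getnode(item, hi, lo)
        return self.getnode(t, self.change(lo, item), self.change(hi, item))

    def union(self, p, q):
        if p == self.EMPTY:
            return q
        if q == self.EMPTY or p == q:
            return p
        key = ("union", min(p, q), max(p, q))
        if key not in self.memo:
            tp, tq = self.top(p), self.top(q)
            if tp < tq:
                item, lo, hi = self.nodes[p]
                r = self.getnode(item, self.union(lo, q), hi)
            elif tp > tq:
                item, lo, hi = self.nodes[q]
                r = self.getnode(item, self.union(p, lo), hi)
            else:
                item, plo, phi = self.nodes[p]
                _, qlo, qhi = self.nodes[q]
                r = self.getnode(item, self.union(plo, qlo), self.union(phi, qhi))
            self.memo[key] = r
        return self.memo[key]

    def intersect(self, p, q):
        if p == self.EMPTY or q == self.EMPTY:
            return self.EMPTY
        if p == q:
            return p
        key = ("intersect", min(p, q), max(p, q))
        if key not in self.memo:
            tp, tq = self.top(p), self.top(q)
            if tp < tq:            # the top item of p occurs in no set of q
                r = self.intersect(self.nodes[p][1], q)
            elif tp > tq:
                r = self.intersect(p, self.nodes[q][1])
            else:
                item, plo, phi = self.nodes[p]
                _, qlo, qhi = self.nodes[q]
                r = self.getnode(item, self.intersect(plo, qlo), self.intersect(phi, qhi))
            self.memo[key] = r
        return self.memo[key]

    def count(self, p):
        """Number of sets in family p (no memoisation, so keep families small)."""
        if p <= 1:
            return p
        _, lo, hi = self.nodes[p]
        return self.count(lo) + self.count(hi)

    def family(self, *itemsets):
        """Build the ZBDD of an explicit family of item sets."""
        f = self.EMPTY
        for items in itemsets:
            s = self.BASE
            for item in set(items):
                s = self.change(s, item)
            f = self.union(f, s)
        return f

    def sets(self, p, prefix=()):
        """Enumerate the sets of family p as frozensets."""
        if p == self.EMPTY:
            return []
        if p == self.BASE:
            return [frozenset(prefix)]
        item, lo, hi = self.nodes[p]
        return self.sets(lo, prefix) + self.sets(hi, prefix + (item,))
```

Because of the unique table, the same family always ends up as the same node id, regardless of the order in which its sets were added.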

Knowledge Search and Clustering

A Method for Pinpoint Clustering of Web Pages with Pseudo-Clique Search
Abstract
This paper presents a method for Pinpoint Clustering of web pages. We try to find useful clusters of web pages that are significant in the sense that their contents are similar to those of higher-ranked pages. Since users usually pay little attention to lower-ranked pages, those pages are unconditionally discarded even if their contents are similar to those of some high-ranked pages. Such hidden pages, together with the significant higher-ranked pages, are extracted as a cluster. As a result, our clusters can provide users with valuable new information.
In order to obtain such clusters, we first extract semantic correlations among terms by applying Singular Value Decomposition (SVD) to the term-document matrix generated from a corpus. Based on the correlations, we can evaluate potential similarities among the web pages to be clustered. The set of web pages is represented as a weighted graph G based on the similarities and their ranks. Our clusters can be found as pseudo-cliques in G. An algorithm for finding Top-N weighted pseudo-cliques is presented. Our experimental result shows that a highly valuable cluster can actually be extracted by our method.
We also discuss an idea for giving clusters clearer meanings. With the help of Formal Concept Analysis, our clusters, called FC-based clusters, can be given clear meanings. Our preliminary experiments show that the extended method could be a promising approach to finding meaningful clusters.
Makoto Haraguchi, Yoshiaki Okubo
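The notion of a weighted pseudo-clique can be illustrated by brute force on a small graph. This sketch assumes that a pseudo-clique is a vertex set of at least `min_size` pages whose edge density (edges present divided by possible pairs) reaches a threshold `theta`, and that a cluster's weight is the sum of per-page weights standing in for rank. The paper's exact definitions, its SVD-based similarity step, and its efficient Top-N search are not reproduced here.

```python
import heapq
from itertools import combinations


def density(vertices, edges):
    """Fraction of vertex pairs that are joined by an edge."""
    pairs = list(combinations(vertices, 2))
    if not pairs:
        return 1.0
    return sum(frozenset(p) in edges for p in pairs) / len(pairs)


def top_n_pseudo_cliques(weights, edges, theta=0.8, min_size=3, n=2):
    """weights: vertex -> weight (e.g. derived from page rank, assumed);
    edges: iterable of vertex pairs for sufficiently similar pages.
    Returns up to n (total weight, vertex tuple) pairs, heaviest first.
    Exhaustive search: only usable on very small graphs."""
    edges = {frozenset(e) for e in edges}
    vertices = sorted(weights)
    found = []
    for size in range(min_size, len(vertices) + 1):
        for subset in combinations(vertices, size):
            if density(subset, edges) >= theta:
                found.append((sum(weights[v] for v in subset), subset))
    return heapq.nlargest(n, found)
```

Lowering `theta` lets a loosely attached, low-weight page join a heavy group, which is the kind of "hidden page" the method aims to surface.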
Specific-Purpose Web Searches on the Basis of Structure and Contents
Abstract
We introduce methods for two specific-purpose Web searches. One is a search for Web communities related to given keywords, and the other is a search for texts having a certain relation to given keywords. Our methods are based on both the structure and the contents of the WWW. Our Web community search method uses the global structure of the WWW to discover communities and uses content information to label the communities found, where global structure means the Web graph composed of Web pages and the hyperlinks between them. Our related-text search method, on the other hand, uses the local structure of the WWW to extract candidate texts and uses content information to filter out wrongly extracted ones, where local structure means the DOM-tree structure of each page. We report the latest results on these Web search methods.
Mineichi Kudo, Atsuyoshi Nakamura
Graph Clustering Based on Structural Similarity of Fragments
Abstract
Resources available over the Web are often used in combination to meet a specific need of a user. Since resource combinations can be represented as graphs in terms of the relations among the resources, locating desirable resource combinations can be formulated as locating the corresponding graph. This paper describes a graph clustering method based on structural similarity of fragments (currently, connected subgraphs are considered) in graph-structured data. A fragment is characterized based on the connectivity (degree) of a node in the fragment. A fragment spectrum of a graph is created based on the frequency distribution of fragments. Thus, the representation of a graph is transformed into a fragment spectrum in terms of the properties of fragments in the graph. Graphs are then clustered with respect to the transformed spectra by applying a standard clustering method. We also devise a criterion to determine the number of clusters by defining a pseudo-entropy for clusters. Preliminary experiments with synthesized data were conducted and the results are reported.
Tetsuya Yoshida, Ryosuke Shoda, Hiroshi Motoda
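A fragment spectrum can be sketched by brute force for small graphs: enumerate connected vertex subsets of 2 to `max_size` nodes, label each fragment by the sorted degrees of its nodes within the fragment, and count the labels. Treating fragments as induced subgraphs, the size limit, and the L1 distance between normalized spectra are assumptions made for this illustration; the paper's clustering step and pseudo-entropy criterion are not shown.

```python
from collections import Counter, defaultdict
from itertools import combinations


def is_connected(vertices, adj):
    """Depth-first check that the vertices induce a connected subgraph."""
    start = next(iter(vertices))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in adj[u] & vertices:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen == vertices


def fragment_spectrum(edges, max_size=3):
    """Count connected fragments of 2..max_size vertices, each labelled by the
    sorted degrees of its vertices inside the fragment."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    spectrum = Counter()
    for size in range(2, max_size + 1):
        for sub in combinations(sorted(adj), size):
            frag = set(sub)
            if is_connected(frag, adj):
                spectrum[tuple(sorted(len(adj[u] & frag) for u in sub))] += 1
    return spectrum


def spectrum_distance(s1, s2):
    """L1 distance between spectra normalised to relative frequencies."""
    n1, n2 = sum(s1.values()) or 1, sum(s2.values()) or 1
    return sum(abs(s1[k] / n1 - s2[k] / n2) for k in set(s1) | set(s2))
```

With these spectra, a three-node path comes out closer to a star than to a triangle, since both the path and the star contain (1, 1, 2) fragments and no (2, 2, 2) fragment. Any standard clustering method could then be applied to such distances.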

Knowledge Mediation

Connecting Keywords Through Pointer Paths over the Web
Abstract
We propose a framework for discovering connections from a source keyword to a target keyword through the Web pages containing them. We are interested in connections provided by pointer paths leading from a source page to a target page (a source page being a page containing the source keyword, and a target page being a page containing the target keyword). Each such path provides an “explanation” of the connection, and the set of all such paths is considered as the “semantics” of the connection.
When one talks about federation in the context of the Web, one usually means connecting a number of Web resources so that they cooperate towards a common goal. A complementary, though less well-known, aspect is that of discovering federations of Web resources by interpreting the pointer paths connecting them. The work presented in this paper is a step in that direction, introducing concepts and tools for discovering federations over the Web.
Mina Akaishi, Nicolas Spyratos, Koichi Hori, Yuzuru Tanaka
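The path-enumeration step can be sketched as a depth-first search over a small in-memory link graph: every page containing the source keyword is a start point, and each simple path that reaches a page containing the target keyword within `max_len` links is recorded as one "explanation". The data layout, the plain substring keyword test, and stopping a path at the first target page reached are simplifying assumptions, not the framework's actual definitions.

```python
def pointer_paths(links, pages, source_kw, target_kw, max_len=3):
    """links: page -> pages it points to; pages: page -> text.
    Returns every simple pointer path (as a tuple of pages) of at most
    max_len links from a source page to a target page."""
    def contains(text, keyword):
        return keyword.lower() in text.lower()   # naive substring match (assumed)

    sources = [p for p, text in pages.items() if contains(text, source_kw)]
    targets = {p for p, text in pages.items() if contains(text, target_kw)}
    paths = []

    def dfs(path):
        page = path[-1]
        if len(path) > 1 and page in targets:
            paths.append(tuple(path))            # one "explanation" of the connection
            return                               # stop at the first target page reached
        if len(path) > max_len:                  # path already uses max_len links
            return
        for nxt in links.get(page, ()):
            if nxt not in path:                  # keep paths simple (no revisits)
                dfs(path + [nxt])

    for s in sources:
        dfs([s])
    return paths
```

The returned list plays the role of the connection's "semantics": the set of all paths found between source and target pages.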
Querying with Preferences in a Digital Library
Abstract
We consider a collection of federated sources on the Web, and a community of users who are interested in documents residing in one or more of those federated sources. The search for documents of interest is supported by a mediator that we call a digital library. The library simply indexes all documents that are made available to users by the federated sources. When a user addresses a query to the library, the library returns the URLs of documents satisfying the query. In such a context, one of the factors influencing user satisfaction is the size of the answer set, in particular when it is too small (few or no documents) or too large (several hundreds or thousands of documents). In this paper, we address the problem of answer sets that are too large, and we define a personalized query as an ordinary query together with (a) an upper bound on the number of documents returned, and (b) a set of preferences as to the order in which the returned documents should be presented to the user; both parameters are defined by the user online, during query formulation. The main contribution of the paper is a framework in which the problem can be stated formally, together with a method for evaluating personalized queries.
Nicolas Spyratos, Vassilis Christophides
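The two parameters of a personalized query can be illustrated with a small sketch: evaluate the ordinary query, order the answer by the user's preferences, and cut it at the upper bound. The preference model used here (a prioritized list of attributes, each with preferred values listed best-first) and the `Doc` record are hypothetical; the paper's formal framework and evaluation method are not reproduced.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    url: str
    language: str
    year: int
    keywords: frozenset


def personalized_query(docs, keywords, k, preferences):
    """Evaluate a conjunctive keyword query, order the answer by the user's
    preferences, and return at most k URLs.

    preferences: list of (attribute, preferred values best-first), with the
    most important attribute first; unlisted values rank last."""
    answer = [d for d in docs if set(keywords) <= d.keywords]

    def rank_key(d):
        key = []
        for attr, ranked in preferences:
            value = getattr(d, attr)
            key.append(ranked.index(value) if value in ranked else len(ranked))
        return tuple(key)

    # sorted() is stable, so ties keep the library's original order
    return [d.url for d in sorted(answer, key=rank_key)[:k]]
```

Applying the bound only after ordering matters: truncating first would drop documents the user prefers.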

Interoperation of Web-Based Resources

An Enhanced Spreadsheet Supporting Calculation-Structure Variants, and Its Application to Web-Based Processing
Abstract
This paper reports our work towards an end user environment for building and experimenting with federations of Web-based processing resources. We present the key concepts and an initial interface for the RecipeSheet, a spreadsheet-like environment with explicit support for creating and comparing alternative scenarios, based on the principles of subjunctive interfaces. A key feature of the RecipeSheet is that alternative scenarios can differ in terms of the processing used to calculate cells’ values; in the context of the Web, this is useful for gathering and comparing results from alternative resources that offer nominally the same processing. We show various usage cases for our prototype, including an example from Web-based bioinformatics.
Aran Lunzer, Kasper Hornbæk
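A cell whose value can come from alternative processing can be sketched as a table of scenarios: each combination of an input value and a processing "recipe" yields one variant of the cell. The two GC-content functions below are hypothetical stand-ins for two Web resources that offer nominally the same processing yet can disagree (here, on how ambiguous `N` bases are counted); none of this is the RecipeSheet's actual interface.

```python
from itertools import product


def gc_content_a(seq: str) -> float:
    """Hypothetical service A: G+C count over the full sequence length."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_content_b(seq: str) -> float:
    """Hypothetical service B: G+C count over unambiguous bases (A, C, G, T) only."""
    called = sum(seq.count(base) for base in "ACGT")
    return (seq.count("G") + seq.count("C")) / called


def evaluate_scenarios(inputs, recipes):
    """Compute one dependent cell for every (input value, recipe) combination so
    that alternative scenarios can be laid out and compared side by side."""
    return {
        (input_name, recipe_name): recipe(value)
        for (input_name, value), (recipe_name, recipe) in product(inputs.items(), recipes.items())
    }
```

For the input `"ACGTNN"`, recipe A gives 1/3 while recipe B gives 0.5; for `"GGCC"` both give 1.0. A side-by-side comparison makes this kind of discrepancy visible.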
Knowledge Federation over the Web Based on Meme Media Technologies
Abstract
This paper proposes a formal model for the aggregate ad hoc federation of geographically distributed intelligent resources accessible through the Web. We have already proposed frameworks for one-by-one ad hoc federation of intelligent resources over the Web. These frameworks define interoperation among intelligent resources over the Web as a set of interoperations between pairs of intelligent resources, and they allow us to define an overall federation by repeatedly combining resources. This paper deals with a case in which we have large sets of intelligent resources accessible through the Web. Each set is assumed to consist of resources of the same type, and to be accessible through the Web in the same way. This paper focuses on how to define and execute a large set of federations, each of which defines interoperation among resources taken from different sets of resources. Such a federation is called an aggregate federation. Our new framework for ad hoc aggregate federation is formalized in terms of meta relations and their relational expressions. It enables users to flexibly select resources satisfying a specified condition from each set of resources of the same kind, to define a relation over these resources, satisfying a specified condition, as a subset of the Cartesian product of the different resource sets, and to define and execute interoperation among the resources in each tuple of that relation.
Yuzuru Tanaka
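The three steps named above (select from each resource set, form a relation as a subset of the Cartesian product, execute interoperation per tuple) can be mirrored in a short sketch. Resources are modeled here as plain dictionaries and interoperation as an ordinary function; these, the function name, and the weather/map example are hypothetical, and the sketch does not reproduce the paper's meta-relation formalism.

```python
from itertools import product


def aggregate_federation(resource_sets, selections, relation_condition, interoperate):
    """resource_sets: one list of same-type resources per kind.
    selections: one predicate per resource set (which resources to use).
    relation_condition: predicate on a tuple with one resource from each set.
    interoperate: the interoperation executed for each tuple in the relation."""
    # 1. select resources satisfying a condition from each set
    selected = [[r for r in rs if keep(r)] for rs, keep in zip(resource_sets, selections)]
    # 2. the relation is the subset of the Cartesian product satisfying the condition
    relation = [t for t in product(*selected) if relation_condition(*t)]
    # 3. execute the interoperation for every tuple in the relation
    return [interoperate(*t) for t in relation]
```

For example, with a set of weather resources and a set of map resources, one could keep stations below 10 degrees and maps of Japan, relate the two by city name, and let each related pair produce a combined report; only pairs that survive both selection and the relation are ever executed.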

Knowledge Evolution

Towards Understanding Meme Media Knowledge Evolution
Abstract
Successful communication involves the individual utterances being interpreted within a suitable context. Systems that fail to acquire and share the context required for some topic are likely to fail to communicate successfully about that topic. Software systems populating an open medium such as the Web are unlikely to have been designed or otherwise prepared to communicate with each other, so if they are to communicate, they face this challenge of acquiring and sharing the necessary context. We consider this situation for software systems implemented as meme media objects that contain representations of human knowledge. This acquisition of context can be understood as an enhancement of the knowledge representation they contain. Thus we consider establishing successful communication among meme media objects on the Web as an instance of knowledge evolution. The paper provides a conceptual framework for studying knowledge evolution. That framework is based on a particular interpretation of the concept of model. We give an example of the framework's use in an e-learning case study within a medical context.
Roland Kaschek, Klaus P. Jantke, István-Tibor Nébel
Mechanisms of Knowledge Evolution for Web Information Extraction
Abstract
The knowledge needed in Web information extraction can, under certain assumptions, be characterized as the knowledge held by the wrappers used to extract the semantics of documents. The evolution of this knowledge can be divided into a phase of initial wrapper learning and a later phase of wrapper maintenance. In this paper we focus only on the initial learning phase. Using the LExIKON system as a basis, we explain the principal structure of learning algorithms for island wrappers.
Carsten Müller
Backmatter
Metadata
Title
Federation over the Web
Editors
Klaus P. Jantke
Aran Lunzer
Nicolas Spyratos
Yuzuru Tanaka
Copyright Year
2006
Publisher
Springer Berlin Heidelberg
Electronic ISBN
978-3-540-32587-1
Print ISBN
978-3-540-31018-1
DOI
https://doi.org/10.1007/11605126
