
2003 | Book

Web Technologies and Applications

5th Asia-Pacific Web Conference, APWeb 2003, Xian, China, April 23–25, 2003 Proceedings

Editors: Xiaofang Zhou, Maria E. Orlowska, Yanchun Zhang

Publisher: Springer Berlin Heidelberg

Book Series: Lecture Notes in Computer Science


Table of Contents

Frontmatter

Keynote Papers

Trends in Mobile Multimedia and Networks

NTT DoCoMo's name stands for "DO COmmunications over the MObile Network". The company reached 43 million subscribers in Japan in early 2003. This presentation covers multimedia services over mobile networks and gives a glimpse of future directions for mobile multimedia networks and applications. DoCoMo's 2G and 3G mobile networks currently offer mobile visual phones, multimedia mail, and video clip download as well as enhanced "i-mode" services. After introducing the mobile multimedia applications and related technologies, future challenges and directions beyond the current networks are discussed around three keywords: hyper operator network, mobile content, and seamless service.

Minoru Etoh
DB-Enabled Peers for Managing Distributed Data

Peer-to-peer (P2P) computing is the sharing of computer resources, services and information by direct negotiation and exchange between autonomous and heterogeneous systems. An alternative approach to distributed and parallel computing, known as Grid Computing, has also emerged, with a similar intent of scaling the system performance and availability by sharing resources. Like P2P computing, Grid Computing has been popularized by the need for resource sharing and consequently, it rides on existing underlying organizational structure. In this paper, we compare P2P and Grid computing to highlight some of their differences. We then examine the issues of P2P distributed data sharing systems, and how database applications can ride on P2P technology. We use our Best-Peer project, which is an on-going peer-based data management system, as an example to illustrate what P2P computing can do for database management.

Beng Chin Ooi, Yanfeng Shu, Kian Lee Tan

XML and Database Design

Functional Dependencies for XML

In this paper we address the problem of how to extend the definition of functional dependencies (FDs) in incomplete relations to XML documents. An incomplete relation is said to strongly satisfy an FD if every completion of the relation, obtained by replacing all null values by data values, satisfies the FD in the ordinary sense. We propose a syntactic definition of strong FD satisfaction in an XML document (called an XFD) and then justify it by proving that, for a very general class of mappings of a relation into an XML document, a relation strongly satisfies a unary FD if and only if the XML document also strongly satisfies the equivalent XFD.

Millist W. Vincent, Jixue Liu
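
To make the strong-satisfaction definition above concrete, here is a minimal Python sketch that checks a unary FD over an incomplete relation by enumerating every completion, exactly as the definition states. It only illustrates the relational notion, not the paper's syntactic XFD test; the names (strongly_satisfies, rows, domain) are hypothetical.

from itertools import product

def strongly_satisfies(rows, lhs, rhs, domain):
    # Strong satisfaction: every completion obtained by replacing each null
    # (None) with a domain value must satisfy lhs -> rhs in the ordinary sense.
    nulls = [(i, a) for i, row in enumerate(rows) for a in (lhs, rhs) if row[a] is None]
    for values in product(domain, repeat=len(nulls)):
        completion = [dict(row) for row in rows]
        for (i, a), v in zip(nulls, values):
            completion[i][a] = v
        seen = {}
        for row in completion:
            if row[lhs] in seen and seen[row[lhs]] != row[rhs]:
                return False          # this completion violates the FD
            seen[row[lhs]] = row[rhs]
    return True

# {A:1, B:None} and {A:1, B:2} strongly satisfy A -> B only if every
# replacement of the null yields B = 2, so this prints False for domain {2, 3}.
print(strongly_satisfies([{"A": 1, "B": None}, {"A": 1, "B": 2}], "A", "B", domain=[2, 3]))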
On Transformation to Redundancy Free XML Schema from Relational Database Schema

While XML is emerging as the universal format for publishing and exchanging data on the Web, most business data is still stored and maintained in relational database management systems. As a result, there is an increasing need to efficiently publish relational data as XML documents for Internet-based applications. One way to publish relational data is to provide virtual XML documents via an XML schema transformed from the underlying relational database schema; users can then access the relational database through the XML schema. In this paper, we discuss issues in transforming a relational database schema into a corresponding schema in XML Schema. We aim to achieve a high level of nesting while introducing no data redundancy in the transformed XML schema. We first propose a basic transformation algorithm that introduces no data redundancy, and then improve the algorithm by exploring further nesting of the transformed XML schema.

Chengfei Liu, Jixue Liu, Minyi Guo
Constraint Preserving XML Updating

With the rapid development of the Internet, XML is becoming the standard for data representation, integration and exchange on the web. In order to fully evolve XML into a universal data representation and sharing format, it is necessary to update XML documents efficiently while preserving constraints. We consider an important class of constraints, XML keys. In this paper, based on XML keys and the constraint-preserving normalized storage of XML over relational databases, we present a novel method for updating XML data. Our method first propagates the update on the XML into the relational database. Then, taking the updated relational data and the original document as input, the updated XML document is produced by locating the update positions in the original document using an annotation technique. Preliminary performance studies have shown that our method is very effective and efficient.

Kun Yue, Zhengchuan Xu, Zhimao Guo, Aoying Zhou

Efficient XML Data Management

ENAXS: Efficient Native XML Storage System

XML is a self-describing meta-language and is fast emerging as a dominant standard for Web data exchange among various applications. With the tremendous growth of XML documents, an efficient storage system is required to manage them. Conventional databases, which require all data to adhere to an explicitly specified rigid schema, are unable to provide efficient storage for tree-structured XML documents. A new data model that is specifically designed for XML documents is required. In this paper, we propose a new storage system, named Efficient Native XML Storage System (ENAXS), for large and complex XML documents. ENAXS stores all XML documents in their native format to overcome the deficiencies of conventional databases, achieve optimal storage utilization and support efficient query processing. In addition, we propose a path-based indexing scheme embedded in ENAXS for fast data retrieval. We have implemented ENAXS and evaluated its performance with real data sets. Experimental results show the efficiency and scalability of the proposed system in utilizing storage space and executing various types of queries.

Khin-Myo Win, Wee-Keong Ng, Ee-Peng Lim
A Fast Index for XML Document Version Management

With the increasing popularity of storing content on the WWW and intranets in XML form, there arises the need for the control and management of this data. As this data is constantly evolving, users want to be able to query and retrieve previous versions efficiently. This paper proposes an efficient index that supports fast version updates and retrievals. Experimental results have shown that the system carries little overhead compared to systems without version management support.

Nicole Lam, Raymond K. Wong
A Commit Scheduler for XML Databases

The hierarchical and semistructured nature of XML data may cause complicated update behavior. Updates should not be limited to entire document trees, but should ideally involve subtrees and even individual elements. Providing a suitable scheduling algorithm for semistructured data can significantly improve collaboration systems that store their data — e.g. word processing documents or vector graphics — as XML documents. In this paper we improve upon earlier work (see [5]) which presented two equivalent concurrency control mechanisms based on Path Locks. In contrast to the earlier work, we now provide details regarding the workings of a commit scheduler for XML databases which uses the path lock conflict rules. We also give a comprehensive proof of serializability which enhances and clarifies the ideas in our previous work.

Stijn Dekeyser, Jan Hidders
An Efficient Path Index for Querying Semi-structured Data
(Extended Abstract)

The richness of semi-structured data allows data of varied and inconsistent structures to be stored in a single database. Such data can be represented as a graph, and queries can be constructed using path expressions, which describe traversals through the graph. Instead of providing optimal performance for a limited range of path expressions, we propose a mechanism which is shown to have consistent and high performance for path expressions of any complexity, including those with descendant operators (path wildcards). We further detail mechanisms which employ our index to perform more complex processing, such as evaluating both path expressions containing links and entire (sub) queries containing path based predicates. Performance is shown to be independent of the number of terms in the path expression(s), even where these expressions contain wildcards. Experiments show that our index is faster than conventional methods by up to two orders of magnitude for certain query types, is compact, and scales well.

Michael Barg, Raymond K. Wong, Franky Lam
Integrating Path Index with Value Index for XML Data

With the advent of XML, it is becoming the de facto standard for Web applications. To facilitate path expression processing, we propose an index structure adopted in our native XML database system Orient-X. Our index is constructed by utilizing the DTD to obtain the paths that will appear in the XML documents. It represents a structural summary of an XML data collection conforming to a certain DTD, so we can process any label path query without accessing the original data. In addition, it is integrated with value indexes. Preliminary experiments show quite promising results.

Jing Wang, Xiaofeng Meng, Shan Wang
Automatic Layout Generation with XML Wrapping

Because of the increasing variety of user terminals, there is a great demand for server side content adaptation. This paper describes a method to generate different layouts of an XML document automatically on server side, according to some information about the client needs. As a result of the algorithm, an XHTML document is generated from an XML source. The layout of the resulting XHTML adapts to the client’s requirements. The information about the client is sent to the server by CC/PP protocol. The generation of the layout is done dynamically on the server according to a predefined general layout template. A simple document is introduced, and tested with the most common web browsers as an example.

Istvan Beszteri, Petri Vuorimaa

XML Transformation

Efficient Evaluation in XML to XML Transformations

Different communities specify different standards (DTDs), and only those XML documents conforming to the given DTD can be processed inside a certain community. The goal of DTD-conforming XML to XML transformations with XML Transformation Grammars (XTGs) is to make exchanging XML documents between two communities whose DTDs are distinct feasible. However, in essence XTG evaluation is the process of executing a number of XML queries, and thus it presents new challenges to query optimization. In this paper, we investigate each step of evaluating an XTG and, after modelling XML queries, propose some optimization techniques to speed up XTG evaluation. Finally, the experimental results indicate that these techniques are effective for XTG evaluation.

Qing Wang, Junmei Zhou, Hongwei Wu, Yizhong Wu, Yang Yuan, Aoying Zhou
Formalizing Semantics of XSLT Using Object-Z

In this paper, a formal object-oriented semantic model for XSLT in Object-Z is presented. The semantic model is constructed based on XSLT’s W3C Working Draft (August 2002). Formal description of XSLT language can provide deeper understanding of the language and support the standardisation effort for XSLT. All XSLT language constructs are modeled as Object-Z classes and the XSLT stylesheet itself is also specified by a formal class. This highly structured semantic model is concise, composable and extensible.

Hong Li Yang, Jin Song Dong, Ke Gang Hao, Jun Gang Han
An XML-Based Context-Aware Transformation Framework for Mobile Execution Environments

We propose an XML-based Context-Aware transformation Framework (X-CAF). In X-CAF, we design three main techniques — (1) an XML-based programming model to program applications, (2) a user interface adaptation mechanism to adjust the UIs of applications, and (3) a transformation scheme to transform programs to adapt to various MExE (Mobile Execution Environment) environments. Moreover, we design two methods, Static-time Component Generation and Runtime XUL Transformation, for efficient and flexible transformation. By applying these techniques, this framework can make applications device-independent.

Tzu-Han Kao, Sheng-Po Shen, Shyan-Ming Yuan, Po-Wen Cheng
FIXT: A Flexible Index for XML Transformation

XML is emerging as the main standard for data presentation and exchange on the Internet. This highlights the important question of XML to XML transformation. In this paper, a flexible index structure, FIXT, is presented for efficient XML transformation. It is easy to extract and reconstruct subparts of the FIXT index to build indexes for intermediate results. Thus, it can improve the performance of XML to XML transformation.

Jianchang Xiao, Qing Wang, Min Li, Aoying Zhou
Formal Foundation of Web Navigation Stereotypes and Their Transformation into XML

One of the most important needs within hypermedia systems is concise and easy-to-follow navigation support. Unfortunately many web-based systems are overloaded with hyperlinks, bearing the risk of "getting lost" within the site, which subsequently leads to frustrated end users and less frequent visits to the site. To solve this problem, web navigation patterns which represent well-established navigation paths within hypermedia systems have been proposed in the literature and have consequently been integrated into web design languages. Feeling the need for a formal foundation, we propose a labeled graph which describes the overall navigational structure of web-based systems. This graph constitutes a basis for the formal definition of web navigation patterns, which are in turn the base for high-level UML stereotypes. These UML stereotypes are transformed into XML structures and finally into HTML. This approach to supporting navigation design of web-based systems is exemplified using the case of the filtered index navigation pattern.

Georg Sonneck, Thomas Mueck

Web Mining

Mining “Hidden Phrase” Definitions from the Web

Keyword searching is the most common form of document search on the Web. Many Web publishers manually annotate the META tags and titles of their pages with frequently queried phrases in order to improve their placement and ranking. A "hidden phrase" is defined as a phrase that occurs in the META tag of a Web page but not in its body. In this paper we present an algorithm that mines the definitions of hidden phrases from Web documents. Phrase definitions allow (i) publishers to find relevant phrases with high query frequency, and (ii) search engines to test whether the content of the body of a document matches the phrases. We use co-occurrence clustering and association rule mining algorithms to learn phrase definitions from high-dimensional data sets. We also provide experimental results.

Hung V. Nguyen, P. Velamuru, D. Kolippakkam, H. Davulcu, H. Liu, M. Ates
Extending a Web Browser with Client-Side Mining

We present WBext (Web Browser extended), a web browser extended with client-side mining capabilities. WBext learns sophisticated user interests and browsing habits by tailoring and integrating data mining techniques including association rules mining, clustering, and text mining, to suit the web browser environment. Upon activation, it automatically expands user searches, re-ranks and returns expanded search results in a separate window, in addition to returning the original search results in the main window. When a user is viewing a page containing a large number of links, WBext is able to recommend a few links from those that are highly relevant to the user, considering both the user’s interests and browsing habits. Our initial results show that WBext performs as fast as a common browser and that it greatly improves individual users’ search and browsing experience.

Hongjun Lu, Qiong Luo, Yeuk Kiu Shun
Real-Time Segmenting Time Series Data

There has been increased interest in time series data mining recently. In some cases, approaches for real-time segmentation of time series are necessary for time series similarity search and data mining, and this is the focus of this paper. A real-time iterative algorithm based on time series prediction is proposed. The proposed algorithm consists of three modular steps. (1) Modeling: this step identifies an autoregressive moving average (ARMA) model of the dynamic process from the time series data. (2) Prediction: this step makes k-step-ahead predictions based on the ARMA model of the process at a given time point. (3) Change-point detection: this step fits a piecewise segmented polynomial regression model to the time series data to determine whether it contains a new change point. Finally, the high performance of the proposed algorithm is demonstrated by comparison with the Guralnik-Srivastava algorithm.

Aiguo Li, Shengping He, Zheng Qin
An Efficient Data Mining Algorithm for Discovering Web Access Patterns

In this paper, we propose a data mining technique to find non-simple frequent traversal patterns in a web environment where users can travel from one object to another through the corresponding hyperlinks. We keep track of and retain the original user traversal paths in a web log, and apply the proposed data mining techniques to discover the complete traversal paths which are traversed by a sufficient number of users, that is, non-simple frequent traversal patterns, from web logs. The non-simple frequent traversal patterns include forward and backward references, which are used to suggest potentially interesting traversal paths to the users. The experimental results show that the discovered patterns can present the complete browsing paths traversed by most of the users and that our algorithm outperforms other algorithms in terms of discovered information and execution time.

Show-Jane Yen, Yue-Shi Lee
Applying Data Mining Techniques to Analyze Alert Data

The architecture of policy-based network management has a hierarchical structure that consists of a management layer and an enforcement layer. A security policy server in the management layer should be able to generate new policies, delete or update existing policies, and decide which policy applies when a security policy is requested. Therefore the security policy server must analyze and manage alert messages received from the policy enforcement system. In this paper, we propose an alert analyzer with a data mining engine. It is a helpful system for managing faulty users or hosts. The implemented mining system supports the alert analyzer and the high-level analyzer efficiently for security policy management.

Moonsun Shin, Hosung Moon, Keunho Ryu, KiYoung Kim, JinOh Kim

Web Clustering, Ranking and Profiling

Web Page Clustering: A Hyperlink-Based Similarity and Matrix-Based Hierarchical Algorithms

This paper proposes a hyperlink-based web page similarity measurement and two matrix-based hierarchical web page clustering algorithms. The web page similarity measurement incorporates hyperlink transitivity and page importance within the concerned web page space. One clustering algorithm takes cluster overlapping into account; the other does not. These algorithms do not require predefined similarity thresholds for clustering and are independent of the page order. Preliminary evaluations show the effectiveness of the proposed algorithms in clustering improvement.

Jingyu Hou, Yanchun Zhang, Jinli Cao
A Framework for Decentralized Ranking in Web Information Retrieval

Search engines are among the most important applications or services on the web. Most existing successful search engines use global ranking algorithms to generate the ranking of documents crawled into their databases. However, global ranking of documents has two potential problems: high computation cost and potentially poor rankings. Both of these problems are related to the centralized computation paradigm. We propose to decentralize the task of ranking. This requires two things: a decentralized architecture and a logical framework for ranking computation. In the paper we introduce a ranking algebra providing such a formal framework. Through partitioning and combining rankings, we manage to compute document rankings of large-scale web data sets in a localized fashion. We provide initial results demonstrating that the use of such an approach can ameliorate the above-mentioned problems. The approach presents a step towards P2P Web search engines.

Karl Aberer, Jie Wu
A Web User Profiling Approach

People display regularities in almost everything they do. This paper proposes characteristics of an idealized algorithm that would allow the automatic extraction of web user profiles based on user navigation paths. We describe a simple predictive approach with these characteristics and show its predictive accuracy on a large dataset from the KDD-Cup web logs (a commercial web site), while using fewer computational and memory resources. To achieve this objective, our approach is articulated around three notions: (1) applying probabilistic exploration using Markov models; (2) avoiding the problem of Markov model high-dimensionality and sparsity by clustering web documents, based on their content, before applying the Markov analysis; (3) clustering Markov models and extracting their gravity centers. On the basis of these three notions, the approach makes it possible to predict the states to be visited k steps ahead and to monitor navigation sessions, based on both content and traversed paths.

Younes Hafri, Chabane Djeraba, Peter Stanchev, Bruno Bachimont
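
As a rough illustration of notion (1) above — probabilistic exploration with Markov models — the following sketch trains a first-order Markov model over page (or page-cluster) identifiers from navigation sessions and greedily predicts the next k states. It is a generic Markov predictor with assumed names and toy data, not the paper's clustered-model approach.

from collections import defaultdict

transitions = defaultdict(lambda: defaultdict(int))   # transition counts

def train(sessions):
    for session in sessions:
        for cur, nxt in zip(session, session[1:]):
            transitions[cur][nxt] += 1

def predict(state, k=1):
    # Follow the most frequent transition k steps ahead.
    path = []
    for _ in range(k):
        if not transitions[state]:
            break
        state = max(transitions[state], key=transitions[state].get)
        path.append(state)
    return path

train([["home", "sports", "scores"],
       ["home", "sports", "news"],
       ["home", "sports", "scores"]])
print(predict("home", k=2))   # ['sports', 'scores']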
A New Algorithm for Performing Ratings-Based Collaborative Filtering

Collaborative filtering is the most successful recommender system technology to date. It has been shown to produce high quality recommendations, but its performance degrades with the number of customers and products. In this paper, according to the features of the rating data, we present a new similarity function, Hsim(), and a signature table-based algorithm for performing collaborative filtering. This method partitions the original data into signature sets and then establishes a signature table to avoid a sequential scan. Our preliminary experiments based on a number of real data sets show that the new method can improve both the scalability and the quality of collaborative filtering. Because the new method applies data clustering algorithms to rating data, predictions can be computed independently within one or a few partitions. Ideally, partitioning will improve the quality of collaborative filtering predictions. We will continue to study how to further improve the quality of predictions in future research.

Fengzhao Yang, Yangyong Zhu, Bole Shi

Payment and Security

Architecture for a Component-Based, Plug-In Micro-payment System

Micro-payment systems have the potential to provide non-intrusive, high-volume and low-cost pay-as-you-use services for a wide variety of web-based applications. However, adding micro-payment support to web-sites is usually time-consuming and intrusive, both to the web site’s software architecture and its user interface implementation. We describe a plug-in, component model for adding micro-payment support to web applications. We use J2EE software components to encapsulate micro-payment E-coin debiting and redemption and discrete user interface enhancement. A CORBA infrastructure is used to inter-connect J2EE and non-J2EE vendors and micro-payment brokers. We demonstrate the feasibility of our approach with an on-line, pay-as-you-use journal portal example and outline an approach to using web services to further generalize our architecture.

Xiaoling Dai, John Grundy
Verifying the Purchase Request in SET Protocol

The Secure Electronic Transaction (SET) protocol has been jointly developed by Visa and MasterCard toward achieving secure online transactions. This paper presents a formal verification of the Purchase Request phase of SET using ENDL (an extension of non-monotonic logic). The analysis unveils some potential flaws. To overcome these vulnerabilities, some feasible countermeasures are proposed during the validation. Also, the modelling of the Purchase Request is described to implement mechanical model checking instead of manual verification.

Qingfeng Chen, Chengqi Zhang, Shichao Zhang, Chunsheng Li
Self-organizing Coefficient for Semi-blind Watermarking

In this paper, we present a watermarking scheme based on the DWT (Discrete Wavelet Transform) and an ANN (Artificial Neural Network) to ensure copyright protection of digital images. To embed the watermark, the regions of interest where the watermark is embedded are determined by SOFM (Self-Organizing Feature Maps). Among the classified nodes, we select the middle nodes and use the node average as a threshold. The threshold is applied only to the wavelet coefficients of the selected nodes, which reduces the time cost. Using the SOFM is much safer than other algorithms because an unauthorized user cannot know the result of the SOFM training. So even if the watermark casting process is public, attackers or unauthorized users still cannot remove the watermark from the watermarked image. As a result, the fidelity of the image is better than with other algorithms, and the scheme withstands strength tests such as filtering and geometric transforms. Furthermore, it is also robust to JPEG compression.

Sung-kwan Je, Chang-jin Seo, Jin-young Lee, Eui-young Cha
Applying RBAC Providing Restricted Permission Inheritance to a Corporate Web Environment

A successful marriage of Web and RBAC technology can support effective enterprise-wide security in large-scale systems. But RBAC has a role hierarchy concept in which a senior role inherits all permissions of its junior roles. In corporate environments, a senior role need not have all the authority of its junior roles, and unconditional inheritance in the role hierarchy causes undesirable side effects (permission abuse) and violates the principle of least privilege. In this paper, we re-explore role and permission inheritance and propose a new model providing restricted permission inheritance. To do this, we divide a single role into sub-roles (Corporate/Department Common role, Restricted Inheritance role, Private role) based on the degree of inheritance and business characteristics, and build the role hierarchy with sub-roles. This is very useful for solving the unconditional inheritance problem in a corporate environment. We also give a formal description of the proposed model. Lastly, we show a system architecture applying RBAC with the proposed model within a corporate web environment.

YongHoon Yi, MyongJae Kim, YoungLok Lee, HyungHyo Lee, BongNam Noh
eSignature Verification on Web Using Statistical Mining Approach

This research is related to the field of biometrics. Biometrics research covers fingerprint scans, retina scans, voiceprint analyses, and so on [1]. Although an electronic signature (eSignature) does not come directly from a human body, it derives from human handwriting. For instance, a handwritten signature is collected from a cardholder when filling out a credit card application form and is converted from a normal signature into an electronic signature. This eSignature is then transmitted and stored as an XML document in a data center. We extract the eSignature, a group of numbers, from the database. This group of numbers is the input to Online Analytical Mining (OLAM) [2]. We use the Internet as the network channel. We also use XML-RPC [6] to implement the active rules and apply OLAM to verify incoming eSignatures.

Joseph Fong, San Kuen Cheung, Irene Kwan

Web Application Architectures

Deriving Architectures of Web-Based Applications

Web-based applications that feature intensive data manipulation, user interaction and complicated business processing have been widely used, especially in areas such as e-commerce. They are comparable to traditional GUI client/server applications in terms of functionality, structure and development activities, but are unique in that they must use the web as an infrastructure for their deployment and execution. In this paper we present a procedure for deriving an architecture specific to web-based applications, based on an analysis of the features attributable to this uniqueness. We illustrate how this architecture accommodates the features and compare it with other commonly used architectures.

Weiquan Zhao, David Kearney
A New Agent Framework with Behavior Delegation Using SOAP

The functional extension of intelligent agents has been a difficult problem because typical software must be edited and recompiled when its functions need modification or replacement after launch. To extend an agent's functions dynamically, it is desirable to separate functions from the agent's hard-coded source. In this paper, we propose a new agent framework based on the concept of behavior delegation. We design a new behavior description language, called BDL, for users to assemble agent functions without programming. All agent behaviors are executed on an external server using SOAP. The proposed BDL editor provides users with an easy way to assemble agent applications. An example, called Intelligence Price Finder, is implemented to show the use of the proposed BDL and the editor.

Ki-Hwa Lee, Eui-Hyun Jung, Hang-Bong Kang, Yong-Jin Park
Improving the Web Presentation Layer Architecture

In this paper we provide a discussion of the Model 2 architecture for web interface programming. We show how the main purpose of Model 2, namely separation of concerns, can be achieved solely by functional decomposition. Enabling technology for this is NSP, a typed, composable server pages technology. The chosen approach is seamlessly integrated with Form-Oriented Analysis.

Dirk Draheim, Elfriede Fehr, Gerald Weber
An Event Based Approach to Web Service Design and Interaction

This paper advocates an approach to web service design and interaction that is based on web services simultaneously participating in shared business events. In contrast to one-to-one method invocations, such events are broadcast in parallel to all web services that participate in them. Moreover, transactional business events are distinguished from non-transactional attribute inspections. The paper first discusses the role of the business event concept as the cornerstone for a methodical analysis and design phase. Then, it is shown how the event broadcasting paradigm can be implemented by means of SOAP messaging.

Wilfried Lemahieu, Monique Snoeck, Cindy Michiels, Frank Goethals
Active Document Framework ADF: Concept and Method

This paper proposes the concept and method of the active document framework (ADF), which is a self-representable, self-explainable and self-executable document mechanism. The content of the document is reflected by a granularity hierarchy, a template hierarchy, background knowledge, and semantic links between document fragments. An ADF has a set of built-in engines for browsing, retrieving, and reasoning, and it can work in the manner best suited to the content. The ADF supports not only browsing and retrieval services but also intelligent services like complex question answering and online teaching. The client-side service provision mechanism is only responsible for obtaining the required ADF mechanism that provides the particular information services. This improves on current Web information retrieval approaches in the efficiency, precision and mobility of information services, as well as enabling intelligent services.

Hai Zhuge

Advanced Applications

Goal-Oriented Analysis and Agent-Based Design of Agile Supply Chain

In a dynamic enterprise environment, it is a pressing problem to reconfigure supply chains swiftly with the formation and dissolution of virtual enterprises. Multiagent technology provides a promising solution to this problem. In this paper, we present an approach for developing agent-based agile supply chains, from goal-oriented analysis to agent-based design. To begin with, a meta-model used for the analysis and design of agile supply chains is given, which defines the abstract entities identified in the analysis and design phases as well as their relationships, such as goals, roles, activities, rules and agents. In the analysis phase, the goal of the supply chain is represented as an AND-OR goal graph, and then activities, roles and business rules are identified accordingly. In the design phase, roles are late-bound (assigned) to agents so that the analysis model can be reused. Finally, the architecture of a multiagent supply chain based on a CORBA platform is described.

Dong Yang, Shen-sheng Zhang
Methodologies and Mechanism Design in Group Awareness Support for Internet-Based Real-Time Distributed Collaboration

The first purpose of this paper is to provide an overview of the most commonly-used awareness mechanisms, namely, What You See Is What I See, telepointers, multi-user scrollbars, radar views and distortion-oriented views. These mechanisms were derived from researchers’ intuition, without prior experimental investigation of what awareness information end-users really need. This research utilised a completely user-centered approach to determine relevant awareness mechanisms. The novelty of this approach is in the use of usability experiments to identify awareness mechanisms. In addition to the illustration of several innovative mechanisms, this research has also successfully differentiated the importance of different awareness information in maintaining group awareness. The significance of different awareness information has been thoroughly compared. These results help designers to know which information must be provided to all team members.

Minh Hong Tran, Gitesh K. Raikundalia, Yun Yang
Statistics Based Predictive Geo-spatial Data Mining: Forest Fire Hazardous Area Mapping Application

In this paper, we propose two statistics-based predictive geo-spatial data mining methods and apply them to predict forest fire hazardous areas. The proposed prediction models used in geo-spatial data mining are the likelihood ratio and conditional probability methods. In these approaches, the prediction models and estimation procedures depend on the basic quantitative relationships of the geo-spatial data sets relevant to forest fire with respect to the selected areas of previous forest fire ignition. In order to produce the forest fire hazardous area prediction map using the two proposed prediction methods and evaluate their prediction power, we applied an FHR (Forest Fire Hazard Rate) and a PRC (Prediction Rate Curve), respectively. When the prediction power of the two proposed prediction models is compared, the likelihood ratio method is more powerful than the conditional probability method. The proposed model for prediction of forest fire hazardous areas would be helpful in increasing the efficiency of forest fire management, such as prevention of forest fire occurrences and effective placement of forest fire monitoring equipment and manpower.

Jong Gyu Han, Keun Ho Ryu, Kwang Hoon Chi, Yeon Kwang Yeon
Knowledge Representation, Ontologies, and the Semantic Web

A unified representation for web data and web resources is absolutely necessary in today's large-scale Internet data management systems. Such a representation will allow machines to meaningfully process the available information and provide semantically correct answers to posed queries. Ontologies are expected to play an important role in this direction of web technology, which defines the so-called Semantic Web. The goal of this paper is to provide an overview of the Knowledge Representation (KR) techniques and languages that can be used as standards in the Semantic Web.

Evimaria Terzi, Athena Vakali, Mohand-Saïd Hacid
Web Wrapper Validation

A web wrapper extracts data from HTML documents. The accuracy and quality of the information extracted by a web wrapper rely on the structure of the HTML document. If an HTML document is changed, the web wrapper may or may not function correctly. This paper presents an Adjacency-Weight method to be used in the web wrapper extraction process or in a wrapper self-maintenance mechanism to validate web wrappers. The algorithm and data structures are illustrated by some intuitive examples.

Eng-huan Pek, Xue Li, Yaozong Liu

Web and Multimedia

Web-Based Image Retrieval with a Case Study

Advances in content-based image retrieval (CBIR) have led to numerous efficient techniques for retrieving images based on their content features, such as colours, textures and shapes. However, CBIR to date has mainly focused on a centralised environment, ignoring the most rapidly growing image collection in the world: the images on the Web. In this paper, we study the problem of distributed CBIR in the environment of the Web, where image collections are represented as normal and typically autonomous websites. After an analysis of the challenging issues in applying current CBIR techniques to this new environment, we explore architectural possibilities and discuss their advantages and disadvantages. Finally we present a case study of distributed CBIR based exclusively on texture features. A new method to derive texture-based global similarity rankings suggests that, with a deep understanding of feature extraction algorithms, it is possible to have a better and more predictable way to merge local rankings from heterogeneous sources than the commonly used method of assigning different weights.

Ying Liu, Danqing Zhang
Extracting Content Structure for Web Pages Based on Visual Representation

A new web content structure based on visual representation is proposed in this paper. Many web applications such as information retrieval, information extraction and automatic page adaptation can benefit from this structure. This paper presents an automatic top-down, tag-tree independent approach to detect web content structure. It simulates how a user understands web layout structure based on visual perception. Compared to other existing techniques, our approach is independent of the underlying document representation such as HTML and works well even when the HTML structure is far different from the layout structure. Experiments show satisfactory results.

Deng Cai, Shipeng Yu, Ji-Rong Wen, Wei-Ying Ma
XML Data Integration and Distribution in a Web-Based Object Video Server System

Data integration and distribution, albeit “old” topics, are necessary for developing a distributed video server system which can support multiple key functions such as video retrieval, video production and editing capabilities. In a distributed object video server (DOVS) system, objects from (homogeneous and heterogeneous) servers usually need to be integrated for efficient operations such as query processing and video editing. On the other hand, due to practical factors and concerns (such as resource cost and/or intellectual property concerns), raw/source video files often need to be well protected. XML is becoming the standard for multimedia data description (e.g. MPEG-7), and is very suitable for Web-based data presentation owing to its expressiveness and flexibility. In this paper, we describe our approach to process XML descriptions for data integration and distribution in a Web-based DOVS system.

Shermann Sze-Man Chan, Qing Li

Network Protocols

Genetic Algorithm-Based QoS Multicast Routing for Uncertainty in Network Parameters

This paper discusses the multicast routing problem with multiple QoS constraints in networks with uncertain parameters, and describes a network model suitable for studying such QoS multicast routing problems. The paper mainly presents GAQMR, a multicast routing policy for the Internet, mobile networks or other high-performance networks, which is based on a genetic algorithm and can provide QoS-sensitive paths in a scalable and flexible way in network environments with uncertain parameters. GAQMR can also optimize network resources such as bandwidth and delay, and can converge to the optimal or near-optimal solution within a few iterations, even in network environments with uncertain parameters. The growth rate of the computational cost is close to polynomial and less than exponential. The performance of GAQMR is evaluated using simulations. The results show that GAQMR provides a viable approach to QoS multicast routing under uncertainty in network parameters.

Layuan Li, Chunlin Li
Tagged Fragment Marking Scheme with Distance-Weighted Sampling for a Fast IP Traceback

An IP traceback technique allows a victim to trace the routing path that an attacker has followed to reach his system. It has the effect of deterring future attackers as well as capturing the current one. FMS (Fragment Marking Scheme) is an efficient implementation of IP traceback. Every router participating in FMS leaves its IP information on the passing-through packets, partially and with some probability. The victim can then collect the packets and analyze them to reconstruct the attacking path. FMS and similar schemes, however, suffer a long convergence time to build the path when the attack path is lengthy. They also suffer a combinatorial explosion problem when there are multiple attack paths. This paper suggests techniques to restrain the convergence time and the combinatorial explosion. The convergence time is reduced considerably by ensuring that all routers have a close-to-equal chance of sending their IP fragments, through a distance-weighted sampling technique. The combinatorial explosion is avoided by tagging each IP fragment with the corresponding router's hashed identifier.

Ki Chang Kim, Jin Soo Hwang, Byung Yong Kim, Soo-Duk Kim
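
One plausible reading of the distance-weighted sampling idea above is reservoir-style marking: if the i-th router along the path overwrites the mark with probability 1/i, every router contributes the surviving mark with roughly equal probability, instead of nearby routers dominating as in fixed-probability FMS. The sketch below simulates this; it is an assumption-laden illustration, and the paper's exact weighting scheme may differ.

import random
from collections import Counter

def mark_packet(path):
    # The i-th router (1-based, counted from the attack source) overwrites the
    # mark with probability 1/i, so the surviving mark is uniform over the path.
    mark = None
    for i, router in enumerate(path, start=1):
        if random.random() < 1.0 / i:
            mark = router
    return mark

path = [f"R{i}" for i in range(1, 11)]                  # a 10-hop attack path
counts = Counter(mark_packet(path) for _ in range(100_000))
print({r: round(c / 100_000, 3) for r, c in sorted(counts.items())})  # ~0.1 each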
An Ant Algorithm Based Dynamic Routing Strategy for Mobile Agents

Routing strategy, a complex combinatorial problem, is one of the most important aspects of a mobile agent system. Most current mobile agent systems adopt static routing strategies that do not consider dynamic network and host status, which hinders the performance and autonomy of mobile agents. Ant algorithms are well suited to this kind of problem. After analyzing the existing routing strategies of typical mobile agent systems, this paper summarizes the factors that may affect the routing strategy of mobile agents, proposes an Ant Algorithm based dynamic routing strategy that uses both experience and the network environment (such as resource information, network traffic and host workload), and presents a method for acquiring and storing routing parameters and decision rules according to the major characteristics of mobile agent migration. Simulation experiments show that our dynamic routing strategy can effectively improve the performance and autonomy of mobile agents.

Dan Wang, Ge Yu, Mingsong Lv, Baoyan Song, Derong Shen, Guoren Wang
Integration of Mobile IP and NHRP over ATM Networks

In this paper, we propose a scheme to integrate Mobile IP and NHRP over NBMA networks, including ATM networks. This paper also defines the signaling and control mechanisms required to integrate NHRP and Mobile IP. The integration decreases the end-to-end path delay between an MN and a CN by exploiting ATM's fast switching and high scalability. We mathematically analyze the end-to-end path delay between end hosts in integrated Mobile IP networks, and also show the delay improvement by simulation.

Tae-Young Byun, Moo-Ho Cho

Workflow Management Systems

e_SWDL : An XML Based Workflow Definition Language for Complicated Applications in Web Environments

e_SWDL is the workflow definition language of the prototype WfMS e_ScopeWork, which is designed to support complex cross-enterprise workflow applications among heterogeneous sites using an XML approach. e_SWDL follows WfMC's XML-based process definition language standard (XPDL) and makes the necessary extensions for semantics-rich modeling ability in three major aspects: (1) the complicated transitions between tasks for workflow process modeling; (2) the workflow relevant data and workflow environment data for data modeling; and (3) the role, participant and participant group for organization modeling. Furthermore, Compensation entities (CDSet) are provided for failure handling of distributed workflow scheduling, and Concurrency entities (ConSet) are provided for the correctness of concurrent workflow execution. e_SWDL provides strong modeling ability for complicated workflow logic and suits distributed and heterogeneous Web environments.

Wei Ge, Baoyan Song, Derong Shen, Ge Yu
An Efficient User Task Handling Mechanism Based on Dynamic Load-Balance for Workflow Systems

User tasks are one of the major task types in complicated workflow applications. The way user tasks are handled significantly impacts the performance of a workflow system and involves many issues, such as describing the duty of each participant, calculating the workload of each participant, and the policy for dispatching work items among participants. After analyzing the characteristics of user tasks, this paper proposes an efficient user task handling mechanism based on a dynamic load-balance approach. To do this, the organization model and the workload model are defined, the load-balance policies and the workload dispatching algorithms are designed, and the implementation techniques in a prototype WfMS — e_ScopeWork — are presented. Performance experiments show that the new mechanism can improve workflow system performance effectively.

Baoyan Song, Ge Yu, Dan Wang, Derong Shen, Guoren Wang
Exception Specification and Handling in Workflow Systems

Various unexpected events frequently happen in workflow systems supporting web-based business processes. Thus a workflow system should be equipped with handlers to cope with these unexpected events. In practical terms, however, we cannot expect a workflow system to prepare handlers for all events that might potentially occur. It is more reasonable to let process designers specify exceptional situations and define corresponding exception handlers at process build time. Then, when exceptional events occur, the workflow system detects the exceptions and invokes the corresponding exception handlers. To support this mechanism, a workflow system should provide a means of specifying exceptions and facilities to detect exceptions and invoke the corresponding exception handlers. In this paper, we devise an exception specification method using an event-transition approach and a handling mechanism using a design pattern. Exception detection and a mechanism for invoking exception-handling routines are developed and incorporated into our research workflow system (ICU/COWS).

Yoonki Song, Dongsoo Han
Key Issues and Experiences in Development of Distributed Workflow Management Systems

Research on workflow technology has been promoting the development of workflow management systems (WfMSs). Compared to centralised WfMSs, the development of distributed WfMSs is much more complex and therefore needs to be analysed from a software engineering perspective. This paper first presents the development lifecycle of distributed WfMSs, and then discusses issues of enterprise-wide workflow modeling and the implementation of distributed WfMSs. Furthermore, we report on a case study and discuss issues related to the implementation of some components.

Hongchen Li, Yun Yang, Meilin Shi

Advanced Search

Intelligent Search for Distributed Information Sources Using Heterogeneous Neural Networks

As the number and diversity of distributed information sources on the Internet increase exponentially, various search services have been developed to help users locate relevant information. But they still have some drawbacks, such as the difficulty of mathematically modeling the retrieval process, the lack of adaptivity and the indiscriminate nature of search. This paper shows how heterogeneous neural networks can be used in the design of an intelligent distributed information retrieval (DIR) system. In particular, three typical neural network models — Kohonen's SOFM Network, the Hopfield Network, and the Feed-Forward Network with the Back Propagation algorithm — are introduced to overcome the above drawbacks in current DIR research by using their unique properties. This preliminary investigation suggests that neural networks are useful tools for intelligent search of distributed information sources.

Hui Yang, Minjie Zhang
A Localness-Filter for Searched Web Pages

With the spread of the Internet, information about our daily life and our residential region is becoming more and more prevalent on the WWW (World Wide Web). That is to say, there are a lot of Web pages whose content is 'local' and may only interest residents of a narrow region. Conventional information retrieval systems and search engines, such as Google[1], Yahoo[2], etc., are very useful for helping users find interesting information. However, it is not yet easy to find or exclude 'local' information about our daily life and residential region. In this paper, we propose a localness-filter for searched Web pages, which can discover and exclude information about our daily life and residential region from the searched Web pages. We compute the localness degree of a Web page by 1) estimating its region dependence: the frequency of geographical words and the content coverage of the Web page, and 2) estimating the ubiquitousness of its topic: in other words, we estimate whether it is everyday information that appears everywhere in our daily life.

Qiang Ma, Chiyako Matsumoto, Katsumi Tanaka
DEBIZ: A Decentralized Lookup Service for E-commerce

Existing e-commerce specifications such as ebXML and UDDI manage resources in a logically centralized way, which leads to single points of failure and performance bottlenecks. As a new computing model, peer-to-peer addresses existing e-commerce resource management problems in a natural way. This paper presents DEBIZ, a decentralized service for resource management in e-commerce. In DEBIZ, resources are managed in a peer-to-peer fashion, and resource metadata and query messages are represented as XML documents. Experimental results show that DEBIZ has low space overhead, well-balanced load, and good robustness.

Zeng-De Wu, Wei-Xiong Rao, Fan-Yuan Ma
Design of B+Tree-Based Predicate Index for Efficient Event Matching

Efficient event matching algorithms are the core of publish/subscribe systems. Such algorithms are typically designed as memory-resident structures for performance reasons. Given the explosive growth of information, it is not always practically feasible to keep the index for event filtering memory-resident, thereby necessitating a secondary storage structure. Even though search algorithms designed for active databases and spatio-temporal databases are applicable to publish/subscribe systems, these algorithms are not specifically designed for publish/subscribe systems, which require both fast search and efficient support for dynamic insertions and deletions. To address this problem, we propose a predicate index for secondary storage with space complexity O(n) and search time complexity O(log n). Analytical comparison of our proposed algorithms with existing work indicates that our secondary storage predicate index is efficient for event matching.

Botao Wang, Wang Zhang, Masaru Kitsuregawa

Data Allocation and Replication

Data Replication at Web Proxies in Content Distribution Network

This paper investigates the problem of optimally replicating objects at candidate proxies in a content distribution network. In our model, each proxy in the set of candidates has a finite storage capacity for replicating objects and charges a fee for use. The optimization problem is to find a set of proxies from the candidate set for replicating objects such that the total access cost is minimized, subject to the constraints that the objects placed at a proxy do not exceed the storage capacity of the proxy and the total fees charged by the proxies do not exceed a pre-specified budget. We formulate this as a combinatorial optimization problem and show that it is NP-complete. We propose two heuristics and evaluate them by simulation. The simulation results show that these two heuristics can significantly reduce the access cost.

Xuanping Zhang, Weidong Wang, Xiaopeng Tan, Yonghu Zhu
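
The paper's two heuristics are not described here, but the shape of the problem — per-proxy storage limits, per-proxy fees and a global budget — can be made concrete with a simple greedy placement, sketched below under the assumption that every object occupies one storage unit and that savings[p][o] gives the access-cost reduction from replicating object o at proxy p. All names and numbers are illustrative, not taken from the paper.

def greedy_replicate(savings, capacity, fee, budget):
    # Rank candidate proxies by achievable cost saving per unit of fee,
    # then add them greedily while the budget lasts, filling each selected
    # proxy with its highest-saving objects up to its storage capacity.
    def best_for(p):
        objs = sorted(savings[p], key=savings[p].get, reverse=True)[:capacity[p]]
        return objs, sum(savings[p][o] for o in objs)

    placement, spent = {}, 0
    for p in sorted(savings, key=lambda p: best_for(p)[1] / fee[p], reverse=True):
        if spent + fee[p] <= budget:
            objs, gain = best_for(p)
            if gain > 0:
                placement[p] = objs
                spent += fee[p]
    return placement, spent

savings = {"p1": {"a": 9, "b": 4, "c": 1}, "p2": {"a": 3, "b": 8, "c": 7}}
print(greedy_replicate(savings, capacity={"p1": 2, "p2": 2},
                       fee={"p1": 5, "p2": 6}, budget=10))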
A Hash-Based Collaborative Transcoding Proxy System

This paper proposes a hash-based collaborative transcoding proxy system for heterogeneous client environments. The system aims to improve system performance in two aspects: caching efficiency and workload balancing. The system employs a hash-based object caching strategy, which optimizes cache storage utilization by removing redundant objects scattered in different locations of the system cache. In addition, the cache replacement algorithm deployed at the transcoding proxy is examined, and we conclude that object access rate should be considered when making eviction decisions. On the other hand, a hash-based workload distribution strategy is proposed to share the expensive transcoding load among proxies. This strategy performs well with a balanced hash function. With an unbalanced hash function, satisfactory load sharing is achieved by an optimized strategy, which allows an overloaded proxy to outsource some transcoding tasks to less overloaded neighbors.

Xiu Wu, Kian-Lee Tan
Dynamic Materialized View Management Based on Predicates

For the purpose of satisfying different users' profiles and accelerating subsequent OLAP (Online Analytical Processing) queries in a large data warehouse, dynamic materialized OLAP view management is highly desirable. Previous work caches data as either chunks or multidimensional range fragments. In this paper, we focus on ROLAP (Relational OLAP) in an existing relational database system. We propose a dynamic predicate-based partitioning approach, which can support a wide range of OLAP queries. We conducted extensive performance studies using TPC-H benchmark data on IBM DB2, and the encouraging results indicate that our approach is highly feasible.

Chi-Hon Choi, Jeffrey Xu Yu, Hongjun Lu
DPR: A Dynamic Partial Replication Protocol Based on Group Communication for a Web-Enable Database Cluster

This paper proposes a dynamic partial replication protocol based upon a group communication system for use with a web-enabled database cluster. It dynamically combines the advantages of both a partial and a full replication model according to the query pattern. Most eager-update replication protocols that have been suggested as the best replication approach for a database cluster are based on full replication. However, an actual database cluster system needs partial replication rather than full replication to achieve high throughput and scalability. The proposed Dynamic Partial Replication (DPR) protocol guarantees consistency among replicas and reduces the overhead due to remote access inherent in previous partial replication protocols. The proposed protocol consists of three parts: partial replica control, scale-out factor estimation and dynamic replica allocation. The partial replica control part is the framework for the DPR protocol. The scale-out factor estimation part determines the optimal number of replicas according to the current query pattern and access frequency to maximize throughput and efficiency. The dynamic replica allocation part creates or removes temporary replicas at a local site. The simulation evaluation shows that the proposed protocol outperforms existing eager-update protocols, achieving improvements of approximately 16% in response time and 20% in scalability.

Chung-Ho Lee, Jae-Dong Lee, Hae-Young Bae
Backmatter
Metadata
Title
Web Technologies and Applications
Editors
Xiaofang Zhou
Maria E. Orlowska
Yanchun Zhang
Copyright Year
2003
Publisher
Springer Berlin Heidelberg
Electronic ISBN
978-3-540-36901-1
Print ISBN
978-3-540-02354-8
DOI
https://doi.org/10.1007/3-540-36901-5