
2004 | Book

Content Computing

Advanced Workshop on Content Computing, AWCC 2004, ZhenJiang, JiangSu, China, November 15-17, 2004. Proceedings

Edited by: Chi-Hung Chi, Kwok-Yan Lam

Publisher: Springer Berlin Heidelberg

Book series: Lecture Notes in Computer Science


Table of Contents

Frontmatter

Session 1: Mobile Code and Agent Technology

Mobility Prediction-Based Wireless Resource Allocation and Reservation

Due to the mobility uncertainty of mobile users, it is a real challenge for wireless networks to efficiently allocate and reserve resources. Motivated by the rationale that a good data compressor should be a good predictor, this paper first develops a mobility prediction algorithm based on the Ziv-Lempel algorithm, which is both theoretically optimal and good in practice. The algorithm can predict not only to which cell a mobile user will hand off but also when the handoff will occur. We then propose an efficient resource allocation and reservation scheme, called predict-based GC, which integrates the prediction algorithm into the guard channels (GC) policy. The simulation results show that although the time complexity of predict-based GC is higher, it outperforms Fixed-percent and ExpectedMax in QoS support effectiveness.
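
For orientation, here is a minimal sketch of the Ziv-Lempel prediction idea the abstract describes: parse the observed cell-handoff sequence into an LZ78-style trie and predict the next cell from successor counts at the current parsing context. The class names and the simple most-frequent-successor rule are illustrative assumptions, not the paper's exact construction (which also predicts when a handoff occurs).

```python
from collections import defaultdict

class TrieNode:
    def __init__(self):
        self.children = {}
        self.next_counts = defaultdict(int)   # successors seen at this context

class LZPredictor:
    """LZ78-style trie over a cell-handoff sequence (illustrative sketch)."""
    def __init__(self):
        self.root = TrieNode()
        self.cur = self.root

    def observe(self, cell):
        # Count the transition at the current context, then extend the phrase;
        # on a novel phrase, add a leaf and restart parsing at the root.
        self.cur.next_counts[cell] += 1
        if cell in self.cur.children:
            self.cur = self.cur.children[cell]
        else:
            self.cur.children[cell] = TrieNode()
            self.cur = self.root

    def predict(self):
        # Most frequent successor of the current context, if any.
        if not self.cur.next_counts:
            return None
        return max(self.cur.next_counts, key=self.cur.next_counts.get)

p = LZPredictor()
for cell in ["A", "B", "A", "B", "C", "A", "B", "A", "B", "C", "A", "B"]:
    p.observe(cell)
print(p.predict())   # the cell most often following the current context
```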

Xiaolong Yang, Qianbin Chen, Youju Mao, Keping Long, Bin Ma
An Agent-Enabled Content-Explicit Authorization Model for OGSA-Compliant Grid

Traditional methods for authorization within Grid computing have many shortcomings. First, the enrolment of users and services into the server is done manually, which is not adaptable to a dynamic environment where service providers and service consumers join or leave dynamically. Second, the authorization policy language is not expressive enough to represent more complex policies, and it cannot resolve the problem of semantic inconsistency between different parties that interpret the same policy. This paper takes advantage of characteristics of intelligent agents, such as autonomy, proactivity, and sociality, to handle authorization issues, including automated registration and management of agents (service providers and service consumers), and autonomous, policy-based authentication and authorization. In addition, an ontology-based content-explicit policy modeling framework is presented, which resolves the semantic inconsistency problem among different parties.

Yunliang Jiang, Beishui Liao, Yong Liu, Jun Hu
A Predictable Mobile Agent Computation Model and Its Fabric Architecture

Using the fabric of a virtual organization architecture, a novel formalized mobile agent computation model is defined. In this model, all the actions of the mobile agents (e.g., service, migration and communication) are treated as states. The workflow of the mobile agents is controlled by a finite state machine. This ensures that each mobile agent's actions are atomic and avoids the abnormal condition of communication mismatch. We propose a tolerance measure named group service density, which greatly decreases the probability of a mobile agent waiting for a resource. It can also balance service occupancy across the whole network.

Yong Liu, Congfu Xu, Zhaohui Wu, Yunhe Pan
A Novel Reverse Rerouting Mechanism in Mobile Wireless Internet

In this paper, based on an analysis of existing partial and complete rerouting mechanisms, a novel reverse rerouting mechanism for the mobile wireless Internet is presented, taking the characteristics of handover into account. It integrates a reverse rerouting algorithm with resource handover and reservation mechanisms. Simulation results show that the signaling overhead for handover is effectively controlled, high resource utilization is achieved, and QoS (Quality of Service) guarantees are provided to mobile users to a certain degree.

Xingwei Wang, Bo Song, Changqing Yuan, Huang Min
An Agents Based Grid Infrastructure of Social Intelligence

The Grid and agent communities both develop concepts and mechanisms for open distributed systems, albeit from different perspectives. The Grid community has focused on infrastructure, tools and applications for reliable and secure resource sharing within dynamic and geographically distributed virtual organizations. In contrast, the agent community has focused on autonomous problem solvers that can act flexibly in uncertain and dynamic environments. Yet as the scale and ambition of both Grid and agent deployments increase, multi-agent systems require robust infrastructure and Grid systems require autonomous, flexible behaviors. Accordingly, an Agent Based Grid Infrastructure of Social Intelligence (ABGISI) is presented in this paper. Taking multi-agent cooperation as its main thread, the paper elaborates on ABGISI from three aspects: agent information representation; the support system for agent social behavior, which includes the agent mediation system and the agent rational negotiation mechanism; and the agent federation structure.

Jun Hu, Ji Gao, Beishui Liao, Jiujun Chen
Agent Aided Workflow Modeling

Nowadays, workflow processes are mostly built from designers' experience, which relies heavily on individual skill and tactics, so the correctness of the resulting processes is hard to guarantee, especially for complex ones. To address this issue, a tool called AAWM (Agent Aided Workflow Modeler) is proposed. This paper details the design and implementation issues of AAWM, such as its system architecture and user interface.

Jinlei Jiang, Meilin Shi

Session 2: Content Sharing and Consistency Management

An Improved Hybrid Method of Maintaining Content Consistency

Content distribution networks use certain mechanisms to guarantee that the replicated documents are consistent with the original documents undergoing updates. In this paper, we present an improved hybrid consistency method based on an existing algorithm which dynamically combines server-side propagation and invalidation. A new threshold, in terms of the ratio of a document’s request rate to its update rate, is established to determine which approach (invalidation or propagation) should be used. In addition, the improved algorithm makes consistency enhancement decisions based on request temporal locality and document size. Simulation results show that the improved algorithm reduces network traffic, achieves high request freshness rates, and introduces little extra response time when compared to other algorithms.
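
A minimal sketch of the kind of decision rule the abstract describes, with illustrative names and thresholds: the request-to-update ratio picks propagation or invalidation, with temporal locality and document size as secondary checks. The paper's actual formula and parameter values are not reproduced here.

```python
def choose_consistency_action(request_rate, update_rate, doc_size_kb,
                              locality_score, ratio_threshold=5.0,
                              size_cap_kb=512, locality_min=0.3):
    """Illustrative decision rule, not the paper's exact formula: propagate
    updates for hot, small, recently re-requested documents; otherwise send
    an invalidation and let the next request fetch a fresh copy."""
    if update_rate == 0:
        return "propagate"                       # no updates, nothing to push
    ratio = request_rate / update_rate           # the thresholded quantity
    if (ratio >= ratio_threshold and doc_size_kb <= size_cap_kb
            and locality_score >= locality_min):
        return "propagate"                       # pushing saves future misses
    return "invalidate"                          # lazy refresh is cheaper

print(choose_consistency_action(50.0, 2.0, 120, 0.7))   # -> propagate
print(choose_consistency_action(3.0, 2.0, 120, 0.7))    # -> invalidate
```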

Changming Ma, Daniel Cooke
Advanced Architecture for Distributed Systems with a Network Infrastructure Based on CAN and Internet for Content Distribution

A new layered architecture for the implementation of intelligent distributed control systems is proposed. This architecture distinguishes four levels in a distributed system. The upper layer is a digital control layer, where high-level decisions are taken. This level is implemented by means of intelligent software agents and distributed expert systems that carry out the discrete control functions, system supervision, as well as diagnosis and fault tolerance. The third layer deals with numeric values, performs analog operations and implements analog control loops. It is also in charge of converting numerical variable values into evaluated expressions. This layer has been implemented by means of neural networks. Networking appears in the second layer, formed by CAN and Internet for content distribution. Finally, every node should implement a hardware interface with the process. Some interesting features provided by this architecture are its low-cost implementation, easy content distribution through the communication infrastructure, distributed execution in virtual generic nodes (with no hardware dependency), bounded response time and fault tolerance mechanisms.

Juan V. Capella, Alberto Bonastre, Rafael Ors
Distributed Document Sharing with Text Classification over Content-Addressable Network

A content-addressable network is a scalable and robust distributed hash table that lets distributed applications store and retrieve information in an efficient manner. We consider the design and implementation issues of a document sharing system over a content-addressable overlay network. Improvements and their applicability to a document sharing system are discussed. We describe our system prototype, in which a hierarchical text classification approach is proposed as an alternative hash function to decompose dimensionality into lower-dimensional realities. Properties of hierarchical document categories are used to obtain probabilistic class labels, which also improves search accuracy.

Tayfun Elmas, Oznur Ozkasap
Content Distribution Stochastic Fluid Models for Multi-regions P2P Networks

Regional P2P networks (called RP2P), based on the Chord protocol, are constructed per region rather than over the whole Internet as in previous work. Algorithms for node joining and leaving, lookup, and content forwarding are presented. To analyze the performance of RP2P networks and study the dynamic content distribution process between them, stochastic fluid models, an important fluid-flow analytical technique, are adopted. The results are as follows: the total number of RP2P networks has a greater impact on the probability of successfully downloading content than the total number of nodes, and this probability tends toward a stable value as the number of nodes increases. These results demonstrate that constructing RP2P networks is better than building a single universal P2P network.

Zhiqun Deng, Dejun Mu, Guanzhong Dai, Wanlin Zhu
Construct Campus Peer-to-Peer Networks

The campus peer-to-peer networks, called CP2P networks, are proposed based on the existing Chord protocol. According to colleges' IP address ranges and users' interests, every college network constructs a CP2P network. The performance parameters are as follows: the average maximum lookup length of the first node in a CP2P network is O(log2 N - log2 k), and the lookup lengths of other nodes are O(log2 mi), where N is the total number of nodes in the whole network, k is the total number of CP2P networks, and mi is the number of nodes in each CP2P network. CP2P networks keep transfers local and reduce the traffic on the campus network backbone. Meanwhile, nodes join and leave only within their local CP2P network. Security problems such as DDoS attacks can be traced back to the attackers' source colleges.

Zhiqun Deng, Guanzhong Dai, Dejun Mu, Zhicong Liu

Session 3: Networking Infrastructure and Performance

Fractional Gaussian Noise: A Tool of Characterizing Traffic for Detection Purpose

Detecting signs of distributed denial-of-service (DDoS) flood attacks based on traffic time series analysis requires characterizing traffic series with a statistical model. The essential requirement is that this model consistently characterize various types of traffic (such as TCP, UDP, IP, and OTHER) with the same order of magnitude of modeling accuracy. Our previous work [1] uses fractional Gaussian noise (FGN) as a tool for characterizing traffic series for the purpose of reliably detecting signs of DDoS flood attacks. As a supplement to [1], this article gives experimental investigations showing that FGN can indeed be used to model the autocorrelation functions of various types of network traffic (TCP, UDP, IP, OTHER) consistently, in the sense that the modeling accuracy (expressed by mean square error) is of the order of magnitude of 10⁻³.
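
The fGn autocorrelation function is standard, so a small sketch can show what such fitting looks like: compare a measured traffic ACF against the closed-form fGn ACF and report the mean square error, the accuracy figure the abstract cites. The grid-search fit is an illustrative stand-in for the authors' estimation procedure.

```python
import numpy as np

def fgn_acf(k, H):
    """Closed-form normalized autocorrelation of fractional Gaussian noise."""
    k = np.abs(np.asarray(k, dtype=float))
    return 0.5 * ((k + 1) ** (2 * H) - 2 * k ** (2 * H) + np.abs(k - 1) ** (2 * H))

def fit_hurst(measured_acf):
    """Grid-search H minimizing the mean square error against a measured
    traffic ACF; the MSE is the accuracy figure the abstract cites."""
    lags = np.arange(len(measured_acf))
    hs = np.linspace(0.5, 0.99, 200)
    errs = [np.mean((fgn_acf(lags, h) - measured_acf) ** 2) for h in hs]
    best = int(np.argmin(errs))
    return hs[best], errs[best]

# Synthetic stand-in for a measured traffic ACF, generated with H = 0.8.
target = fgn_acf(np.arange(50), 0.8) + np.random.default_rng(1).normal(0, 0.01, 50)
H, mse_val = fit_hurst(target)
print(round(H, 2), f"{mse_val:.1e}")   # ~0.8, with a small MSE
```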

Ming Li, Chi-Hung Chi, Dongyang Long
Performance Analysis of Virtual Time Optimistic Transaction Processing

Aiming at the problems of the mobile computing environment, such as low bandwidth, frequent disconnection and low battery capacity, we propose an improved optimistic transaction processing method: the virtual time optimistic transaction processing protocol. This protocol improves the performance of optimistic transaction processing by extending the concept of committability or, more specifically, by relaxing the constraint of a total order relation between all transactions, based on our analysis of transaction processing approaches from a different angle. In this paper, we first explain the virtual time optimistic approach and give its algorithm. We then present the results of a simulation of the approach. Finally, we make a comparison and performance analysis based on the simulation, which shows that the protocol achieves an interesting performance gain in terms of the number of aborts.

Cong Liu, Wei Huang, Zhiguo Zhang
A Measurement-Based TCP Congestion Control Scheme

TCP congestion control faces a dilemma over whether it should rely on routers. Network measurement technology promises a different resolution. After analyzing several important schemes, a measurement-based TCP congestion control scheme based on FAST is proposed. The basic idea is to introduce a macroscopic guidance layer above the end systems that determines appropriate parameter values for them according to the measured performance of the network backbone. Simulation results indicate that this scheme achieves steadier power than FAST by driving the bottleneck queue toward a fixed length, without any assumptions about routers. Finally, the implemented measurement system is briefly introduced.
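
Since the scheme builds on FAST, a sketch of the standard FAST TCP window update (from the FAST literature, not this paper's guidance layer) may help orient the reader; the parameter values are illustrative.

```python
def fast_window_update(w, base_rtt, rtt, alpha=20.0, gamma=0.5):
    """One step of the FAST TCP window law:
    w <- (1 - gamma) * w + gamma * (baseRTT/RTT * w + alpha).
    alpha is the number of packets the flow tries to keep queued at the
    bottleneck; both constants here are illustrative."""
    return (1 - gamma) * w + gamma * (base_rtt / rtt * w + alpha)

w = 10.0
for _ in range(200):
    w = fast_window_update(w, base_rtt=0.100, rtt=0.120)
# Equilibrium: w = alpha * rtt / (rtt - base_rtt) = 20 * 0.12 / 0.02 = 120.
print(round(w, 1))   # -> 120.0
```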

Lihua Song, Haitao Wang, Ming Chen
BM-VF-SBD: An Efficient Data Channel Scheduling Algorithm to Support QoS for Optical Burst Switching Networks

Currently, optical burst switching (OBS) is regarded as the most promising backbone networking technology for the next-generation Internet. In OBS networks, data channel scheduling is one of the key problems, with bandwidth efficiency and QoS support as its two main concerns. However, the existing algorithms pay more attention to bandwidth efficiency. In this paper, we develop an efficient data channel scheduling algorithm, called BM-VF-SBD, which effectively integrates several mechanisms (void filling, burst migration and selective burst discard) to reduce bandwidth fragmentation and support QoS. Its basic idea is that a new burst is scheduled by migrating some bursts to other channels if no void in any channel can accommodate it, repeating the process after selectively dropping some bursts when necessary. With an effective data structure such as a balanced binary search tree, its computational complexity is at most O((2w+1)log w), close to that of LAUC-VF and ODBR. In the proposed algorithm, burst migration plays a key role in improving bandwidth efficiency, while selective burst discard has a large effect on both concerns. The simulation results show that it performs much better than LAUC-VF and ODBR in burst loss probability (overall and individual) and bandwidth fragmentation ratio.

Xiaolong Yang, Demin Zhang, Qianbin Chen, Keping Long, Lianghao Ji
A Predictive Controller for AQM Router Supporting TCP with ECN

Although the P (proportional) or PI (proportional-integral) controller for active queue management improves stability, there is no systematic method for selecting the controller parameters to guarantee transient performance, especially in rapidly changing environments. We use generalized predictive control methods to propose a controller for AQM routers supporting TCP with slightly modified ECN, enhancing the robustness and transient performance of the P and PI controllers. The simulation results demonstrate the effectiveness of the proposed controller.
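
As context for what the predictive controller replaces, here is a sketch of the classic incremental PI AQM update, which adjusts the ECN marking probability around a target queue length. The gains shown are illustrative placeholders; choosing them well per scenario is exactly the difficulty the paper's predictive controller aims to remove.

```python
def pi_aqm_step(p, q, q_prev, q_ref, a=1.822e-5, b=1.816e-5):
    """Incremental PI update of the ECN marking probability:
    p(k) = p(k-1) + a*(q(k) - q_ref) - b*(q(k-1) - q_ref).
    The gains a and b are illustrative, not tuned for any real link."""
    p += a * (q - q_ref) - b * (q_prev - q_ref)
    return min(max(p, 0.0), 1.0)    # keep p a valid marking probability

p, q_prev = 0.0, 200.0
for q in [220.0, 260.0, 240.0, 210.0]:     # sampled queue lengths (packets)
    p = pi_aqm_step(p, q, q_prev, q_ref=200.0)
    q_prev = q
print(f"{p:.2e}")                          # marking probability after 4 samples
```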

Ruijun Zhu, Haitao Teng, Weili Hu

Session 4: Content Aware Security (I)

Enhancing the Content of the Intrusion Alerts Using Logic Correlation

To address the problems of alert flooding and poor information semantics in existing IDSs, an approach that uses logic correlation to enhance the content of alerts is presented. A Chronicle based on time intervals is introduced to describe the temporal constraints among intrusion alerts, and Chronicle patterns are designed to integrate the sequence of alerts generated by an attacker into a high-level alert. The preparing relation between high-level alerts is then defined, and a first-order logic algorithm is applied to correlate the high-level alerts connected by this relation. The attack scenario is constructed by drawing the attack graph. Finally, an example is given to show the performance of this algorithm in decreasing the number of intrusion alerts and improving their information semantics.

Liang-Min Wang, Jian-Feng Ma, Yong-Zhao Zhan
Real-Time Emulation of Intrusion Victim in HoneyFarm

Security is becoming increasingly important. However, existing security tools, almost all defensive, have many vulnerabilities that are hard to overcome because of the lack of information about hackers' techniques and of powerful tools to distinguish malicious traffic from the huge volume of production traffic. Although honeypots mainly aim at collecting information about hackers' behavior, they are not very effective, because honeypot implementers tend to block or limit hackers' outbound connections to avoid harming non-honeypot systems, thus making honeypots easy to fingerprint. Additionally, the main concern is that if hackers were allowed outbound connections, they might attack actual servers, so the honeypot could become a facilitator of the hacking crime. In this paper we present a new method to emulate intrusion victims in a honeyfarm in real time. When hackers request outbound connections, they are redirected to the emulated intrusion victims, which imitate the real targets. This method provides hackers with a less suspicious environment and reduces the risk of harming other systems.

Xing-Yun He, Kwok-Yan Lam, Siu-Leung Chung, Chi-Hung Chi, Jia-Guang Sun
On the Formal Characterization of Covert Channel

This paper presents a formal characterization model for covert channel. Some characteristic properties are proposed. The system characteristics are used to guide the development of covert channel identification and elimination algorithms. In addition, we audit a covert channel and evaluate the developed algorithms quantitatively with our formalism.

Shiguang Ju, Xiaoyu Song
Availability Analysis and Comparison of Different Intrusion-Tolerant Systems

Based on the redundancy techniques adopted, intrusion-tolerant systems are classified into three kinds: resource redundancy based systems, complete information redundancy based systems, and partial information redundancy based systems. Using generalized stochastic Petri net (GSPN) models, the availabilities of the three kinds of systems are analyzed and compared. The numerical results show that, for the most part, the partial information redundancy based systems have the highest availability, the resource redundancy based systems the lowest, and the complete information redundancy based systems lie in between. The application scenarios of these different kinds of intrusion-tolerant systems are also explained.

Chao Wang, Jian-Feng Ma
Security Analysis of User Efficient Blind Signatures

Blind signature schemes allow a person to get a message signed by another party without revealing any information about the message to that party. To ensure that the message has a certain form, cut-and-choose protocols and partially blind signature protocols are used to prevent cheating. In electronic cash systems, unconditional anonymity may be misused for criminal activities such as blackmailing and money laundering, and fair electronic cash schemes have been introduced to prevent such fraudulent activities. In this paper, we point out a weakness in Fan and Lei's user efficient blind signatures. Exploiting this weakness, a user can cheat the signer in the cut-and-choose protocol, and the user can also break Fan and Lei's low-computation partially blind signature scheme and Yu et al.'s user efficient fair e-cash scheme.

Tianjie Cao, Dongdai Lin, Rui Xue

Session 5: Content Aware Security (II)

A Novel DDoS Attack Detecting Algorithm Based on the Continuous Wavelet Transform

Distributed denial-of-service (DDoS) attacks have recently emerged as a major threat to the security and stability of the Internet. Traffic bursts always accompany DDoS attacks, so detecting network traffic bursts accurately in real time can catch such attacks as quickly as possible. In this paper, we categorize traffic bursts into three kinds: single-point bursts, short flat bursts and long flat bursts, and propose a network traffic burst detection algorithm (BDA-CWT) based on the continuous wavelet transform. The algorithm uses a sliding window to analyze the traffic data continuously and detect the short or long flat bursts that typically accompany DDoS attacks. Our experiments demonstrate that the proposed detection algorithm is responsive and effective in curbing DDoS attacks, in contrast with the discrete wavelet transform and traditional methods (N-point average and gradient).
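
A minimal sketch of sliding-scale CWT burst scoring with a Mexican-hat wavelet, assuming an illustrative scoring rule, scales, and threshold; BDA-CWT's actual window handling and burst taxonomy follow the paper, not this toy.

```python
import numpy as np

def ricker(points, a):
    """Mexican-hat (Ricker) wavelet, width parameter a."""
    t = np.arange(points) - (points - 1) / 2.0
    x = t / a
    return (1 - x ** 2) * np.exp(-x ** 2 / 2)

def cwt_burst_scores(traffic, scales=(2, 4, 8, 16)):
    """Max absolute CWT coefficient per sample across scales, normalized to
    [0, 1]. A sustained run of high scores flags a flat burst; an isolated
    spike flags a single-point burst."""
    scores = np.zeros(len(traffic))
    centered = traffic - traffic.mean()
    for a in scales:
        coeffs = np.convolve(centered, ricker(10 * a, a), mode="same")
        scores = np.maximum(scores, np.abs(coeffs) / np.abs(coeffs).max())
    return scores

rng = np.random.default_rng(0)
traffic = rng.poisson(100, 600).astype(float)
traffic[300:360] += 400                       # injected flat burst (flood-like)
hits = (cwt_burst_scores(traffic) > 0.5).nonzero()[0]
print(hits.min(), hits.max())                 # indices bracketing the burst
```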

Xinyu Yang, Yong Liu, Ming Zeng, Yi Shi
Enhancing the Scalability of the Community Authorization Service for Virtual Organizations

Grid computing has emerged as a special form of distributed computing, distinguished from conventional distributed computing by its focus on dynamic, large-scale resource sharing over a wide geographic distribution. A Grid Computing System (GCS) is a distributed system infrastructure over which distributed applications with cross-organization resource sharing operate. Grid applications are modelled by the notion of a virtual organization (VO), generally composed of participants from different organizations driven by specific tasks. In order to control participation and access to shared resources, authorization is essential in a VO, and it is challenging because of the VO's dynamic and distributed nature. A community authorization service (CAS) was proposed recently to meet the Grid challenges and to enforce fine-grained access control policies in the VO. However, the situation is aggravated when VOs are used to model business application systems, such as the financial systems of commercial enterprises, where security and accountability are key concerns; the emphasis on separation of duties in business applications only makes things worse. This paper aims to address these authorization issues when the GCS is used to support business applications. We introduce the use of threshold closure as a tool for enhancing the CAS so that the Grid can better support commercial VOs.

Jian-Ping Yong, Kwok-Yan Lam, Siu-Leung Chung, Ming Gu, Jia-Guang Sun
Securing Multicast Groups in Ad Hoc Networks

We propose a reliable and ubiquitous group key distribution scheme that is suitable for ad hoc networks. The scheme has self-initialisation and self-securing features. The former allows a cooperation of an arbitrary number of nodes to initialise the system, and it also allows node admission to be performed in a decentralised fashion. The latter allows a group member to determine the group key remotely while maintaining the system security. We also consider a decentralised solution for establishing secure point-to-point communication. The solution allows a new node to establish a secure channel with every existing node if it has pre-existing secure channels with a threshold number of the existing nodes.

Hartono Kurnio, Huaxiong Wang, Josef Pieprzyk, Kris Gaj
Improved Privacy-Protecting Proxy Signature Scheme

A proxy signature allows a proxy signer to sign on behalf of an original signer and can be verified by anyone with access to the original signer's public key. Recently, Dai et al. proposed a privacy-protecting proxy signature scheme in which the messages the original signer entrusts to the proxy signer are kept secret from the proxy signer during the generation of the proxy signature, and are revealed only to the receiver designated by the original signer; the privacy of the original signer is thus protected. Unfortunately, Dai et al.'s scheme is insecure and inefficient. In particular, the receiver can cheat the proxy signer and obtain a proxy signature on any message. To eliminate these weaknesses, we propose an improved scheme based on the Nyberg-Rueppel signature.

Tianjie Cao, Dongdai Lin, Rui Xue
Improving Security Architecture Development Based on Multiple Criteria Decision Making

This paper describes an effort to improve the security architecture development of information systems using multiple criteria decision making (MCDM) techniques. First, we introduce the fundamentals of MCDM, describe how security architectures are developed, and analyze the main problems in that development. The paper then shows how MCDM techniques were applied to solve two problems in security architecture development, and illustrates an approach that can assist in prioritizing threats and selecting security technologies. Practice indicates that MCDM techniques are valuable for formulating and solving problems in security architecture development.

Fang Liu, Kui Dai, Zhiying Wang

Session 6: Multimedia Content

A LSB Substitution Oriented Image Hiding Strategy Using Genetic Algorithms

Image hiding is an important technique in the information security field. The simplest method is least significant bit (LSB) substitution, which embeds a secret image in the least significant bits of the pixels of a host image. The embedding idea is simple, but the embedding process may degrade the host image quality so much that a hacker's suspicion is raised. To improve the quality of the stego-image, we applied LSB substitution and a genetic algorithm (GA) to develop two different optimal substitution strategies: a global optimal substitution strategy and a local optimal substitution strategy. The experimental results confirm that our methods provide better image quality than the simple LSB method and Wang et al.'s method while providing large hiding capacity.
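
A toy sketch of the global-optimal-substitution idea: embed K secret bits per pixel through a substitution table (a permutation of the 2^K values) and let a plain GA search for the table that minimizes stego distortion. The mutation-only GA, the correlated toy secret, and all names are illustrative assumptions, not the paper's GA design.

```python
import random

K = 2                     # bits embedded per pixel
VALS = 1 << K             # number of distinct K-bit secret values

def embed(host, secret_vals, table):
    """Replace the K LSBs of each host pixel with the substituted value."""
    return [(h & ~(VALS - 1)) | table[s] for h, s in zip(host, secret_vals)]

def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def ga_substitution(host, secret_vals, pop=20, gens=100):
    """Evolve a permutation of the K-bit values minimizing stego distortion
    (the 'global optimal substitution' idea). Mutation-only GA, illustrative."""
    def fitness(t):
        return -mse(host, embed(host, secret_vals, t))
    population = [random.sample(range(VALS), VALS) for _ in range(pop)]
    for _ in range(gens):
        population.sort(key=fitness, reverse=True)
        survivors = population[: pop // 2]
        children = []
        for t in survivors:                    # mutation: swap two table entries
            c = t[:]
            i, j = random.sample(range(VALS), 2)
            c[i], c[j] = c[j], c[i]
            children.append(c)
        population = survivors + children
    return max(population, key=fitness)

host = [random.randrange(256) for _ in range(500)]
secret = [(h & (VALS - 1)) ^ 1 for h in host]  # toy secret correlated with host
table = ga_substitution(host, secret)
print(table, mse(host, embed(host, secret, table)))
# The GA recovers table[s] = s ^ 1 here, giving (near-)zero distortion, while
# the identity table would leave visible noise in the two LSBs.
```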

Ming-Ni Wu, Min-Hui Lin, Chin-Chen Chang
A Prediction Scheme for Image Vector Quantization Based on Mining Association Rules

Vector quantization (VQ) is an efficient method for image compression. Many conventional VQ algorithms for lower bit rates, such as SMVQ, consider only adjacent neighbors when determining a codeword, which leads to severe distortion. In this paper, we propose an efficient association-rule mining method, inspired by approaches widely adopted in data mining, for predicting image blocks based on spatial correlation. The proposed method has two parts. First, it generates the dominant vertical, horizontal, and diagonal association rules of the training images. Then it searches for a suitable replacement according to the matched rules. The rule-based prediction is more efficient than conventional VQ, since finding the matched rules is easier than calculating the distances between codewords. The experimental results show that our method performs excellently in terms of both image quality and compression rate.

Chih-Yang Lin, Chin-Chen Chang
Fuzzy Logic-Based Image Retrieval

Classical mathematical methods adopt rigid logic to measure the similarity of images and therefore cannot deal with the uncertainty and imprecision present in human thought. This paper brings fuzzy logic into image retrieval to simulate these properties of human thinking. Different from other research that also adopts fuzzy logic, we emphasize the following: (1) adopting fuzzy linguistic variables to describe the similarity degree of image features rather than the features themselves, which simulates the nonlinear nature of human judgments of image similarity; (2) making use of fuzzy inference to guide the weight assignment among various image features. The fuzzy rules, which embed users' general perception of an object, guarantee good robustness across images of various domains; at the same time, users' subjective intentions can be expressed well by the fuzzy rules. In this paper, we also propose a novel shape description method called Minimum Statistical Sum Direction Code (MSSDC). The experiment demonstrates the efficiency and feasibility of our proposed algorithms.

Xiaoling Wang, Kanglin Xie
Deriving Facial Patterns for Specifying Korean Young Men’s 3D Virtual Face from Muscle Based Features

In the work presented here, we derive facial patterns defined by shape descriptors for specifying the features of Korean young men's 3D virtual faces. Clustering algorithms computed on the feature vertices are employed to derive the canonical facial model from the reference model. Shape descriptors are specified with respect to the convexity of the facial components, such as the eyebrows, eyes, nose, mouth and facial shape. By comparison, we show considerable dissimilarity between the facial shape descriptors produced by the different clustering algorithms.

Seongah Chin, Seongdong Kim
A Content-Based Fragile Watermarking Scheme for Image Authentication

In this paper, we present an effective image authentication scheme that tolerates incidental distortions but indicates tampered regions in cases of malicious manipulation. After dividing an image into blocks in the spatial domain and obtaining the average of each block's pixel values, we represent the size relationship among three random blocks in a binary tree and use it as a fragile watermark. We insert the watermark, which is extracted based on content, into the DCT blocks, i.e., the frequency domain of the image. The experimental results show that this is an effective technique for image authentication.

Mi-Ae Kim, Won-Hyung Lee

Session 7: Content Mining and Knowledge Extraction

A New FP-Tree Algorithm for Mining Frequent Itemsets

Data mining has become an important field and has been applied extensively across many areas. Mining frequent itemsets in a transaction database is critical for mining association rules. Many investigations have established that the pattern-growth method outperforms Apriori-like candidate generation. The performance of the pattern-growth method depends on the number of tree nodes. Accordingly, this work presents a new FP-tree structure (NFP-tree) and develops an efficient approach for mining frequent itemsets based on it, called the NFP-growth approach. The NFP-tree employs two counters per tree node to reduce the number of tree nodes, and its header table is smaller than that of the FP-tree, so the total number of nodes of all conditional trees can be reduced. Simulation results reveal that the NFP-growth algorithm is superior to the FP-growth algorithm on dense datasets and real datasets.
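
For reference, a sketch of classic single-counter FP-tree insertion, the structure that the NFP-tree refines; the NFP-tree's second per-node counter and smaller header table are the paper's contribution and are not reproduced here.

```python
class FPNode:
    def __init__(self, item=None, parent=None):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}

def insert_transaction(root, items, header):
    """Classic single-counter FP-tree insert; items must be pre-sorted by
    global frequency. NFP-growth adds a second counter per node (and a
    smaller header table) on top of this structure."""
    node = root
    for item in items:
        child = node.children.get(item)
        if child is None:
            child = FPNode(item, node)
            node.children[item] = child
            header.setdefault(item, []).append(child)   # node-links for mining
        child.count += 1
        node = child

root, header = FPNode(), {}
for t in [["f", "c", "a", "m"], ["f", "c", "a", "b"], ["f", "b"]]:
    insert_transaction(root, t, header)
print(root.children["f"].count)   # 3: all transactions share the prefix 'f'
```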

Yu-Chiang Li, Chin-Chen Chang
Evaluation Incompleteness of Knowledge in Data Mining

In this paper, we prove that using the rough degree of a rough set in classical rough set theory to measure the uncertainty of knowledge is not comprehensive. We then define a new measure named the rough entropy of a rough set, and prove that it is a more comprehensive measure of the incompleteness of knowledge about a rough set X. At the same time, the research shows that the rough degree of rough set X with respect to knowledge R, the rough entropy of knowledge R, and the rough entropy of rough set X with respect to knowledge R in classical rough sets all decrease monotonically as the granularity of information becomes smaller through finer partitions. These results are very helpful for understanding the essence of concept approximation and the measurement of incompleteness in rough sets.

Qiang Li, Jianhua Li, Xiang Li, Shenghong Li
The Variable Precision Rough Set Model for Data Mining in Inconsistent Information System

The variable precision rough set (VPRS) model is an extension of the original rough set model. For inconsistent information systems, the VPRS model allows a flexible approximation boundary region via a precision variable. This paper focuses on data mining in inconsistent information systems using the VPRS model. A method based on the VPRS model is proposed for data mining in inconsistent information systems, by which deterministic and probabilistic classification rules are acquired. An example shows that the method is effective.
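
Ziarko's VPRS approximations are standard, so a small sketch can make the precision variable concrete: an equivalence class enters the β-lower approximation when at least a fraction β of it lies in the target set, which is what turns an inconsistent table into deterministic and probabilistic rules.

```python
def vprs_approximations(partition, X, beta=0.8):
    """Ziarko-style variable-precision approximations: an equivalence class
    joins the beta-lower approximation when at least a fraction beta of it
    lies in the target set X, and the beta-upper when more than 1 - beta does."""
    X = set(X)
    lower, upper = set(), set()
    for block in partition:
        overlap = len(block & X) / len(block)
        if overlap >= beta:
            lower |= block        # yields a deterministic rule at precision beta
        if overlap > 1 - beta:
            upper |= block        # yields a probabilistic (possible) rule
    return lower, upper

# Toy inconsistent table: class {1,2,3} is only 2/3 inside X, so it is
# rejected at beta = 0.8 but accepted when beta is relaxed to 0.6.
partition = [{1, 2, 3}, {4, 5}, {6}]
print(vprs_approximations(partition, {1, 2, 4, 5}, beta=0.8))
print(vprs_approximations(partition, {1, 2, 4, 5}, beta=0.6))
```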

Qingmin Zhou, Chenbo Yin, Yongsheng Li
Rule Discovery with Particle Swarm Optimization

This paper applies the Particle Swarm Optimization (PSO) algorithm to discover classification rules. Potential IF-THEN rules are encoded into real-valued particles that cover all types of attributes in the data sets. The rule discovery task is formulated as an optimization problem with the objectives of high accuracy, good generalization performance, and comprehensibility, and the PSO algorithm is employed to solve it. The advantage of the proposed approach is that it can be applied to both categorical and continuous data. The experiments are conducted on two benchmark data sets: the Zoo data set, in which all attributes are categorical, and the Wine data set, in which all attributes except the class attribute are continuous. The results show a small average number of conditions per rule and few rules per rule set, and also show that the rules have good predictive accuracy and generalization ability.
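
A sketch of the canonical PSO update the paper builds on; in the paper each particle encodes one IF-THEN rule's attribute terms, while here the objective is a generic function so the example stays self-contained.

```python
import random

def pso_minimize(f, dim, n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5):
    """Canonical PSO velocity/position update: each particle is pulled toward
    its own best position (pbest) and the swarm's best (gbest)."""
    pos = [[random.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    gbest = min(pbest, key=f)
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            if f(pos[i]) < f(pbest[i]):
                pbest[i] = pos[i][:]
        gbest = min(pbest, key=f)
    return gbest

sphere = lambda x: sum(v * v for v in x)
print([round(v, 3) for v in pso_minimize(sphere, dim=3)])   # near [0, 0, 0]
```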

Yu Liu, Zheng Qin, Zhewen Shi, Junying Chen
Data Mining Service Based on MDA

Data mining is a helpful tool for business decision support. The adaptability, reusability and flexibility of data mining systems remain a big challenge and are greatly needed, since requirement and platform changes are inescapable. The idea is to make the business layer, technology layer and realization layer more independent of each other. In this paper, we describe a Model-Driven Architecture (MDA) foundation for Data Mining Service, including storing metadata in a form that lets users define the operation level and build metadata-driven tools, and enriching the UDDI specification so that application components can publish services inside the system. We also discuss the benefits of using it and future work.

Yiqun Chen, Chi-Hung Chi, Jian Yin

Session 8: Web Services and Content Applications (I)

Web Service Composition Based on BPWS-Net

Web Services provide an efficient and economical framework for distributed computing; however, integrating Web Services provided by different enterprises or organizations into a new value-added Web Service is an important challenge. BPEL4WS (Business Process Execution Language for Web Services) is a procedural language for Web Service composition. An approach to modeling BPEL4WS-described processes based on a kind of service-oriented Petri net, the BPWS-Net, is proposed in this paper. Both the basic activities and the structured activities of BPEL4WS are discussed using the BPWS-Net. By means of this approach, not only can the formal semantics of Web Services and their composition be precisely described, but the control flow of a BPEL4WS process can also be graphically modeled. Furthermore, the approach can be used to validate the correctness and soundness of a Web Service composition.

Jian Sun, Changjun Jiang
Testing Web Services Using Progressive Group Testing

This paper proposes progressive group testing techniques to test the large number of Web services (WS) available on the Internet. At the unit testing level, WS with the same functionality are tested in groups using a progressively increasing number of test cases, and a small number of WS that score best are integrated into the real environment for operational testing. At the integration testing level, many composite services are constructed and tested by group integration testing. The results of group testing at both the unit and integration levels are verified by weighted majority voting, where the weights are based on the reliability history of the WS under test. A case study is designed and implemented in which the dependency among the test cases is analyzed and used to generate progressive layers of test cases.
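
A minimal sketch of the weighted-majority-voting oracle the abstract describes, assuming illustrative service names and reliability scores: each service's answer is weighted by its reliability history, and the heaviest answer wins.

```python
from collections import defaultdict

def weighted_majority(outputs, reliability):
    """Pick the answer whose supporting services carry the most reliability
    weight. Service names and scores below are illustrative."""
    score = defaultdict(float)
    for ws, answer in outputs.items():
        score[answer] += reliability.get(ws, 0.5)   # default for unknown WS
    return max(score, key=score.get)

outputs = {"ws_a": 42, "ws_b": 42, "ws_c": 7}
reliability = {"ws_a": 0.9, "ws_b": 0.6, "ws_c": 0.99}
print(weighted_majority(outputs, reliability))   # 42: weight 1.5 beats 0.99
```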

Wei-Tek Tsai, Yinong Chen, Zhibin Cao, Xiaoying Bai, Hai Huang, Ray Paul
XFCM – XML Based on Fuzzy Clustering and Merging – Method for Personalized User Profile Based on Recommendation System of Category and Product

In data mining, access to a large number of data sets for predictive purposes does not by itself guarantee a good method, and the amount of real data in mobile commerce is essentially unlimited. Hence, in addition to searching for the products users expect, it becomes necessary to develop a recommendation service based on XML technology. In this paper, we design optimized XML data for recommended products. Efficient XML data preprocessing is required, covering formatting, structure, and attribute representation, depending on the user profile information. Our goal is to find relationships among the products users are interested in and e-commerce, from m-commerce to XDB. First, user profile information is analyzed, creating clusters from profile attributes such as sex, age, and job. Second, XML data for associative objects classified by user profile in the shopping mall are clustered. Third, after composing the categories and products in which associative products exist from the first clustering, the categories, products, and optimized clustered XML data, which constitute personalized products, are represented in the shopping mall. The proposed user-profile clustering method for personalization is designed and simulated to demonstrate the efficiency of the system.

JinHong Kim, EunSeok Lee
Analyzing Web Interfaces of Databases for Retrieving Web Information

Much of the information on the web is dynamic content provided through links to databases. However, the heterogeneity of these databases makes integrated information retrieval difficult. Meanwhile, information in web databases can easily be provided to users through web interfaces. By analyzing web interfaces, therefore, an information integration system can integrate web databases without concern for the underlying database structures. This paper presents a solution to semantic heterogeneity through the analysis of web interfaces and the building of semantic networks.

Jeong-Oog Lee, Myeong-Cheol Ko, Jinsoo Kim, Chang-Joo Moon, Young-Gab Kim, Hoh Peter In
A New Universally Verifiable and Receipt-Free Electronic Voting Scheme Using One-Way Untappable Channels

Electronic voting schemes must provide universal verifiability and receipt-freeness. However, since these objectives are mutually contradictory, providing both properties is difficult. To date, most electronic voting schemes provide only one of them, and the few that provide both are impractical due to heavy computational requirements. In this paper, we present an efficient electronic voting scheme that provides both properties. The proposed scheme uses a trusted third party called an HS (Honest Shuffler) and requires only one-way untappable channels from the HSs to the voters, currently the weakest physical assumption required for receipt-freeness. Among the schemes that assume only one-way untappable channels and provide both properties, ours requires the least computation.

Sangjin Kim, Heekuck Oh

Session 9: Web Services and Content Applications (II)

Ontology-Based Conceptual Modeling of Policy-Driven Control Framework: Oriented to Multi-agent System for Web Services Management

The integration of web services and intelligent agents is promising for automated service discovery, negotiation, and cooperation. But due to the dynamic and heterogeneous nature of web services and agents, it is challenging to guide the behaviors of the underlying agents to meet changing high-level business requirements. Traditional policy-driven methods (Ponder, Rei, KAoS, etc.) are not well suited to directing the discovery, negotiation and cooperation of dynamic agents that may join or leave a specific community or organization (virtual organization) at run time. The purpose of this paper is to model an ontology-based, policy-driven control framework suitable for supervising dynamic agents according to high-level policies. On the basis of a federated multi-agent infrastructure and ontologies of policies, domain concepts, and agent federations, a model of a role-based policy specification framework is presented.

Beishui Liao, Ji Gao, Jun Hu, Jiujun Chen
An Approach to Dynamically Reconfiguring Service-Oriented Applications from a Business Perspective

This paper proposes an approach to dynamically reconfiguring service-oriented applications from a business perspective, CAFISEadapt, which defines both business-level and software-level change operations to express changes in the business domain and the software domain respectively. By exploiting the convergence of these two levels of change operations, the approach keeps application changes automatically coherent with business changes. By hiding the software-level technical details that traditional change operations require, the business-level change operations can be used by business users to dynamically modify service-oriented application instances, realizing dynamic reconfiguration of service-oriented applications in a straightforward way that adapts to business requirement changes in a timely manner. The approach has been applied and validated in the project FLAME2008.

Jianwu Wang, Yanbo Han, Jing Wang, Gang Li
Dynamically Reconfiguring Sitemaps Using RDF

This paper presents the extraction, storage, and application of the metadata and ontology of product data, focusing on the design and tooling information included in STEP-NC files as an example. By analyzing the relationships among the product data, the RDFS schema is designed first. Based on the schema, metadata is extracted and stored in RDF files. As an application of the stored metadata, the sitemap of product data repositories can be reconfigured. Users can select the view they are interested in (e.g., views organized by products, tools, persons, or the current location), and sitemaps can also be constructed dynamically from the current location. With such varied and dynamic views of the product data repository, users can access specific data more effectively.

Huijae Lee, Sang Bong Yoo
A General Model for Heterogeneous Web Services Integration

A General Integration Model of Web Services is presented in this paper to solve the integration of heterogeneous Web Services provided by different vendors. By means of transformation and coordination mechanisms, services are presented to users as a uniform general service, which efficiently provides a general mechanism to integrate a large number of heterogeneous Web Services.

Wei Zhang, Shangfen Guo, Dan Ma
Methodology for Semantic Representing of Product Data in XML

Based on highly general ontological notions drawn from analytical philosophy, the modeling abilities of EXPRESS and XML are evaluated, and the limits of current product data modeling approaches are indicated. An ontology-based method for representing product data in XML is proposed. Compared with existing solutions, this approach not only takes advantage of XML's popularity, flexibility, and compatibility with STEP's rigorous description of products, but also aims at consistent semantic interoperation. The proposed method has two levels: the first builds the ontology level by extracting semantic knowledge from the EXPRESS schema; in the second, an XML schema is derived from the ontology to access XML documents. This paper also introduces a formal semantic expression mechanism in description logics to capture the semantics of the EXPRESS schema.

Xiangjun Fu, Shanping Li, Ming Guo, Nizamuddin Channa
Semantic Based Web Services Discovery

This paper presents a novel approach for Web Services discovery on the envisioned Semantic Web. It first proposes an ontology-based four-layer Web Services description model that supports data independence and concept sharing. A model of users' service preferences and constraints built upon this description is then described. SBWSDF (Semantic Based Web Services Discovery Framework) is a framework that implements Web Services discovery using these models. Using a prototype of this framework, we set up a services ontology base and a rules base for flight booking. The prototype integrates the services description, the preference and constraint rules, and the request information to select proper services using an inference engine. The results prove that it is a new approach to intelligent Web Services discovery.

Jianjun Xu, Qian Zhu, Juanzi Li, Jie Tang, Po Zhang, Kehong Wang

Session 10: Content Retrieval and Management (I)

What Are People Looking for in Your Web Page?

Web server log analyses usually examine access patterns. We believe it is also very important to understand the goal of each access. In this paper, we propose combining log analysis with content analysis to identify information goals on individual accessed pages. We analyze the web server log to extract information goals on entry pages from anchor texts and query terms, and propagate them along users' access paths to other linked pages. The experiment shows that our approach can find popular terms on web pages, that temporal changes in these terms can reflect shifts in users' interests, and that unexpected terms can sometimes indicate a design problem.

Chen Ding, Chi-Hung Chi
The Impact of OCR Accuracy on Automatic Text Classification

The current general approach to digitizing paper media is to convert documents into digital images with a scanner and then read them with OCR to generate ASCII text for full-text retrieval. However, it is impossible to recognize all characters with 100% accuracy with present OCR technology. Therefore, it is important to know the impact of OCR accuracy on automatic text classification to assess its technical feasibility. In this research we perform automatic text classification experiments on English newswire articles to study the relationship between OCR accuracy and text classification using statistical classification techniques.

Guowei Zu, Mayo Murata, Wataru Ohyama, Tetsushi Wakabayashi, Fumitaka Kimura
TSS: A Hybrid Web Searches

The emergence of the Semantic Web makes it possible for machines to understand the meaning of resources on the Web. The widespread availability of machine-understandable information will have an impact on information retrieval on the Web. In this paper, we propose a hybrid web search architecture, TSS, which combines traditional search with semantic search to improve precision and recall. The components of TSS that support hybrid web searches are described.

Li-Xin Han, Gui-Hai Chen, Li Xie
Determining the Number of Probability-Based Clustering: A Hybrid Approach

After analyzing previous methods for determining the number of probability-based clusters, this paper introduces an improved Monte Carlo cross-validation algorithm (iMCCV) and attempts to solve the posterior-probability spread problem, which cannot be resolved by the Monte Carlo cross-validation algorithm. Furthermore, we present a hybrid approach to determining the number of probability-based clusters by combining the iMCCV algorithm with parallel-coordinates visualization. The efficiency of our approach is discussed with experimental results.

Tao Dai, Chunping Li, Jiaguang Sun
Categorizing XML Documents Based on Page Styles

The self-describing nature of XML offers both challenges and opportunities in information retrieval, document management, and data mining. To process and manage XML documents effectively on XML data servers, databases, Electronic Document Management Systems (EDMS) and search engines, we have to develop new techniques for automatically categorizing large numbers of XML documents. In this paper, we propose a new methodology for categorizing XML documents based on page style, taking into account the meanings of the elements and the nested structure of XML. Accurate categorization of XML documents by page style provides an important basis for a variety of applications that manage and process XML. Experiments with Yahoo! pages show that our methodology achieves almost 100% accuracy in categorizing XML documents by page style.

Jung-Won Lee

Session 11: Content Retrieval and Management (II)

Generating Different Semantic Spaces for Document Classification

Document classification is an important technique in fields such as digital libraries and WWW pages. Due to the problems of synonymy and polysemy, it is better to classify documents based on latent semantics. A local semantic basis, which contains the features of documents within a particular category, has more discriminative power and is more effective for classification than a global semantic basis, which contains the common features of all available documents. Because the semantic basis obtained by nonnegative matrix factorization (NMF) has a straightforward correspondence with samples, while the basis obtained by singular value decomposition (SVD) does not, NMF is suitable for obtaining the local semantic basis. In this paper, global and local semantic bases obtained by SVD and NMF are compared. The experimental results show that the best classification accuracy is achieved by the local semantic basis obtained by NMF.
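
A small sketch of the SVD-versus-NMF comparison on a toy term-document matrix (scikit-learn, synthetic data): a per-category NMF basis stays nonnegative and concentrates its weight on that category's vocabulary, which is the "local semantic basis" effect the paper exploits.

```python
import numpy as np
from sklearn.decomposition import NMF, TruncatedSVD

rng = np.random.default_rng(0)
# Toy term-document matrix: 40 documents x 30 terms, two categories with
# disjoint dominant vocabularies (a stand-in for a real corpus).
mask0, mask1 = [1] * 15 + [0] * 15, [0] * 15 + [1] * 15
X = np.vstack([rng.poisson(3, (20, 30)) * mask0,
               rng.poisson(3, (20, 30)) * mask1]).astype(float)

# Global bases are fit over all documents...
svd_global = TruncatedSVD(n_components=4).fit(X)
nmf_global = NMF(n_components=4, init="nndsvda", max_iter=500).fit(X)

# ...whereas a local basis is fit per category; nonnegativity ties each NMF
# basis vector to actual terms of that category.
nmf_local = NMF(n_components=2, init="nndsvda", max_iter=500).fit(X[:20])
print(np.round(nmf_local.components_[0][:15], 2))   # weight on category-0 terms
print(np.round(nmf_local.components_[0][15:], 2))   # ~0 on the other vocabulary
```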

Jianjiang Lu, Baowen Xu, Jixiang Jiang
A Component Retrieval Method Based on Facet-Weight Self-learning

Component-based development has become a new software development paradigm. How to find needed components quickly and accurately is one of the basic problems in automatic software component reuse. In this paper, an intelligent component retrieval model, FWRM, is proposed. Facet representation is used to model queries and components, and multiple types of facets are defined, extending traditional keyword-based facet representation. A genetic-algorithm-based facet-weight self-learning algorithm changes the facet weights dynamically to improve retrieval accuracy, and corresponding similarity functions are defined. In addition, a risk-minimization-based component sampling method is used to address the insufficiency of training data. All these algorithms and methods are integrated into FWRM's three main implementation parts: the Facet-Weight Optimize System, the Component Retrieve System and the Resource. The experimental results prove that this method is feasible and improves component retrieval effectively.

Xiaoqin Xie, Jie Tang, Juanzi Li, Kehong Wang
The Algorithm About Division and Reducts of Information System Based on Discernibility Index of Attribute

Effective reduct algorithms are the foundation for applying rough set theory to data mining and knowledge discovery in databases. In this paper, we discuss the well-known reduct algorithms and propose the concept of the discernibility index of an attribute. We also propose an algorithm for the division and reducts of an information system based on the discernibility index of attributes, and we analyze its completeness and validity. The experiments indicate that our algorithm is efficient and practical.

Jun Li, Xiongfei Li, Hui Liu, Xinying Chen
An Effective Document Classification System Based on Concept Probability Vector

This paper presents an effective concept-based document classification system that can efficiently classify Korean documents through a thesaurus tool. The thesaurus tool is an information extractor that acquires the meanings of document terms from the thesaurus, supporting effective document classification with the acquired meanings. The system uses a concept probability vector to represent the meanings of the terms. Because the category of a document depends on meanings rather than on terms, the system can classify documents without performance degradation even though the vector is small. Using the small concept probability vector saves both time and space in document classification. The experimental results suggest that the presented system with the thesaurus tool can classify documents effectively.

Hyun-Kyu Kang, Yi-Gyu Hwang, Pum-Mo Ryu
Accuracy Improvement of Automatic Text Classification Based on Feature Transformation and Multi-classifier Combination

In this paper, we describe a comparative study of feature transformation and classification techniques for improving the accuracy of automatic text classification. Normalization to relative word frequency, principal component analysis (the K-L transformation) and the power transformation were applied to the feature vectors, which were classified by Euclidean distance, the linear discriminant function, the projection distance, the modified projection distance and the SVM. To improve classification accuracy, multi-classifier combination by majority vote was employed.

Xuexian Han, Guowei Zu, Wataru Ohyama, Tetsushi Wakabayashi, Fumitaka Kimura

Session 12: Ontology and Knowledge Conceptualization

Risk Minimization Based Ontology Mapping

The key to achieving interoperability over distributed ontologies is mediation between them, called ontology mapping. Fully manual mapping specification is tedious and time-consuming. Moreover, ensuring consistency and handling error-proneness in a manual process, and maintaining mappings as ontologies evolve, are all beyond manual work. It is therefore necessary to discover mappings between ontologies automatically, so that merging and translating different ontology-based annotations becomes possible. Existing (semi-)automatic systems are restricted to limited information, which depresses performance, especially when the taxonomy structures have little overlap or the instances have little in common. In this paper, based on Bayesian decision theory, we propose an approach called RiMOM to automatically discover mappings between ontologies. RiMOM treats the entire mapping problem as a decision problem rather than a similarity problem as in previous work, and it explicitly and formally gives a complete decision model for ontology mapping. Based on shallow NLP, this paper also introduces a method to deal with instance heterogeneity, a long-standing problem in information processing. Experiments on real-world data show that RiMOM is promising.

Jie Tang, Bang-Yong Liang, Juanzi Li, Kehong Wang
Evolutionary Parameter Estimation Algorithm for Combined Kernel Function in Support Vector Machine

This paper proposes a new kernel function for support vector machines, together with a learning method that offers fast convergence and good classification performance. A set of kernel functions is combined to create a new kernel function, which is trained by a learning method based on an evolutionary algorithm. The learning method yields the optimal decision model, consisting of a set of features as well as a set of parameters for the combined kernel function. The combined kernel function and the learning method were applied to obtain the optimal decision model for the classification of clinical proteome patterns; the combined kernel function showed faster convergence in the learning phase and resulted in an optimal decision model with better classification performance than other kernel functions. The combined kernel function therefore has greater flexibility in representing a problem space than single kernel functions.

Syng-Yup Ohn, Ha-Nam Nguyen, Sung-Do Chi
Enriching Domain Ontology from Domain-Specific Documents with HowNet

Constructing a domain ontology by hand is still a hard and time-consuming job, so developing methods and techniques to acquire ontologies from large document collections in a semi-automatic manner is indispensable. In this paper, we present a technique for enriching an existing ontology from domain-specific documents with an online knowledge system such as HowNet. HowNet is a bilingual general knowledge base that encodes inter-concept and inter-attribute semantic relations, which can provide hints for enriching an existing ontology. We present the enrichment algorithms in detail. A preliminary experiment in the physical education domain is conducted, and useful conclusions are drawn and presented in the paper.

Yong Cheng, Zhongzhi Shi
A Framework of Extracting Sub-ontology

Applications often need to extract a sub-ontology from a large-scale ontology. Current approaches mainly focus on extraction from a single ontology. However, an application may require several large-scale ontologies in different domains, so approaches for extracting sub-ontologies from multiple ontologies are needed. This paper proposes a framework for extracting a sub-ontology from multiple ontologies. In the framework, a unified visualized ontology model is proposed, and codes in different ontology languages are first translated into the unified model. We then divide the user requirements so as to extract a sub-ontology from each ontology; the extraction is an iterative and incremental process. Finally, we integrate these sub-ontologies into the sub-ontology the user demands, and translate the result back into the ontology language the user uses. This framework avoids performing integration and extraction on large-scale ontologies, which are the most difficult processes in current approaches.

Baowen Xu, Dazhou Kang, Jianjiang Lu
Ontology Based Sports Video Annotation and Summary

With digital sports video increasing every day, effectively analyzing sports video content becomes more and more important. Effective and efficient representation of video for search, retrieval, inference and mining is a key problem in knowledge engineering. To describe sports video content efficiently, a sports video ontology for video annotation is represented in OWL, a description-logic-based Web Ontology Language. We describe a user-friendly platform for sports video annotation. Ontology-based sports video annotation can facilitate video indexing, retrieval and reasoning in a broad range of applications, including the Digital Olympic Project in China. Moreover, we present a hierarchical sports video summarization strategy for browsing sports video progressively. In sports video, replay scenes often represent the highlights or interesting events of the video. Hence, our representative scene selection is based on a replay detection algorithm and identical event detection. Basic experimental results show that our strategy is effective.

Jian-quan Ouyang, Li Jin-tao, Zhang Yong-dong
Backmatter
Metadata
Title
Content Computing
Edited by
Chi-Hung Chi
Kwok-Yan Lam
Copyright year
2004
Publisher
Springer Berlin Heidelberg
Electronic ISBN
978-3-540-30483-8
Print ISBN
978-3-540-23898-0
DOI
https://doi.org/10.1007/b103383