nach oben

2012 | Buch

Kapitel lesen Erstes Kapitel lesen

Data and Knowledge Engineering

Third International Conference, ICDKE 2012, Wuyishan, Fujian, China, November 21-23, 2012. Proceedings

herausgegeben von: Yang Xiang, Mukaddim Pathan, Xiaohui Tao, Hua Wang

Verlag: Springer Berlin Heidelberg

Buchreihe : Lecture Notes in Computer Science

Enthalten in: Springer Professional "Wirtschaft+Technik" , Springer Professional "Technik" , Springer Professional "Wirtschaft"

Einloggen, um Zugang zu erhalten

Über dieses Buch

This book constitutes the refereed proceedings of the International Conference on Data and Knowledge Engineering, ICDKE 2012, held in Wuyishan, Fujian, China, in November 2012. The conference was co-located with the 6th International Conference on Network and System Security, NSS 2012. The 13 revised full papers of ICDKE 2012 were carefully reviewed and selected from 53 submissions. The papers cover the following topics: artificial intelligence and data engineering; knowledge discovery and data management; information extraction and retrieval and data security.

Inhaltsverzeichnis

Frontmatter

ICDKE: Artificial Intelligence and Data Engineering

OR-Tree: An Optimized Spatial Tree Index for Flash-Memory Storage Systems

Abstract

Nowadays, flash memory is widely used in various portable devices such as PDAs (personal digital assistants) and HPCs (handheld PCs). With the development of location-based services (LBS) provided by those portable devices, how to provide efficient spatial data management over flash-equipped LBS devices has become a critical issue in recent years. In this paper, we aim to propose an efficient flash-optimized spatial tree index, called OR-tree, to improve the queries and updates on flash-resident spatial data. The OR-tree is based on R-tree and introduces two new techniques to make the R-tree suitable for flash memory storage system. First, we employ a lazy node splitting policy by introducing overflow nodes for the leaf nodes in R-tree. Next, we redesign the algorithms of update operations to adapt the unbalanced OR-tree structure. To evaluate the performance of the OR-tree, we conduct experiments on a spatial data set gathered from the R-tree portal. We run different workloads and compare the OR-tree with other competitors on the basis of different metrics including flash page usage, flash write count, erasure count, and runtime. The experimental results show that the OR-tree outperforms the competitors considering both space efficiency and time performance.

Na Wang, Peiquan Jin, Shouhong Wan, Yinghui Zhang, Lihua Yue

Directional Skyline Queries

Abstract

Continuous monitoring of queries over moving objects has become an important topic as it supports a wide range of useful mobile applications. A continuous skyline query involves both static and dynamic dimensions. In the dynamic dimension, the data object not only has a distance from the query object, but it also has a direction with respect to the query object motion. In this paper, we propose a direction-oriented continuous skyline query algorithm to compute the skyline objects with respect to the current position of the user. The goal of the proposed algorithm is to help the user to retrieve the best objects that satisfy his/her constraints and fall either in any direction around the query object, or is aligned along the object’s direction of motion. We also create a pre-computed skyline data set that facilitates skyline update, and enhances query running time and performance. Finally, we present experimental results to demonstrate the performance and efficiency of our proposed algorithms.

Eman El-Dawy, Hoda M. O. Mokhtar, Ali El-Bastawissy

Transforming Nescient Activity into Intelligent Activity

Abstract

A nescient activity within financial service process is prone to risk of producing inconsistent outcome that results severe legal consequence for a financial institute e.g., bank. Financial service process activities must be intelligent to understand and comply financial regulations. Intelligent process activities will assist to prevent producing outcomes that are inconsistent to financial regulations. Existing process technologies support defining only nescient activities. Currently, there is no solution that underpins transforming a nescient activity into intelligent activity. In this paper, we offer a solution that supports transformation of a nescient activity into intelligent activity.

Rafiqul Haque, Nenad Krdzavac, Tom Butler

Self-adaptive Non-stationary Parallel Multisplitting Two-Stage Iterative Methods for Linear Systems

Abstract

In this paper,the new non-stationary parallel multisplitting two-stage iterative methods with self-adaptive weighting matrices are presented for the linear system of equations when the coefficient matrix is nonsingular. Self-adaptive weighting matrices are given, especially, the nonnegativity is eliminated. The convergence theories are established for the self-adaptive non-stationary parallel multisplitting two-stage iterative methods. Finally, the numerical comparisons of several self-adaptive non-stationary parallel multisplitting two-stage iterative methods are shown.

Guo-Yan Meng, Chuan-Long Wang, Xi-Hong Yan

Towards a Case-Based Reasoning Approach Based on Ontologies Application to Railroad Accidents

Abstract

The work presented in the context of this paper is to develop a system of Case-Based Reasoning (CBR) to help capitalize and exploit the knowledge on railroad accidents from our field of application. To streamline and strengthen the acquisition process, representation and exploitation of this domain knowledge, initially, it is necessary to harmonize and standardize the terminology used by the domain actors. In this context, we have agreed to call on building ontologies. Far from being just an effect of "fashion" or a goal in itself, ontology finds its place in resolving problems. For this purpose, ontologies can play an important role in the CBR systems, because they can greatly reduce the effort of acquiring knowledge in various stages of reasoning. In this paper, we will present the first realized works in our approach.

Ahmed Maalel, Lassad Mejri, Habib Hadj-Mabrouk, Henda Ben Ghézela

ICDKE: Knowledge Discovery and Data Management

An Agile Knowledge Discovery in Databases Software Process

Abstract

In a knowledge society, transforming data into information and knowledge to support the decision-making process is a crucial success factor for all organizations. In this sense, the mission of Software Engineering is to build systems able to process large volumes of data, transform them into relevant knowledge and deliver them to customers, so they can make the right decisions at the right time. However, companies still fail in determining the process model used in their Knowledge Discovery in Databases projects. This article introduces the AgileKDD, an agile and disciplined software process for developing systems capable of discovering the knowledge hidden in databases, which was built on top of the Open Unified Process. A case study shows that AgileKDD can increase the success factor of projects whose goal is to develop Knowledge Discovery in Databases applications.

Givanildo Santana do Nascimento, Adicinéia Aparecida de Oliveira

An Innovative Outlier Detection Method Using Localized Thresholds

Abstract

In this paper, we investigate the research problem of specifying the thresholds to effectively detect outliers and propose an innovative technique that leverages multiple localized thresholds for detecting outliers. Our technique is able to overcome a major limitation of the existing outlier detection approaches that use a single universal threshold to distinguish outliers from normal data. The experiments conducted demonstrate that our technique is able to achieve a better detection effectiveness than the traditional approaches.

Ji Zhang, Jie Cao, Xiaodong Zhu

A Self-stabilizing Algorithm for Finding a Minimal K-Dominating Set in General Networks

Abstract

Since the publication of Dijkstra’s pioneering paper, a lot of self-stabilizing algorithms for computing dominating sets have been proposed in the literature. However, there is no self-stabilizing algorithm for the minimal k-dominating set (MKDS) in arbitrary graphs that works under a distributed daemon. The proposed algorithms for the minimal k-dominating set (MKDS) either work for trees (Kamei and Kakugawa [16]) or find a minimal 2-dominating set (Huang et al. [14,15]). In this paper, we propose a self-stabilizing algorithm for the minimal k-dominating set (MKDS) under a central daemon model when operating in any general network. We further prove that the worst case convergence time of the algorithm from any arbitrary initial state is O(n ²) steps where n is the number of nodes in the network.

Guangyuan Wang, Hua Wang, Xiaohui Tao, Ji Zhang

Optimal Combination of Feature Weight Learning and Classification Based on Local Approximation

Abstract

Currently, most feature weights estimation methods are independent on the classification algorithms. The combination of discriminant analysis and classifiers for effective pattern classification remains heuristic. The present study address the topics of learning of feature weights by using a recently reported classification algorithm, K-Local Hyperplane Distance Nearest Neighbor (HKNN) [18], in which the data is modeled as embedded in a linear hyperplane. Motivated by the encouraging performance of the Learning Discriminative Projections and Prototypes, the feature weights are estimated by minimizing the classifier leave-one-out cross validation error of HKNN. Approximated explicit solution is obtained to give feature estimation. Therefore, the feature weighting and classification are perfectly matched. The performance of the combinational model is extensively assessed via experiments on both synthetic and benchmark datasets. The superior results demonstrate that the method is competitive compared with some state-of-art models.

Hongmin Cai, Michael Ng

A Model of User-Oriented Reduct Construction Based on Minimal Set Cover

Abstract

An implicit assumption of many learning algorithm is that all attributes in the attribute set are of the same importance. However, this assumption is unreasonable or practical. If attributes in the attribute set are considered of non-equal importance with respect to their own situation, then the model obtained from those attributes would be more realistic. This paper designed a user oriented reduction model based on the minimal set cover by seamlessly combining an order of attributes that describes user preferences. The accessibility and efficiency of this algorithm is shown by an example.

Suqing Han, Guimei Yin

ICDKE: Information Extraction and Retrieval

Determining Pattern Similarity in a Medical Recommender System

Abstract

As recommender systems have proven their effectiveness in other areas, it is aimed to transfer this approach for use in medicine. Particularly, the diagnoses of physicians made in rural hospitals of developing countries, in remote areas or in situations of uncertainty are to be complemented by machine recommendations drawing on large bases of expert knowledge in order to reduce the risk to patients. Recommendation is mainly based on finding known patterns similar to a case under consideration. To search for such patterns in rather large databases, a weighted similarity distance is employed, which is specially derived for medical knowledge. For collaborative filtering an incremental algorithm, called W-InCF, is used working with the Mahalanobis distance and fuzzy membership. W-InCF consists of a learning phase, in which a cluster model of patients’ medical history is constructed incrementally, and a prediction phase, in which the medical pattern of each patient considered is compared with the model to determine the most similar cluster. Fuzzy sets are employed to cope with possible confusion of decision making on overlapping clusters. The degrees of membership to these fuzzy sets is expressed by a weighted Mahalanobis radial basis function, and the weights are derived from risk factors identified by experts. The algorithm is validated using data on cephalopelvic disproportion.

Maytiyanin Komkhao, Jie Lu, Lichen Zhang

Chinese Latent Relational Search Based on Relational Similarity

Abstract

Latent relational search is a new way of searching information in an unknown domain according to the knowledge of a known domain from the Web. By analyzing the analogous relationship between word pairs, the latent relational search engine can tell us the accurate information that we need. Given a Chinese query, {(A, B), (C, ?)}, our aim is to get D which is the answer of “?”. In this paper, we propose an approach to Chinese Latent relational search for the first time. Moreover, we classify the relation mappings between two word pairs into three categories according to the number of relations and the number of target words corresponding to each relation. Our approach firstly extracts relation-representing words by using the preprocessing modular and two clustering algorithms, and then the candidate word set of D corresponding to each relation-representing word can be obtained. The proposed method achieves an MRR of 0.563, which is comparable to the existing methods.

Chao Liang, Zhao Lu

Wikipedia Category Graph and New Intrinsic Information Content Metric for Word Semantic Relatedness Measuring

Abstract

Computing semantic relatedness is a key component of information retrieval tasks and natural processing language applications. Wikipedia provides a knowledge base for computing word relatedness with more coverage than WordNet. In this paper we use a new intrinsic information content (IC) metric with Wikipedia category graph (WCG) to measure the semantic relatedness between words. Indeed, we have developed a performed algorithm to extract the categories assigned to a given word from the WCG. Moreover, this extraction strategy is coupled with a new intrinsic information content metric based on the subgraph composed of hypernyms of a given concept. Also, we have developed a process to quantify the information content subgraph. When tested on common benchmark of similarity ratings the proposed approach shows a good correlation value compared to other computational models.

Mohamed Ali Hadj Taieb, Mohamed Ben Aouicha, Mohamed Tmar, Abdelmajid Ben Hamadou

Nautilus: A Generic Framework for Crawling Deep Web

Abstract

This paper presents Nautilus, which is a generic framework for crawling deep Web. We provide an abstraction of deep Web crawling process and mechanism of integrating heterogeneous business modules. A Federal Decentralized Architecture is proposed to ensemble advantages of existed P2P networking architectures. We also present effective policies to schedule crawling tasks. Experimental results show our scheduling policies have good performance on load-balance and overall throughput.

Jianyu Zhao, Peng Wang

Retrieving Information from Microblog Using Pattern Mining and Relevance Feedback

Abstract

Retrieving information from Twitter is always challenging due to its large volume, inconsistent writing and noise. Most existing information retrieval (IR) and text mining methods focus on term-based approach, but suffers from the problems of terms variation such as polysemy and synonymy. This problem deteriorates when such methods are applied on Twitter due to the length limit. Over the years, people have held the hypothesis that pattern-based methods should perform better than term-based methods as it provides more context, but limited studies have been conducted to support such hypothesis especially in Twitter. This paper presents an innovative framework to address the issue of performing IR in microblog. The proposed framework discover patterns in tweets as higher level feature to assign weight for low-level features (i.e. terms) based on their distributions in higher level features. We present the experiment results based on TREC11 microblog dataset and shows that our proposed approach significantly outperforms term-based methods Okapi BM25, TF-IDF and pattern based methods, using precision, recall and F measures.

Cher Han Lau, Xiaohui Tao, Dian Tjondronegoro, Yuefeng Li

ICDKE: Data Security I

A Secure and Efficient Mix Network Especially Suitable for E-Voting

Abstract

A mix network is proposed in this paper. Its most operations are carried out in an off-line one-time initialization phase so that its on-line efficiency is very high. Although this two-phase mechanism has a limitation to parameter setting, we show that the limitation does not prevent the mix network from being employed in its main application, e-voting, with the help of a grouped shuffling mechanism. Its achievement of desired security properties is formally proved.

Kun Peng

Power Analysis Based Reverse Engineering on the Secret Round Function of Block Ciphers

Abstract

Side Channel Analysis (SCA) has become a new threat to the hardware implementations of encryption algorithms. The reverse engineering has been adopted to explore the unknown part of the encryption algorithms. The existing SCAs for reverse engineering depend on the leakage models in a large extent and mainly focus on the single component of the algorithms while the other parts of the target algorithm are known. In this paper, we present a more general and feasible reverse analysis by combining the mathematical methods and the SCA methods. We use the strict avalanche criterion for the non-linear operations of block ciphers and apply the power analysis to reverse the structure parameters. We propose a new reverse analysis method to reduce the dependency on the leakage models, which can be combined with the structure cryptanalysis to reverse the internal parameters of the linear and non-linear operations. We finally achieve the reverse analysis on the unknown round function of block ciphers.

Ming Tang, Zhenlong Qiu, Weijie Li, Shubo Liu, Huanguo Zhang

PEVS: A Secure Electronic Voting Scheme Using Polling Booths

Abstract

We present a Polling booth based Electronic Voting Scheme (PEVS) that allows eligible voters to cast their ballots inside polling booths. The ballot cast by a voter is inalterable and non-reusable. The scheme ensures vote-privacy since there is no link between the voter and the keys used for voting. A voter computer inside the booth performs the cryptographic task to construct the ballot to provide receipt-freeness. The scheme is coercion-resistant and modeled to fend off forced-abstention attacks, simulation attacks or randomization attacks. In addition, the scheme is both voter and universal verifiable. We formally analyze soundness (the eligibility of the voter, inalterability and non-reusability of the ballot), vote-privacy, receipt-freeness, and coercion-resistance in PEVS using the ProVerif tool. The analysis shows that PEVS satisfies these required properties. PEVS is the first polling booth based electronic voting scheme that satisfies all the requirements listed.

Md. Abdul Based, Joe-Kai Tsay, Stig Fr. Mjølsnes

ICDKE: Data Security II

Certificate-Based Key-Insulated Signature

Abstract

To reduce the influence of key exposure, we introduce key-insulated into certificate-based cryptography and formalize the notion and security model of the certificate-based key-insulated signature scheme. We then present a certificate-based key-insulated signature scheme, which is proven to be existentially unforgeable against adaptive chosen message attacks in the random oracle model.

Haiting Du, Jiguo Li, Yichen Zhang, Tao Li, Yuexin Zhang

The Generalized Construction and Linear Complexity of Binary Sequences with Three-Level Autocorrelation Values

Abstract

Binary sequences with good autocorrelation and large linear complexity are needed in many applications. As an application of theory of interleaved sequences, one family of binary pseudo-random sequences is constructed. Furthermore, the autocorrelation values and linear complexity of such constructed binary sequences are derived in this paper.

Zuling Chang, Dandan Li

Anonymous Hierarchical Identity-Based Encryption in Prime Order Groups

Abstract

A hierarchical identity-based encryption (HIBE) scheme is called anonymous if the ciphertext does not leak the identity of the recipient. Currently, anonymous HIBE schemes are constructed in composite order groups or achieve selective-ID security in prime order groups and the ciphertext size is linear in the maximum depth of hierarchy. We propose an anonymous HIBE scheme with constant size ciphertext, which is adaptive-ID secure without random oracles in prime order groups. Our scheme improves security and efficiency of the anonymous HIBE schemes simultaneously compare with the previous work.

Yanli Ren, Shuozhong Wang, Xinpeng Zhang

Weaknesses of “Security Analysis and Enhancement for Three-Party Password-Based Authenticated Key Exchange Protocol”

Abstract

The three-party password-based authenticated key exchange (3PAKE) protocol allows two users to share a session key for future communication with the help of a trusted server in the public network. Recently, Zhao et al. [Zhao J., Gu D., Zhang L., Security analysis and enhancement for three-party password-based authenticated key exchange protocol, Security Communication Networks 2012; 5(3):273-278] proposed an efficient 3PAKE protocol using smart cards. They proved that their protocol can withstand various known attacks found in the previously published schemes. However, in this paper, we point out that their protocol is vulnerable to three kinds of attacks namely, off-line password-guessing attack, privileged insider attack and stolen smart card attack. Hence, Zhao et al.’s scheme is not recommended for practical applications.

Muhammad Khurram Khan, Debiao He

Backmatter

Titel: Data and Knowledge Engineering
herausgegeben von: Yang Xiang
Mukaddim Pathan
Xiaohui Tao
Hua Wang
Verlag: Springer Berlin Heidelberg
Electronic ISBN: 978-3-642-34679-8
Print ISBN: 978-3-642-34678-1
DOI: https://doi.org/10.1007/978-3-642-34679-8

Springer Professional

Über dieses Buch

Inhaltsverzeichnis

Frontmatter

ICDKE: Artificial Intelligence and Data Engineering

OR-Tree: An Optimized Spatial Tree Index for Flash-Memory Storage Systems

Directional Skyline Queries

Transforming Nescient Activity into Intelligent Activity

Self-adaptive Non-stationary Parallel Multisplitting Two-Stage Iterative Methods for Linear Systems

Towards a Case-Based Reasoning Approach Based on Ontologies Application to Railroad Accidents

ICDKE: Knowledge Discovery and Data Management

An Agile Knowledge Discovery in Databases Software Process

An Innovative Outlier Detection Method Using Localized Thresholds

A Self-stabilizing Algorithm for Finding a Minimal K-Dominating Set in General Networks

Optimal Combination of Feature Weight Learning and Classification Based on Local Approximation

A Model of User-Oriented Reduct Construction Based on Minimal Set Cover

ICDKE: Information Extraction and Retrieval

Determining Pattern Similarity in a Medical Recommender System

Chinese Latent Relational Search Based on Relational Similarity

Wikipedia Category Graph and New Intrinsic Information Content Metric for Word Semantic Relatedness Measuring

Nautilus: A Generic Framework for Crawling Deep Web

Retrieving Information from Microblog Using Pattern Mining and Relevance Feedback

ICDKE: Data Security I

A Secure and Efficient Mix Network Especially Suitable for E-Voting

Power Analysis Based Reverse Engineering on the Secret Round Function of Block Ciphers

PEVS: A Secure Electronic Voting Scheme Using Polling Booths

ICDKE: Data Security II

Certificate-Based Key-Insulated Signature

The Generalized Construction and Linear Complexity of Binary Sequences with Three-Level Autocorrelation Values

Anonymous Hierarchical Identity-Based Encryption in Prime Order Groups

Weaknesses of “Security Analysis and Enhancement for Three-Party Password-Based Authenticated Key Exchange Protocol”

Backmatter

Premium Partner