
About this book

Due to the popularity of knowledge discovery and data mining, in practice as well as among academic and corporate R&D professionals, association rule mining is receiving increasing attention.
The authors present recent progress in mining quantitative association rules, causal rules, exceptional rules, negative association rules, association rules in multi-databases, and association rules in small databases. This book is written for researchers, professionals, and students working in the fields of data mining, data analysis, machine learning, and knowledge discovery in databases, as well as for anyone interested in association rule mining.

Introduction

Association rule mining is an important topic in data mining, and it is the focus of this book. To clarify the background of association rule mining, this chapter concentrates on introducing data mining techniques.
In Section 1.1 we begin by explaining what data mining is. In Section 1.2 we argue why data mining is needed. In Section 1.3 we recall the process of knowledge discovery in databases (KDD). In Section 1.4 we describe data mining tasks and the types of data they face. Section 1.5 introduces some basic data mining techniques. Section 1.6 presents data mining and marketing. In Section 1.7 we show some examples where data mining is applied to real-world problems. Finally, in Section 1.8 we discuss future work involving data mining.

Association Rule

This chapter recalls some of the essential concepts related to association rule mining, which will be used throughout the book. Some existing research into the improvement of association rule mining techniques is also introduced to clarify the process. The chapter is organized as follows. In Section 2.1, we begin by outlining certain necessary basic concepts. Some measurements of association rules are discussed in Section 2.2. In Section 2.3, we introduce the Apriori algorithm, which searches for large (or frequent) itemsets in databases. Section 2.4 introduces some research into mining association rules. Finally, we summarize this chapter in Section 2.5.
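As a rough illustration of the level-wise search that Apriori performs, the following is a minimal sketch, not the book's own formulation. The function names `apriori` and `confidence`, the representation of transactions as Python sets, and the use of relative support are all assumptions made for this example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset whose relative
    support is at least min_support (illustrative sketch only)."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    # Level 1 candidates: every single item that appears in the data.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Count how many transactions contain each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        # Join step: merge frequent k-itemsets into (k+1)-itemsets.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all of its
        # (k-1)-item subsets were frequent at the previous level.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def confidence(frequent, antecedent, consequent):
    """conf(X -> Y) = supp(X u Y) / supp(X), using supports found above."""
    x = frozenset(antecedent)
    xy = x | frozenset(consequent)
    return frequent[xy] / frequent[x]


baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
result = apriori(baskets, min_support=0.5)
```

With these four hypothetical baskets and a minimum support of 0.5, the pair {milk, butter} is infrequent, so the three-item candidate is pruned without ever being counted against the data.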

Negative Association Rule

During decision making, we are often confronted by a huge number of factors. These factors may be either an advantage or a disadvantage to a decision objective. To keep risk low (and profit high), we must scrutinize the possible behavior of these factors. It is particularly useful to learn from past data which of the disadvantage factors will rarely occur when the expected advantage factors occur. We also take into account that there are essential differences between positive and negative association rule mining. A pruning algorithm can reduce the search space; however, some pruned itemsets may be useful in the extraction of negative rules.
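To make the idea of a negative rule concrete, the sketch below computes support and confidence for a rule of the form X → ¬Y, read here as "transactions that contain X but do not contain all of Y". It relies on the identity supp(X ∪ ¬Y) = supp(X) − supp(X ∪ Y). The function name and the sample data are hypothetical, and the chapter's own interestingness measures may differ.

```python
def negative_rule_measures(transactions, x, y):
    """Return (support, confidence) of the negative rule X -> not Y.

    Illustrative sketch: 'not Y' means the transaction does not
    contain every item of Y.
    """
    x, y = frozenset(x), frozenset(y)
    n = len(transactions)
    if n == 0:
        return 0.0, 0.0
    supp_x = sum(1 for t in transactions if x <= set(t)) / n
    supp_xy = sum(1 for t in transactions if (x | y) <= set(t)) / n
    # supp(X u not-Y) = supp(X) - supp(X u Y)
    supp_x_not_y = supp_x - supp_xy
    conf = supp_x_not_y / supp_x if supp_x else 0.0
    return supp_x_not_y, conf


records = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
support, conf = negative_rule_measures(records, {"bread"}, {"milk"})
```

Note that an itemset such as X ∪ Y can be infrequent, and therefore pruned in a positive-rule search, precisely when the negative rule X → ¬Y is strong, which is why pruned itemsets can still matter here.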

Causality in Databases

A causal rule between two variables, X → Y, captures the relationship that the presence of X causes the appearance of Y. Because causal rules are more useful than association rules, techniques for mining them are beginning to be developed. However, the effectiveness of existing methods, such as the LCD and CU-path algorithms, is limited to mining causal rules among invariable items. These techniques are not adequate for the discovery and representation of causal rules among multi-value variables. In this chapter, we propose techniques for mining causality between the variables X and Y by partitioning, where causality is represented in the form X → Y, with the conditional probability matrix M_{Y|X}. These techniques are also applied to find causal rules in probabilistic databases. This chapter begins by stating the problems faced in Section 4.1. Some necessary basic concepts are defined in Section 4.2. In Section 4.3 we first define a ‘good partition’ for generating item variables from items, and we then present a method of mining causality of interest from large databases. In Section 4.4 we advocate an approach for finding dependencies among variables. In Section 4.5 we apply the proposed causality mining techniques to mining probabilistic databases. Finally, we conclude in Section 4.6.
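As a small illustration of what a conditional probability matrix for two multi-value variables looks like, the sketch below estimates one from paired observations by simple counting. It assumes that row i, column j holds P(Y = y_j | X = x_i); the function name, the row/column orientation, and the sample data are assumptions for this example and are not taken from the chapter, and the sketch does not perform the partitioning step the chapter describes.

```python
from collections import Counter


def conditional_probability_matrix(pairs, x_values, y_values):
    """Estimate P(Y = y | X = x) from observed (x, y) pairs.

    Rows follow x_values, columns follow y_values. A row for an
    x value that never occurs is left as all zeros.
    """
    joint = Counter(pairs)
    x_totals = Counter(x for x, _ in pairs)
    matrix = []
    for x in x_values:
        total = x_totals[x]
        row = [joint[(x, y)] / total if total else 0.0 for y in y_values]
        matrix.append(row)
    return matrix


# Hypothetical observations of (X, Y).
observations = [
    ("low", "no"),
    ("low", "no"),
    ("low", "yes"),
    ("high", "yes"),
]
m = conditional_probability_matrix(observations, ["low", "high"], ["no", "yes"])
```

Each row that corresponds to an observed value of X sums to 1, so the matrix can be read row by row as the distribution of Y given that value of X.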

Causal Rule Analysis

Causal rules attached to matrices can be used to capture causal relationships among multi-value variables in data. However, because the causal relations are represented in a non-linear form (a matrix), it is rather difficult to make decisions using the causal rules. Therefore, one of the main challenges is to reduce the complexity of the representation. As important research into post data mining, this chapter first establishes a method of optimizing causal rules that tackles the ‘useless’ information in the conditional probability matrices of the extracted rules. Then, techniques for constructing polynomial functions for approximate causality in data are advocated. Finally, we propose an approach for finding the approximate polynomial causality between two variables from a given data set by fitting.

Association Rules in Very Large Databases

Dealing with very large databases is one of the defining challenges in data mining research and development. Some databases are simply too large (e.g., with terabytes of data) to be processed at one time. An ideal way of mining very large databases would be to use parallel techniques. This approach employs hardware technology, such as parallel machines, to implement concurrent data mining algorithms. However, parallel machines are expensive and less widely available than single-processor machines. This chapter presents some techniques for mining association rules in very large databases using instance selection.

Association Rules in Small Databases

Accidents in nuclear power plants can cause environmental disasters and create personal, economic, and ecological damage. Therefore, research into automatic surveillance and early nuclear accident detection has received much attention. To reduce nuclear accidents, reliable information is needed for controlling and/or preventing such accidents. Hence, extracting useful patterns from limited data in nuclear power plants is very important, and is imperative for safety. This kind of knowledge is generally obtained from theoretical, experimental, and real data. However, nuclear accidents rarely occur, and we may discover nothing from the accident database in a single plant. Therefore, reliable mining of an accident database in a nuclear power plant would also need to depend on external data.

Conclusion and Future Work

After compiling this book, we acknowledge that association rule mining is still in a stage of exploration and development. Some essential issues remain to be explored in order to identify useful association rules. In this chapter, these issues are outlined as possible future problems to be solved. In Section 8.1, we summarize the previous seven chapters. Then, in Section 8.2, we describe four other challenging problems in association rule mining.

