ABSTRACT
Existing research on mining quantitative databases mainly focuses on mining associations. However, mining associations is too expensive to be practical in many cases. In this paper, we study mining correlations from quantitative databases and show that it is a more effective approach than mining associations. We propose a new notion of Quantitative Correlated Patterns (QCPs), which is founded on two formal concepts, mutual information and all-confidence. We first devise a normalization on mutual information and apply it to QCP mining to capture the dependency between the attributes. We further adopt all-confidence as a quality measure to control, at a finer granularity, the dependency between the attributes with specific quantitative intervals. We also propose a supervised method to combine the consecutive intervals of the quantitative attributes based on mutual information, such that the interval combining is guided by the dependency between the attributes. We develop an algorithm, QCoMine, to efficiently mine QCPs by utilizing normalized mutual information and all-confidence to perform a two-level pruning. Our experiments verify the efficiency of QCoMine and the quality of the QCPs.
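The two measures named in the abstract can be illustrated with a small sketch. The abstract does not state the exact normalization, so the code below assumes one common choice, dividing mutual information by the larger of the two attribute entropies; all-confidence follows its standard definition (support of the whole pattern divided by the largest support of any single item). The function names and the toy data are hypothetical, and this is not the QCoMine algorithm itself.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def mutual_information(xs, ys):
    """Mutual information (in bits) between two discrete attributes."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


def normalized_mi(xs, ys):
    """ASSUMPTION: normalize by max(H(x), H(y)) so the score lies in [0, 1]."""
    denom = max(entropy(xs), entropy(ys))
    return mutual_information(xs, ys) / denom if denom > 0 else 0.0


def all_confidence(rows, pattern):
    """All-confidence of a quantitative pattern.

    rows:    list of dicts mapping attribute name -> numeric value
    pattern: dict mapping attribute name -> (low, high) closed interval
    """
    def matches(row, attr):
        low, high = pattern[attr]
        return low <= row[attr] <= high

    # Number of rows satisfying every interval in the pattern.
    joint = sum(all(matches(r, a) for a in pattern) for r in rows)
    # Largest number of rows satisfying any single interval.
    max_single = max(sum(matches(r, a) for r in rows) for a in pattern)
    return joint / max_single if max_single else 0.0


# Hypothetical toy data for illustration only.
rows = [
    {"age": 25, "income": 30},
    {"age": 28, "income": 35},
    {"age": 45, "income": 80},
    {"age": 27, "income": 90},
]
pattern = {"age": (20, 30), "income": (25, 40)}
score = all_confidence(rows, pattern)
```

In a two-level scheme like the one the abstract describes, a score such as `normalized_mi` would be checked at the attribute level first, and a score such as `all_confidence` would then be checked for specific interval combinations of the attributes that pass.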
Index Terms
- Mining quantitative correlated patterns using an information-theoretic approach