Abstract
Self-sufficient itemsets are those whose frequency cannot be explained solely by the frequency of either their subsets or of their supersets. We argue that itemsets that are not self-sufficient will often be of little interest to the data analyst, as their frequency should be expected once that of the itemsets on which their frequency depends is known. We present tests for statistically sound discovery of self-sufficient itemsets, and computational techniques that allow those tests to be applied as a post-processing step for any itemset discovery algorithm. We also present a measure for assessing the degree of potential interest in an itemset that complements these statistical measures.
- Agrawal, R., Imielinski, T., and Swami, A. 1993. Mining associations between sets of items in massive databases. In Proceedings of the ACM-SIGMOD International Conference on Management of Data. 207--216. Google ScholarDigital Library
- Agresti, A. 1992. A survey of exact inference for contingency tables. Statis. Sci. 7, 1, 131--153.Google ScholarCross Ref
- Agresti, A. 2002. Categorical Data Analysis. Wiley-Interscience, New York.Google Scholar
- Aumann, Y. and Lindell, Y. 1999. A statistical theory for quantitative association rules. In Proceedings of the 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'99). 261--270. Google ScholarDigital Library
- Bastide, Y., Pasquier, N., Taouil, R., Stumme, G., and Lakhal, L. 2000. Mining minimal non-redundant association rules using frequent closed itemsets. In Proceedings of the 1st International Conference on Computational Logic (CL'00). Springer-Verlag, Berlin, 972--986. Google ScholarDigital Library
- Bayardo, Jr., R. J. and Agrawal, R. 1999. Mining the most interesting rules. In Proceedings of the 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'99). 145--154. Google ScholarDigital Library
- Bayardo, Jr., R. J., Agrawal, R., and Gunopulos, D. 2000. Constraint-based rule mining in large, dense databases. Data Min. Knowl. Disc. 4, 2/3, 217--240. Google ScholarDigital Library
- Brijs, T., Swinnen, G., Vanhoof, K., and Wets, G. 1999. Using association rules for product assortment decisions: A case study. In Proceedings of the 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM Press, New York, NY, 254--260. Google ScholarDigital Library
- Calders, T. and Goethals, B. 2002. Mining all non-derivable frequent itemsets. In Proceedings of the 6th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD'02). Springer, Berlin, 74--85. Google ScholarDigital Library
- Chan, R., Yang, Q., and Shen, Y. D. 2003. Mining high utility itemsets. In Proceedings of the 3rd IEEE International Conference on Data Mining. 19--26. Google ScholarDigital Library
- Cheng, J., Ke, Y., and Ng, W. 2006. Δ-tolerance closed frequent itemsets. In Proceedings of the 6th International Conference on Data Mining. 139--148. Google ScholarDigital Library
- Cooley, R., Tan, P.-N., and Srivastava, J. 1999. Discovery of interesting usage patterns from Web data. In Proceedings of the International WEBKDD'99 Workshop. Springer, Berlin, 163--182. Google ScholarDigital Library
- DuMouchel, W. 1999. Bayesian data mining in large frequency tables, with an application to the FDA spontaneous reporting system. Americ. Statis. 53, 3, 177--190.Google Scholar
- DuMouchel, W. and Pregibon, D. 2001. Empirical Bayes screening for multi-item associations. In Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'01). ACM Press, New York, NY, 76. Google ScholarDigital Library
- Goethals, B. 2007. NDI. Software. http://www.adrem.ua.ac.be/goethals/software/.Google Scholar
- Hettich, S. and Bay, S. D. 2006. The UCI KDD archive. Department of Information and Computer Science. University of California, Irvine, CA. http://kdd.ics.uci.edu.Google Scholar
- Holm, S. 1979. A simple sequentially rejective multiple test procedure. Scandinavian J. Statis. 6, 65--70.Google Scholar
- Jaroszewicz, S. and Simovici, D. A. 2004. Interestingness of frequent itemsets using Bayesian networks as background knowledge. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'04). R. Kohavi, J. Gehrke, and J. Ghosh, Eds. ACM Press, New York, NY, 178--186. Google ScholarDigital Library
- Liu, B., Hsu, W., and Ma, Y. 2001. Identifying non-actionable association rules. In Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'01). 329--334. Google ScholarDigital Library
- Malik, H. H. and Kender, J. R. 2006. High quality, efficient hierarchical document clustering using closed interesting itemsets. In Proceedings of the 6th IEEE International Conference on Data Mining. IEEE, 991--996. Google ScholarDigital Library
- Megiddo, N. and Srikant, R. 1998. Discovering predictive association rules. In Proceedings of the 4th International Conference on Knowledge Discovery and Data Mining (KDD'98). AAAI Press, Menlo Park, US, 27--78.Google Scholar
- Newman, D. J., Hettich, S., Blake, C., and Merz, C. J. 2006. UCI repository of machine learning databases. {Machine-readable data repository}. Department of Information and Computer Science, University of California, Irvine, CA.Google Scholar
- Pei, J., Dong, G., Zou, W., and Han, J. 2002. On computing condensed frequent pattern bases. In Proceedings of the 2nd IEEE International Conference on Data Mining (ICDM'02). 378--385. Google ScholarDigital Library
- Piatetsky-Shapiro, G. 1991. Discovery, analysis, and presentation of strong rules. In Knowledge Discovery in Databases, G. Piatetsky-Shapiro and J. Frawley, Eds. AAAI/MIT Press, Menlo Park, CA., 229--248.Google Scholar
- Shaffer, J. P. 1995. Multiple hypothesis testing. Annual Rev. Psych. 46, 561--584.Google ScholarCross Ref
- Tabachnick, B. G. and Fidell, L. S. 2001. Using Multivariate Statistics. Allyn and Bacon, Boston, MA. Google ScholarDigital Library
- Webb, G. I. 1995. OPUS: An efficient admissible algorithm for unordered search. J. Arti. Intell. Res. 3, 431--465. Google ScholarDigital Library
- Webb, G. I. 2001. Discovering associations with numeric variables. In Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'01). The ACM, New York, NY, 383--388. Google ScholarDigital Library
- Webb, G. I. 2007. Discovering significant patterns. Mach. Learn. 68, 1, 1--33. Google ScholarDigital Library
- Webb, G. I. 2009. Magnum Opus Version 4.3. Software, G. I. Webb & Associates, Melbourne, Aust.Google Scholar
- Webb, G. I. and Zhang, S. 2005. K-optimal rule discovery. Data Mining Knowl. Discov. 10, 1, 39--79. Google ScholarDigital Library
- Wu, X., Barbará, D., and Ye, Y. 2003. Screening and interpreting multi-item associations based on log-linear modeling. In Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 276--285. Google ScholarDigital Library
- Xin, D., Han, J., Yan, X., and Cheng, H. 2005. Mining compressed frequent-pattern sets. In Proceedings of the International Conference on Very Large Databases (VLDB'05). 709--720. Google ScholarDigital Library
- Yao, H. and Hamilton, H. J. 2006. Mining itemset utilities from transaction databases. Data Knowl. Engin. 59, 603--626. Google ScholarDigital Library
- Zaki, M. J. 2000. Generating non-redundant association rules. In Proceedings of the 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'00). ACM, New York, NY, 34--43. Google ScholarDigital Library
- Zaki, M. J. 2004. Mining non-redundant association rules. Data Mining Knowl. Discov. 9, 3, 223--248. Google ScholarDigital Library
- Zaki, M. J. and Hsiao, C. J. 2002. CHARM: An efficient algorithm for closed itemset mining. In Proceedings of the 2nd SIAM International Conference on Data Mining. 457--473.Google Scholar
- Zhang, H., Padmanabhan, B., and Tuzhilin, A. 2004. On the discovery of significant statistical quantitative rules. In Proceedings of the 10th International Conference on Knowledge Discovery and Data Mining (KDD'04). ACM Press, New York, NY, 374--383. Google ScholarDigital Library
- Zheng, Z., Kohavi, R., and Mason, L. 2001. Real world performance of association rule algorithms. In Proceedings of the 7th International Conference on Knowledge Discovery and Data Mining (KDD'01). ACM, New York, NY, 401--406. Google ScholarDigital Library
Index Terms
- Self-sufficient itemsets: An approach to screening potentially interesting associations between items
Recommendations
Human disease network guided discovery of interesting itemsets in hospital discharge data
DMMH '11: Proceedings of the 2011 workshop on Data mining for medicine and healthcareStandard knowledge discovery techniques, such as unsupervised or supervised descriptive rule discovery, have been widely used in medical data mining. Most of the research is focused on developing effective association rule evaluation metrics that would ...
Mining fuzzy specific rare itemsets for education data
Association rule mining is an important data analysis method for the discovery of associations within data. There have been many studies focused on finding fuzzy association rules from transaction databases. Unfortunately, in the real world, one may ...
Mining N-most interesting itemsets without support threshold by the COFI-tree
Data mining is the discovery of interesting and hidden patterns from a large amount of collected data. Applications can be found in many organisations with large databases, for many different purposes such as customer relationships, marketing, planning, ...
Comments