skip to main content
research-article

Self-sufficient itemsets: An approach to screening potentially interesting associations between items

Published:18 January 2010Publication History
Skip Abstract Section

Abstract

Self-sufficient itemsets are those whose frequency cannot be explained solely by the frequency of either their subsets or of their supersets. We argue that itemsets that are not self-sufficient will often be of little interest to the data analyst, as their frequency should be expected once that of the itemsets on which their frequency depends is known. We present tests for statistically sound discovery of self-sufficient itemsets, and computational techniques that allow those tests to be applied as a post-processing step for any itemset discovery algorithm. We also present a measure for assessing the degree of potential interest in an itemset that complements these statistical measures.

References

  1. Agrawal, R., Imielinski, T., and Swami, A. 1993. Mining associations between sets of items in massive databases. In Proceedings of the ACM-SIGMOD International Conference on Management of Data. 207--216. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Agresti, A. 1992. A survey of exact inference for contingency tables. Statis. Sci. 7, 1, 131--153.Google ScholarGoogle ScholarCross RefCross Ref
  3. Agresti, A. 2002. Categorical Data Analysis. Wiley-Interscience, New York.Google ScholarGoogle Scholar
  4. Aumann, Y. and Lindell, Y. 1999. A statistical theory for quantitative association rules. In Proceedings of the 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'99). 261--270. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Bastide, Y., Pasquier, N., Taouil, R., Stumme, G., and Lakhal, L. 2000. Mining minimal non-redundant association rules using frequent closed itemsets. In Proceedings of the 1st International Conference on Computational Logic (CL'00). Springer-Verlag, Berlin, 972--986. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Bayardo, Jr., R. J. and Agrawal, R. 1999. Mining the most interesting rules. In Proceedings of the 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'99). 145--154. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Bayardo, Jr., R. J., Agrawal, R., and Gunopulos, D. 2000. Constraint-based rule mining in large, dense databases. Data Min. Knowl. Disc. 4, 2/3, 217--240. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Brijs, T., Swinnen, G., Vanhoof, K., and Wets, G. 1999. Using association rules for product assortment decisions: A case study. In Proceedings of the 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM Press, New York, NY, 254--260. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Calders, T. and Goethals, B. 2002. Mining all non-derivable frequent itemsets. In Proceedings of the 6th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD'02). Springer, Berlin, 74--85. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Chan, R., Yang, Q., and Shen, Y. D. 2003. Mining high utility itemsets. In Proceedings of the 3rd IEEE International Conference on Data Mining. 19--26. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Cheng, J., Ke, Y., and Ng, W. 2006. Δ-tolerance closed frequent itemsets. In Proceedings of the 6th International Conference on Data Mining. 139--148. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Cooley, R., Tan, P.-N., and Srivastava, J. 1999. Discovery of interesting usage patterns from Web data. In Proceedings of the International WEBKDD'99 Workshop. Springer, Berlin, 163--182. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. DuMouchel, W. 1999. Bayesian data mining in large frequency tables, with an application to the FDA spontaneous reporting system. Americ. Statis. 53, 3, 177--190.Google ScholarGoogle Scholar
  14. DuMouchel, W. and Pregibon, D. 2001. Empirical Bayes screening for multi-item associations. In Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'01). ACM Press, New York, NY, 76. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Goethals, B. 2007. NDI. Software. http://www.adrem.ua.ac.be/goethals/software/.Google ScholarGoogle Scholar
  16. Hettich, S. and Bay, S. D. 2006. The UCI KDD archive. Department of Information and Computer Science. University of California, Irvine, CA. http://kdd.ics.uci.edu.Google ScholarGoogle Scholar
  17. Holm, S. 1979. A simple sequentially rejective multiple test procedure. Scandinavian J. Statis. 6, 65--70.Google ScholarGoogle Scholar
  18. Jaroszewicz, S. and Simovici, D. A. 2004. Interestingness of frequent itemsets using Bayesian networks as background knowledge. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'04). R. Kohavi, J. Gehrke, and J. Ghosh, Eds. ACM Press, New York, NY, 178--186. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Liu, B., Hsu, W., and Ma, Y. 2001. Identifying non-actionable association rules. In Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'01). 329--334. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Malik, H. H. and Kender, J. R. 2006. High quality, efficient hierarchical document clustering using closed interesting itemsets. In Proceedings of the 6th IEEE International Conference on Data Mining. IEEE, 991--996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Megiddo, N. and Srikant, R. 1998. Discovering predictive association rules. In Proceedings of the 4th International Conference on Knowledge Discovery and Data Mining (KDD'98). AAAI Press, Menlo Park, US, 27--78.Google ScholarGoogle Scholar
  22. Newman, D. J., Hettich, S., Blake, C., and Merz, C. J. 2006. UCI repository of machine learning databases. {Machine-readable data repository}. Department of Information and Computer Science, University of California, Irvine, CA.Google ScholarGoogle Scholar
  23. Pei, J., Dong, G., Zou, W., and Han, J. 2002. On computing condensed frequent pattern bases. In Proceedings of the 2nd IEEE International Conference on Data Mining (ICDM'02). 378--385. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Piatetsky-Shapiro, G. 1991. Discovery, analysis, and presentation of strong rules. In Knowledge Discovery in Databases, G. Piatetsky-Shapiro and J. Frawley, Eds. AAAI/MIT Press, Menlo Park, CA., 229--248.Google ScholarGoogle Scholar
  25. Shaffer, J. P. 1995. Multiple hypothesis testing. Annual Rev. Psych. 46, 561--584.Google ScholarGoogle ScholarCross RefCross Ref
  26. Tabachnick, B. G. and Fidell, L. S. 2001. Using Multivariate Statistics. Allyn and Bacon, Boston, MA. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Webb, G. I. 1995. OPUS: An efficient admissible algorithm for unordered search. J. Arti. Intell. Res. 3, 431--465. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Webb, G. I. 2001. Discovering associations with numeric variables. In Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'01). The ACM, New York, NY, 383--388. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Webb, G. I. 2007. Discovering significant patterns. Mach. Learn. 68, 1, 1--33. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Webb, G. I. 2009. Magnum Opus Version 4.3. Software, G. I. Webb & Associates, Melbourne, Aust.Google ScholarGoogle Scholar
  31. Webb, G. I. and Zhang, S. 2005. K-optimal rule discovery. Data Mining Knowl. Discov. 10, 1, 39--79. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Wu, X., Barbará, D., and Ye, Y. 2003. Screening and interpreting multi-item associations based on log-linear modeling. In Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 276--285. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. Xin, D., Han, J., Yan, X., and Cheng, H. 2005. Mining compressed frequent-pattern sets. In Proceedings of the International Conference on Very Large Databases (VLDB'05). 709--720. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. Yao, H. and Hamilton, H. J. 2006. Mining itemset utilities from transaction databases. Data Knowl. Engin. 59, 603--626. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. Zaki, M. J. 2000. Generating non-redundant association rules. In Proceedings of the 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'00). ACM, New York, NY, 34--43. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. Zaki, M. J. 2004. Mining non-redundant association rules. Data Mining Knowl. Discov. 9, 3, 223--248. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. Zaki, M. J. and Hsiao, C. J. 2002. CHARM: An efficient algorithm for closed itemset mining. In Proceedings of the 2nd SIAM International Conference on Data Mining. 457--473.Google ScholarGoogle Scholar
  38. Zhang, H., Padmanabhan, B., and Tuzhilin, A. 2004. On the discovery of significant statistical quantitative rules. In Proceedings of the 10th International Conference on Knowledge Discovery and Data Mining (KDD'04). ACM Press, New York, NY, 374--383. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. Zheng, Z., Kohavi, R., and Mason, L. 2001. Real world performance of association rule algorithms. In Proceedings of the 7th International Conference on Knowledge Discovery and Data Mining (KDD'01). ACM, New York, NY, 401--406. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Self-sufficient itemsets: An approach to screening potentially interesting associations between items

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in

    Full Access

    • Published in

      cover image ACM Transactions on Knowledge Discovery from Data
      ACM Transactions on Knowledge Discovery from Data  Volume 4, Issue 1
      January 2010
      135 pages
      ISSN:1556-4681
      EISSN:1556-472X
      DOI:10.1145/1644873
      Issue’s Table of Contents

      Copyright © 2010 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 18 January 2010
      • Accepted: 1 May 2009
      • Revised: 1 March 2009
      • Received: 1 March 2008
      Published in tkdd Volume 4, Issue 1

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • research-article
      • Research
      • Refereed

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader