research-article

Self-sufficient itemsets: An approach to screening potentially interesting associations between items

Author:
Geoffrey I. Webb

Monash University, Australia

Monash University, Australia
View Profile

ACM Transactions on Knowledge Discovery from Data Volume 4 Issue 1Article No.: 3pp 1–20https://doi.org/10.1145/1644873.1644876

Published:18 January 2010Publication History

ACM Transactions on Knowledge Discovery from Data

Abstract

Self-sufficient itemsets are those whose frequency cannot be explained solely by the frequency of either their subsets or of their supersets. We argue that itemsets that are not self-sufficient will often be of little interest to the data analyst, as their frequency should be expected once that of the itemsets on which their frequency depends is known. We present tests for statistically sound discovery of self-sufficient itemsets, and computational techniques that allow those tests to be applied as a post-processing step for any itemset discovery algorithm. We also present a measure for assessing the degree of potential interest in an itemset that complements these statistical measures.

References

Agrawal, R., Imielinski, T., and Swami, A. 1993. Mining associations between sets of items in massive databases. In Proceedings of the ACM-SIGMOD International Conference on Management of Data. 207--216. Google ScholarDigital Library
Agresti, A. 1992. A survey of exact inference for contingency tables. Statis. Sci. 7, 1, 131--153.Google ScholarCross Ref
Agresti, A. 2002. Categorical Data Analysis. Wiley-Interscience, New York.Google Scholar
Aumann, Y. and Lindell, Y. 1999. A statistical theory for quantitative association rules. In Proceedings of the 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'99). 261--270. Google ScholarDigital Library
Bastide, Y., Pasquier, N., Taouil, R., Stumme, G., and Lakhal, L. 2000. Mining minimal non-redundant association rules using frequent closed itemsets. In Proceedings of the 1st International Conference on Computational Logic (CL'00). Springer-Verlag, Berlin, 972--986. Google ScholarDigital Library
Bayardo, Jr., R. J. and Agrawal, R. 1999. Mining the most interesting rules. In Proceedings of the 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'99). 145--154. Google ScholarDigital Library
Bayardo, Jr., R. J., Agrawal, R., and Gunopulos, D. 2000. Constraint-based rule mining in large, dense databases. Data Min. Knowl. Disc. 4, 2/3, 217--240. Google ScholarDigital Library
Brijs, T., Swinnen, G., Vanhoof, K., and Wets, G. 1999. Using association rules for product assortment decisions: A case study. In Proceedings of the 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM Press, New York, NY, 254--260. Google ScholarDigital Library
Calders, T. and Goethals, B. 2002. Mining all non-derivable frequent itemsets. In Proceedings of the 6th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD'02). Springer, Berlin, 74--85. Google ScholarDigital Library
Chan, R., Yang, Q., and Shen, Y. D. 2003. Mining high utility itemsets. In Proceedings of the 3rd IEEE International Conference on Data Mining. 19--26. Google ScholarDigital Library
Cheng, J., Ke, Y., and Ng, W. 2006. Δ-tolerance closed frequent itemsets. In Proceedings of the 6th International Conference on Data Mining. 139--148. Google ScholarDigital Library
Cooley, R., Tan, P.-N., and Srivastava, J. 1999. Discovery of interesting usage patterns from Web data. In Proceedings of the International WEBKDD'99 Workshop. Springer, Berlin, 163--182. Google ScholarDigital Library
DuMouchel, W. 1999. Bayesian data mining in large frequency tables, with an application to the FDA spontaneous reporting system. Americ. Statis. 53, 3, 177--190.Google Scholar
DuMouchel, W. and Pregibon, D. 2001. Empirical Bayes screening for multi-item associations. In Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'01). ACM Press, New York, NY, 76. Google ScholarDigital Library
Goethals, B. 2007. NDI. Software. http://www.adrem.ua.ac.be/goethals/software/.Google Scholar
Hettich, S. and Bay, S. D. 2006. The UCI KDD archive. Department of Information and Computer Science. University of California, Irvine, CA. http://kdd.ics.uci.edu.Google Scholar
Holm, S. 1979. A simple sequentially rejective multiple test procedure. Scandinavian J. Statis. 6, 65--70.Google Scholar
Jaroszewicz, S. and Simovici, D. A. 2004. Interestingness of frequent itemsets using Bayesian networks as background knowledge. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'04). R. Kohavi, J. Gehrke, and J. Ghosh, Eds. ACM Press, New York, NY, 178--186. Google ScholarDigital Library
Liu, B., Hsu, W., and Ma, Y. 2001. Identifying non-actionable association rules. In Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'01). 329--334. Google ScholarDigital Library
Malik, H. H. and Kender, J. R. 2006. High quality, efficient hierarchical document clustering using closed interesting itemsets. In Proceedings of the 6th IEEE International Conference on Data Mining. IEEE, 991--996. Google ScholarDigital Library
Megiddo, N. and Srikant, R. 1998. Discovering predictive association rules. In Proceedings of the 4th International Conference on Knowledge Discovery and Data Mining (KDD'98). AAAI Press, Menlo Park, US, 27--78.Google Scholar
Newman, D. J., Hettich, S., Blake, C., and Merz, C. J. 2006. UCI repository of machine learning databases. {Machine-readable data repository}. Department of Information and Computer Science, University of California, Irvine, CA.Google Scholar
Pei, J., Dong, G., Zou, W., and Han, J. 2002. On computing condensed frequent pattern bases. In Proceedings of the 2nd IEEE International Conference on Data Mining (ICDM'02). 378--385. Google ScholarDigital Library
Piatetsky-Shapiro, G. 1991. Discovery, analysis, and presentation of strong rules. In Knowledge Discovery in Databases, G. Piatetsky-Shapiro and J. Frawley, Eds. AAAI/MIT Press, Menlo Park, CA., 229--248.Google Scholar
Shaffer, J. P. 1995. Multiple hypothesis testing. Annual Rev. Psych. 46, 561--584.Google ScholarCross Ref
Tabachnick, B. G. and Fidell, L. S. 2001. Using Multivariate Statistics. Allyn and Bacon, Boston, MA. Google ScholarDigital Library
Webb, G. I. 1995. OPUS: An efficient admissible algorithm for unordered search. J. Arti. Intell. Res. 3, 431--465. Google ScholarDigital Library
Webb, G. I. 2001. Discovering associations with numeric variables. In Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'01). The ACM, New York, NY, 383--388. Google ScholarDigital Library
Webb, G. I. 2007. Discovering significant patterns. Mach. Learn. 68, 1, 1--33. Google ScholarDigital Library
Webb, G. I. 2009. Magnum Opus Version 4.3. Software, G. I. Webb & Associates, Melbourne, Aust.Google Scholar
Webb, G. I. and Zhang, S. 2005. K-optimal rule discovery. Data Mining Knowl. Discov. 10, 1, 39--79. Google ScholarDigital Library
Wu, X., Barbará, D., and Ye, Y. 2003. Screening and interpreting multi-item associations based on log-linear modeling. In Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 276--285. Google ScholarDigital Library
Xin, D., Han, J., Yan, X., and Cheng, H. 2005. Mining compressed frequent-pattern sets. In Proceedings of the International Conference on Very Large Databases (VLDB'05). 709--720. Google ScholarDigital Library
Yao, H. and Hamilton, H. J. 2006. Mining itemset utilities from transaction databases. Data Knowl. Engin. 59, 603--626. Google ScholarDigital Library
Zaki, M. J. 2000. Generating non-redundant association rules. In Proceedings of the 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'00). ACM, New York, NY, 34--43. Google ScholarDigital Library
Zaki, M. J. 2004. Mining non-redundant association rules. Data Mining Knowl. Discov. 9, 3, 223--248. Google ScholarDigital Library
Zaki, M. J. and Hsiao, C. J. 2002. CHARM: An efficient algorithm for closed itemset mining. In Proceedings of the 2nd SIAM International Conference on Data Mining. 457--473.Google Scholar
Zhang, H., Padmanabhan, B., and Tuzhilin, A. 2004. On the discovery of significant statistical quantitative rules. In Proceedings of the 10th International Conference on Knowledge Discovery and Data Mining (KDD'04). ACM Press, New York, NY, 374--383. Google ScholarDigital Library
Zheng, Z., Kohavi, R., and Mason, L. 2001. Real world performance of association rule algorithms. In Proceedings of the 7th International Conference on Knowledge Discovery and Data Mining (KDD'01). ACM, New York, NY, 401--406. Google ScholarDigital Library

Index Terms

Self-sufficient itemsets: An approach to screening potentially interesting associations between items
1. Information systems
  1. Information systems applications
    1. Data mining

Recommendations

Human disease network guided discovery of interesting itemsets in hospital discharge data
DMMH '11: Proceedings of the 2011 workshop on Data mining for medicine and healthcare

Standard knowledge discovery techniques, such as unsupervised or supervised descriptive rule discovery, have been widely used in medical data mining. Most of the research is focused on developing effective association rule evaluation metrics that would ...
Read More
Mining fuzzy specific rare itemsets for education data

Association rule mining is an important data analysis method for the discovery of associations within data. There have been many studies focused on finding fuzzy association rules from transaction databases. Unfortunately, in the real world, one may ...
Read More
Mining N-most interesting itemsets without support threshold by the COFI-tree

Data mining is the discovery of interesting and hidden patterns from a large amount of collected data. Applications can be found in many organisations with large databases, for many different purposes such as customer relationships, marketing, planning, ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Published in

ACM Transactions on Knowledge Discovery from Data Volume 4, Issue 1
January 2010
135 pages
ISSN:1556-4681
EISSN:1556-472X
DOI:10.1145/1644873
Issue’s Table of Contents

Copyright © 2010 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 18 January 2010
- Accepted: 1 May 2009
- Revised: 1 March 2009
- Received: 1 March 2008
Published in tkdd Volume 4, Issue 1

Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
Association discovery
association rules
itemset discovery
itemset screening
statistical evaluation
Qualifiers
- research-article
- Research
- Refereed
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 52
  Total Citations
  View Citations
- 1,310
  Total Downloads
- Downloads (Last 12 months)7
- Downloads (Last 6 weeks)3
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Self-sufficient itemsets: An approach to screening potentially interesting associations between items

ACM Transactions on Knowledge Discovery from Data

Abstract

References

Cited By

Index Terms

Recommendations

Human disease network guided discovery of interesting itemsets in hospital discharge data

Mining fuzzy specific rare itemsets for education data

Mining N-most interesting itemsets without support threshold by the COFI-tree

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

Self-sufficient itemsets: An approach to screening potentially interesting associations between items

ACM Transactions on Knowledge Discovery from Data

Abstract

References

Cited By

Index Terms

Recommendations

Human disease network guided discovery of interesting itemsets in hospital discharge data

Mining fuzzy specific rare itemsets for education data

Mining N-most interesting itemsets without support threshold by the COFI-tree

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media