Beyond market baskets: generalizing association rules to correlations

Authors:
Sergey Brin

Department of Computer Science, Stanford University, Stanford, CA

Department of Computer Science, Stanford University, Stanford, CA
View Profile

,
Rajeev Motwani

Department of Computer Science, Stanford University, Stanford, CA

Department of Computer Science, Stanford University, Stanford, CA
View Profile

,
Craig Silverstein

Department of Computer Science, Stanford University, Stanford, CA

Department of Computer Science, Stanford University, Stanford, CA
View Profile

Authors Info & Claims

ACM SIGMOD Record Volume 26 Issue 2June 1997pp 265–276https://doi.org/10.1145/253262.253327

Published:01 June 1997Publication History

ACM SIGMOD Record

Abstract

One of the most well-studied problems in data mining is mining for association rules in market basket data. Association rules, whose significance is measured via support and confidence, are intended to identify rules of the type, “A customer purchasing item A often also purchases item B.” Motivated by the goal of generalizing beyond market baskets and the association rules used with them, we develop the notion of mining rules that identify correlations (generalizing associations), and we consider both the absence and presence of items as a basis for generating rules. We propose measuring significance of associations via the chi-squared test for correlation from classical statistics. This leads to a measure that is upward closed in the itemset lattice, enabling us to reduce the mining problem to the search for a border between correlated and uncorrelated itemsets in the lattice. We develop pruning strategies and devise an efficient algorithm for the resulting problem. We demonstrate its effectiveness by testing it on census data and finding term dependence in a corpus of text documents, as well as on synthetic data.

References

1 R. Agrawal, A. Arning, T. Bollinger, M. Mehta, J. Sharer, and R. Srikant. The Quest Data Mining System. In Proceedings of the Second International Conference on Knowledge Discovery in Databases and Data, August 1996.Google ScholarDigital Library
2 R. Agrawal, T. Imielinski, and A. Swami. Mining association rules between sets of items in large databases. In Proceedings of the A CM SIGMOD International Conference on the Management of Data, pages 207-216, May 1993. Google ScholarDigital Library
3 R. Agrawal, T. Imielinski, and A. Swami. Database mining: a performance perspective. IEEE Transactions on Knowledge and Data Engineering, 5:914-925, 1993. Google ScholarDigital Library
4 R. Agrawal, H. Mannila, R. Srikant, H. Toivonen, and A.I. Verkamo. Fast Discovery of Association Rules. In Fayyad et al {9}, pages 307-328, 1996. Google Scholar
5 R. Agrawal and R. Srikant. Fast algorithms for mining association rules in large databases. In Proceedings of the ~Oth International Conference on Very Large Data Bases, pages 487-499, September 1994. Google ScholarDigital Library
6 A. Agresti. A survey of exact inference for contingency tables. Statistical Science, 7:131-177, 1992.Google ScholarCross Ref
7 M. Dietzfelbinger, A. Karlin, K. Mehlhorn, F. Meyer auf der Heide, H. Rohnert, and H. Tarjan. Dynamic perfect hashing: Upper and lower bounds. In Proceedings of the 18th IEEE Symposium on Foundations of Computer Science, pages 524-531, 1988.Google Scholar
8 Ft. Ewald. Keynote address. The 3rd International Conference on Information and Knowledge Management, 1994.Google Scholar
9 U.M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthrusamy. Advances in Knowledge Discovery and Data Mining. AAAI Press, Menlo Park, CA, 1996. Google ScholarDigital Library
10 M. Fredman, J. Koml6s, and E. Szemer6di. Storing a sparse table with O(1) worst case access time. Journal of the A CM, 31(3):538-544, 1984. Google ScholarDigital Library
11 T. Fukuda, Y. Morimoto, S. Morishita, and T. Tokuyama. Mining Optimized Association Rules for Numeric Attributes. In Proceedings of the Fifteenth A CM Symposium on Principles of Database Systems, 1996. Google ScholarDigital Library
12 T. Fukuda, Y. Morimoto, S. Morishita, and T. Tokuyama. Mining optimized association rules for numeric data. In Prvceedings of the A CM SIGMOD International Conference on Management of Data, pages 13-24, 1996. Google ScholarDigital Library
13 J. Gray, A. Bosworth, A. Layman, and H. Pirahesh. Data cube: a relational aggregation operator generalizing groupby, cross-tab, and sub-totals. Microsoft Technical Report MSR-TR-95-22, 1995.Google Scholar
14 D. Gunopulos, H. Mannila, and S. Saluja. Discovering all most specific sentences by randomized algorithms. In Proceedings of the 6th International Conference on Database Theory, to appear, January 1997. Google ScholarDigital Library
15 J. Hun and Y. Fu. Discovery of multiple-level association rules from large databases. In Proceedings of the 21st International Conference on Very Large Data Bases, pages 420-431, September 1995. Google ScholarDigital Library
16 M. Houtsma and A. Swami. Set-oriented mining of association rules. In Proceedings of the International Conference on Data Engineering, pages 25-34, 1995. Google ScholarDigital Library
17 M. Klemettinen, H. Mannila, P. Ronkainen, H. Toivonen, and A.I. Verkamo. Finding interesting rules from large sets of discovered association rules. In Proceedings of the 3rd International Conference on Information and Knowledge Management, pages 401-407, 1994. Google ScholarDigital Library
18 H.O. Lancaster. The Chi-squared Distribution. John Wiley & Sons, New York, 1969.Google Scholar
19 P.S. de Laplace. Oeuvres compl~tes de Laplace publides sous les auspices de i'Acad~mie des Sciences par M.M. les secrdtaires perp~tuels. Gauthier-Villar, Paris, 1878/1912.Google Scholar
20 H. Mannila, H. Toivonen, and A. Inkeri Verkamo. Efficient algorithms for discovering association rules. In Proceedings of the AAAI Workshop on Knowledge Discovery in Databases, pages 144-155, July 1994.Google Scholar
21 A. de Moivre. Approximatio ad summam terminorum binomii (a + b)n in seriem expansi. Supplement to Miscellanea Analytica, London, 1733.Google Scholar
22 D.S. Moore. Tests of chi-squared type. In: R.B. D'Agostino and M.A. Stephens (eds), Goodness-of-Fit Techniques, Marcel Dekker, New York, 1986, pp. 63-95.Google Scholar
23 F. Mosteller and D. Wallace. Inference and Disputed Authorship: The Federalists. Addison-Wesley, 1964.Google Scholar
24 J. S. Park, M. S. Chen, and P. S. Yu. An effective hash based algorithm for mining association rules. In Proceedings of the A CM SIGMOD International Con/erence on Management of Data, pages 175-186, May 1995. Google ScholarDigital Library
25 K. Pearson. On a criterion that a given system of deviations from the probable in the case of correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. Philos. Mag., 5:157-175, 1900.Google ScholarCross Ref
26 G. Piatetsky and W. Frawley. Knowledge Discovery in Databases. AAAI/MIT Press, 1991. Google ScholarDigital Library
27 A. Savasere, E. Omiecinski, and S. Navathe. An efficient algorithm for mining association rules in large databases. In Proceedings of the International Conference on Very Large Data Bases, pages 432-444, 1995. Google ScholarDigital Library
28 R. Srikant and R. Agrawai. Mining generalized association rules. In Proceeding8 of the 21st International Conference on Very Larye Data Bases, pages 407-419, September 1995. Google Scholar
29 H. Toivonen. Samplinglarge databases for finding association rules. In Proceedings of the 22nd International Conference on Very Larye Data Bases, pages 134-145, September 1996. Google ScholarDigital Library
30 J. Widom. Research problems in data warehousing. In Proceedings of the ~th Conference on Information and Knowledge Management, November 1995. Google ScholarDigital Library

Index Terms

Beyond market baskets: generalizing association rules to correlations

Recommendations

Beyond market baskets: generalizing association rules to correlations
SIGMOD '97: Proceedings of the 1997 ACM SIGMOD international conference on Management of data

One of the most well-studied problems in data mining is mining for association rules in market basket data. Association rules, whose significance is measured via support and confidence, are intended to identify rules of the type, “A customer purchasing ...
Read More
Beyond Market Baskets: Generalizing Association Rules to Dependence Rules

One of the more well-studied problems in data mining is the search for association rules in market basket data. Association rules are intended to identify patterns of the type: “A customer purchasing item A often also purchases item B.” Motivated partly by ...
Read More
Beyond intratransaction association analysis: mining multidimensional intertransaction association rules

In this paper, we extend the scope of mining association rules from traditional single-dimensional intratransaction associations, to multidimensional intertransaction associations. Intratransaction associations are the associations among items with the ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Published in
ACM SIGMOD Record Volume 26, Issue 2
June 1997
583 pages
ISSN:0163-5808
DOI:10.1145/253262
Chairman:
Sudha Ram
Univ. of Arizona, Tucson
,
Editor:
Joan M. Peckham
Univ. of Rhode Island, Kingston
Issue’s Table of Contents
SIGMOD '97: Proceedings of the 1997 ACM SIGMOD international conference on Management of data
June 1997
594 pages
ISBN:0897919114
DOI:10.1145/253260
Editors:
Joan M. Peckman
Univ. of Rhode Island, Kingston
,
Sudha Ram
Univ. of Arizona, Tucson
,
Michael Franklin
Copyright © 1997 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 1 June 1997
Check for updates
Qualifiers
- article
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 977
  Total Citations
  View Citations
- 3,860
  Total Downloads
- Downloads (Last 12 months)290
- Downloads (Last 6 weeks)34
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Beyond market baskets: generalizing association rules to correlations

ACM SIGMOD Record

Abstract

References

Cited By

Index Terms

Recommendations

Beyond market baskets: generalizing association rules to correlations

Beyond Market Baskets: Generalizing Association Rules to Dependence Rules

Beyond intratransaction association analysis: mining multidimensional intertransaction association rules