Article

Screening and interpreting multi-item associations based on log-linear modeling

Authors:
Xintao Wu

UNC at Charlotte, Charlotte, NC

UNC at Charlotte, Charlotte, NC
View Profile

,
Daniel Barbará

George Mason University, Fairfax, VA

George Mason University, Fairfax, VA
View Profile

,
Yong Ye

UNC at Charlotte, Charlotte, NC

UNC at Charlotte, Charlotte, NC
View Profile

KDD '03: Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data miningAugust 2003Pages 276–285https://doi.org/10.1145/956750.956783

Published:24 August 2003Publication History

KDD '03: Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining

Pages 276–285

ABSTRACT

Association rules have received a lot of attention in the data mining community since their introduction. The classical approach to find rules whose items enjoy high support (appear in a lot of the transactions in the data set) is, however, filled with shortcomings. It has been shown that support can be misleading as an indicator of how interesting the rule is. Alternative measures, such as lift, have been proposed. More recently, a paper by DuMouchel et al. proposed the use of all-two-factor loglinear models to discover sets of items that cannot be explained by pairwise associations between the items involved. This approach, however, has its limitations, since it stops short of considering higher order interactions (other than pairwise) among the items. In this paper, we propose a method that examines the parameters of the fitted loglinear models to find all the significant association patterns among the items. Since fitting loglinear models for large data sets can be computationally prohibitive, we apply graph-theoretical results to divide the original set of items into components (sets of items) that are statistically independent from each other. We then apply loglinear modeling to each of the components and find the interesting associations among items in them. The technique is experimentally evaluated with a real data set (insurance data) and a series of synthetic data sets. The results show that the technique is effective in finding interesting associations among the items involved.

References

R. Agrawal, T. Imilienski, and A. Swami. Mining association rules between sets of items in large databases. In Proceedings of the ACM SIGMOD International Conference on Management of Database, pages 207--216, 1993. Google ScholarDigital Library
R. Agrawal and R. Srikant. Fast algorithms for mining association rules. In Proceedings of the International Conference on Very Large Data Bases, pages 487--499, September 1994. Google ScholarDigital Library
A. Agresti. Categorical data anlysis. Wiley, 1990.Google Scholar
J. Badsberg. An environment for graphical models. Ph.D. Thesis, Aalborg University, Demark, 1995.Google Scholar
D. Barbará, W. DuMouchel, C. Faloutsos, P. Hass, J. M. Hellerstein, Y. Ioannidis, H. Jagadish, T. Johnson, R. Ng, V. Poosala, K. Ross, and K. Sevcik. The new jersey data reduction report. Bulletin of the Technical Committee on Data Engineering, 20(4):3--45, December 1997.Google Scholar
D. Barbará and X. Wu. Loglinear based quasi cubes. Journal of Information and Intelligent Systems, 16(3):255--276, 2001. Google ScholarDigital Library
J. Benedetti and M. Brown. Stratigies for the selection of loglinear models. Biometrics, 34:680--686, 1978.Google ScholarCross Ref
Y. M. Bishop, S. E. Fienberg, and P. W. Holland. Discrete Multivariate Analysis: Theory and Practice. The MIT Press, Cambride, Massachusetts, and London, England, 1975.Google Scholar
M. Brown. Screening effects in multidimensional contigency tables. Applied Statistics, 25:37--46, 1976.Google ScholarCross Ref
COIL challenge 2000. The insurance company (tic) benchmark. http://kdd.uci.edu/databases/tic/tic.html.Google Scholar
A. Deshpande, M. Garofalakis, and R. Rastogi. Independence is good: Dependency-based histogram synopses for high-dimensional data. In Proceedings of the 2001 ACM SIGMOD International Conference on Management of Data, pages 199--210. Santa Barbara, California, May 2001. Google ScholarDigital Library
W. DuMouchel and D. Pregibon. Empirical bayes screening for multi-item association. In Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data. San Francisco, CA, August 2001. Google ScholarDigital Library
L. Goodman. The analysis of multidimensional contigency tables: Stepwise procedures and direct estimation methods for building models for multiple classifications. Technometrics, 13:33--61, 1971.Google ScholarCross Ref
J. Han, J. Pei, and Y. Yin. Mining frequent patterns without candidate generation. In Proceedings of the ACM SIGMOD International Conference on Management of Database, pages 1--12. Dallas, TX, May 2000. Google ScholarDigital Library
V. Harinarayan, A. Rajaraman, and J. Ullman. Implementing data cubes efficiently. In Proceedings of the ACM SIGMOD International Conference on Management of Database, pages 205--216. Montreal, Quebec, Canada, 1996. Google ScholarDigital Library
S. Lauritzen. Graphical Models. Oxford University Press, 1996.Google Scholar
D. Pavlov, H. Mannila, and P. Symth. Probabilistic models for query approximation with large sparse binary data sets. In Proceedings of Uncertainty in Artificial Intelligence, pages 199--210. Stanford, California, June 2000. Google ScholarDigital Library
S. Sarawagi, R. Agrawal, and N. Meggido. Discovery-driven explorations of olap data cubes. In Proceedings of the International Conference on Extending Data Base Technolgy, pages 168--182. Valencia, Spain, 1998. Google ScholarDigital Library
C. Silverstein, S. Brin, and R. Motwani. Beyond market baskets: Generalizing association rules to dependence rules. Data Mining and Knowledge Discovery, 2:39--68, 1998. Google ScholarDigital Library
R. Tarjan. Decomposition by clique separators. Discrete Mathematics, 55:221--232, 1985.Google ScholarCross Ref
J. Whittaker. Graphical Models in Applied Mathematical Multivariate Statistics. Wiley, 1990.Google Scholar
X. Wu and D. Barbará. Modeling and imputation of large incomplete multidimensional datasets. In Proceedings of the International Conference on Data Warehousing and Knowledge Discovery. Aix-en-Provence, France, Septemeber 2002. Google ScholarDigital Library

Index Terms

Screening and interpreting multi-item associations based on log-linear modeling
1. Information systems
  1. Information systems applications

Recommendations

Transaction-item association matrix-based frequent pattern network mining algorithm in large-scale transaction database

To increase the efficiency of data mining is the emphasis in this field at present. Through the establishment of transaction-item association matrix, this paper changes the process of association rule mining to elementary matrix operation, which makes ...
Read More
Frequent pattern network mining algorithm based on transaction-item association matrix
ICCOMP'09: Proceedings of the WSEAES 13th international conference on Computers

To increase the efficiency of data mining is the emphasis in this field at present. Aiming at the difficulties of data maintaining and updating in association rule mining FP-growth algorithm, this paper proposes a FP-network model which compresses the ...
Read More
Empirical bayes screening for multi-item associations
KDD '01: Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining

This paper considers the framework of the so-called "market basket problem", in which a database of transactions is mined for the occurrence of unusually frequent item sets. In our case, "unusually frequent" involves estimates of the frequency of each ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
KDD '03: Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
August 2003
736 pages
ISBN:1581137370
DOI:10.1145/956750
Conference Chair:
Lise Getoor
University of Maryland, College Park
,
General Chair:
Ted Senator
DARPA
,
Program Chairs:
Pedro Domingos
University of Washington
,
Christos Faloutsos
Carnegie Mellon University
Copyright © 2003 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 24 August 2003
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
association rule
graphical model
log-linear model
Qualifiers
- Article
Conference

Acceptance Rates
KDD '03 Paper Acceptance Rate46of298submissions,15%Overall Acceptance Rate1,133of8,635submissions,13%
More
Upcoming Conference
KDD '24

Sponsor:

sigkdd

sigkdd

The 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

August 25 - 29, 2024

Barcelona , Spain
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 17
  Total Citations
  View Citations
- 342
  Total Downloads
- Downloads (Last 12 months)5
- Downloads (Last 6 weeks)2
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Screening and interpreting multi-item associations based on log-linear modeling

KDD '03: Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining

ABSTRACT

References

Cited By

Index Terms

Recommendations

Transaction-item association matrix-based frequent pattern network mining algorithm in large-scale transaction database

Frequent pattern network mining algorithm based on transaction-item association matrix

Empirical bayes screening for multi-item associations