skip to main content
10.1145/1150402.1150495acmconferencesArticle/Chapter ViewAbstractPublication PageskddConference Proceedingsconference-collections
Article

Summarizing itemset patterns using probabilistic models

Published:20 August 2006Publication History

ABSTRACT

In this paper, we propose a novel probabilistic approach to summarize frequent itemset patterns. Such techniques are useful for summarization, post-processing, and end-user interpretation, particularly for problems where the resulting set of patterns are huge. In our approach items in the dataset are modeled as random variables. We then construct a Markov Random Fields (MRF) on these variables based on frequent itemsets and their occurrence statistics. The summarization proceeds in a level-wise iterative fashion. Occurrence statistics of itemsets at the lowest level are used to construct an initial MRF. Statistics of itemsets at the next level can then be inferred from the model. We use those patterns whose occurrence can not be accurately inferred from the model to augment the model in an iterative manner, repeating the procedure until all frequent itemsets can be modeled. The resulting MRF model affords a concise and useful representation of the original collection of itemsets. Extensive empirical study on real datasets show that the new approach can effectively summarize a large number of itemsets and typically significantly outperforms extant approaches.

References

  1. F. N. Afrati, A. Gionis, and H. Mannila. Approximating a collection of frequent sets. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 12--19, 2004.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. R. Agrawal and R. Srikant. Fast algorithms for mining association rules in large databases. In Proceedings of the 20th International Conference on Very Large Data Bases, pages 487--499, 1994.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. C. Borgelt. Efficient implementations of apriori and eclat. In Proceedings of the IEEE ICDM Workshop on Frequent Itemset Mining Implementations, 2003.]]Google ScholarGoogle Scholar
  4. T. Calders and B. Goethals. Mining all non-derivable frequent itemsets. In Proceedings of the 6th European Conference on Principles of Data Mining and Knowledge Discovery, 2002.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. T. Calders and B. Goethals. Depth-first non-derivable itemset mining. In Proceedings of the SIAM 2005 International Conference on Data Mining, 2005.]]Google ScholarGoogle ScholarCross RefCross Ref
  6. A. Gelman, J. B. Carlin, H. S. Stern, and D. B. Rubin. Bayesian data analysis. Chapman & Hall/CRC, 2004.]]Google ScholarGoogle Scholar
  7. A. Ghoting, G. Buehrer, S. Parthasarathy, D. Kim, A. Nguyen, Y. K. Chen, and P. Dubey. Cache-conscious frequent pattern mining on a modern processor. In Proceedings of the 31st International Conference on Very Large Data Bases, pages 577--588, 2005.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. K. Gouda and M. J. Zaki. Genmax: An efficient algorithm for mining maximal frequent itemsets. Data Mining and Knowledge Discovery, 11(3):223--242, November 2005.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. D. Gunopulos, R. Khardon, H. Mannila, and H. Toivonen. Data mining, hypergraph transversals, and machine learning. In Proceedings of the Sixteenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pages 209--216, 1997.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. J. Han, J. Pei, and Y. Yin. Mining frequent patterns without candidate generation. In Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, pages 1--12, 2000.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. J. Han, J. Wang, Y. Lu, and P. Tzvetkov. Mining top-k frequent closed patterns without minimum support. In Proceedings of the 2002 IEEE International Conference on Data Mining, pages 211--218, 2002.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. F. Jelinek. Statistical Methods for Speech Recognition. MIT Press, Cambridge, MA, 1998.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. S. Lauritzen and D. Speigelhalter. Local computations with probabilities on graphical structures and their application to expert systems. Journal of the Royal Statistical Society, Series B (Methodological), 50(2):157--224, 1988.]]Google ScholarGoogle ScholarCross RefCross Ref
  14. N. Pasquier, Y. Bastide, R. Taouil, and L. Lakhal. Discovering frequent closed itemsets for association rules. In Database Theory - ICDT '99, 7th International Conference, Jerusalem, Israel, January 10-12, 1999, Proceedings, pages 398--416, 1999.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. D. Pavlov, H. Mannila, and P. Smyth. Beyond independence: probabilistic models for query approximation on binary transaction data. IEEE Transactions on Knowledge and Data Engineering, 15(6):1409--1421, November 2003.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. C. Wang and S. Parthasarathy. Summarizing itemset patterns using probabilistic models. In The Ohio State University, Technical Report, 2006.]]Google ScholarGoogle Scholar
  17. X. Yan, H. Cheng, J. Han, and D. Xin. Summarizing itemset patterns: a profile-based approach. In Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 314--323, 2005.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. M. J. Zaki and C.-J. Hsiao. Charm: An efficient algorithm for closed itemset mining. In Proceedings of the Second SIAM International Conference on Data Mining, 2002.]]Google ScholarGoogle ScholarCross RefCross Ref
  19. M. J. Zaki, S. Parthasarathy, and W. L. Mitsunori Ogihara. New algorithms for fast discovery of association rules. In Proceedings of the Third International Conference on Knowledge Discovery and Data Mining, pages 283--286, 1997.]]Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Summarizing itemset patterns using probabilistic models

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in
    • Published in

      cover image ACM Conferences
      KDD '06: Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
      August 2006
      986 pages
      ISBN:1595933395
      DOI:10.1145/1150402

      Copyright © 2006 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 20 August 2006

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • Article

      Acceptance Rates

      Overall Acceptance Rate1,133of8,635submissions,13%

      Upcoming Conference

      KDD '24

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader