skip to main content
10.1145/956750.956788acmconferencesArticle/Chapter ViewAbstractPublication PageskddConference Proceedingsconference-collections
Article

Fast vertical mining using diffsets

Published:24 August 2003Publication History

ABSTRACT

A number of vertical mining algorithms have been proposed recently for association mining, which have shown to be very effective and usually outperform horizontal approaches. The main advantage of the vertical format is support for fast frequency counting via intersection operations on transaction ids (tids) and automatic pruning of irrelevant data. The main problem with these approaches is when intermediate results of vertical tid lists become too large for memory, thus affecting the algorithm scalability.In this paper we present a novel vertical data representation called Diffset, that only keeps track of differences in the tids of a candidate pattern from its generating frequent patterns. We show that diffsets drastically cut down the size of memory required to store intermediate results. We show how diffsets, when incorporated into previous vertical mining methods, increase the performance significantly.

References

  1. R. Agrawal, et al. Fast discovery of association rules. In U. Fayyad and et al (eds.), Advances in Knowledge Discovery and Data Mining, AAAI Press, 1996.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Ramesh Agrawal, Charu Aggarwal, and V. V. V. Prasad. Depth First Generation of Long Patterns. In 7th Int'l Conference on Knowledge Discovery and Data Mining, August 2000.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Jay Ayres, J. E. Gehrke, Tomi Yiu, and Jason Flannick. Sequential pattern mining using bitmaps. In SIGKDD Int'l Conf. on Knowledge Discovery and Data Mining, July 2002.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. R. J. Bayardo. Efficiently mining long patterns from databases. In ACM SIGMOD Conf. Management of Data, June 1998.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. S. Brin, R. Motwani, J. Ullman, and S. Tsur. Dynamic itemset counting and implication rules for market basket data. In ACM SIGMOD Conf. Management of Data, May 1997.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. D. Burdick, M. Calimlim, and J. Gehrke. MAFIA: a maximal frequent itemset algorithm for transactional databases. In Intl. Conf. on Data Engineering, April 2001.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. B. Dunkel and N. Soparkar. Data organization and access for efficient data mining. In 15th IEEE Intl. Conf. on Data Engineering, March 1999.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. K. Gouda and M. J. Zaki. Efficiently mining maximal frequent itemsets. In 1st IEEE Int'l Conf. on Data Mining, November 2001.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. D. Gunopulos, H. Mannila, and S. Saluja. Discovering all the most specific sentences by randomized algorithms. In Intl. Conf. on Database Theory, January 1997.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. J. Han and M. Kamber. Data Minng: Concepts and Techniuqes. Morgan kaufmann Publishers, San Francisco, CA, 2001.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. J. Han, J. Pei, and Y. Yin. Mining frequent patterns without candidate generation. In ACM SIGMOD Conf. Management of Data, May 2000.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. D-I. Lin and Z. M. Kedem. Pincer-search: A new algorithm for discovering the maximum frequent set. In 6th Intl. Conf. Extending Database Technology, March 1998.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. J-L. Lin and M. H. Dunham. Mining association rules: Anti-skew algorithms. In 14th Intl. Conf. on Data Engineering, February 1998.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. J. S. Park, M. Chen, and P. S. Yu. An effective hash based algorithm for mining association rules. In Intl. Conf. Management of Data, May 1995.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. N. Pasquier, Y. Bastide, R. Taouil, and L. Lakhal. Discovering frequent closed itemsets for association rules. In 7th Intl. Conf. on Database Theory, January 1999.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. J. Pei, J. Han, and R. Mao. Closet: An efficient algorithm for mining frequent closed itemsets. In SIGMOD Int'l Workshop on Data Mining and Knowledge Discovery, May 2000.]]Google ScholarGoogle Scholar
  17. S. Sarawagi, S. Thomas, and R. Agrawal. Integrating association rule mining with databases: alternatives and implications. In ACM Intl. Conf. Management of Data, June 1998.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. A. Savasere, E. Omiecinski, and S. Navathe. An efficient algorithm for mining association rules in large databases. In 21st VLDB Conf., 1995.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. J. Shafer, R. Agrawal, and M. Mehta. Sprint: A scalable parallel classifier for data mining. In 22nd VLDB Conference, March 1996.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. P. Shenoy, et al. Turbo-charging vertical mining of large databases. In Intl. Conf. Management of Data, May 2000.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. M. J. Zaki. Generating non-redundant association rules. In Int'l Conf. Knowledge Discovery and Data Mining, August 2000.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. M. J. Zaki. Scalable algorithms for association mining. IEEE Transactions on Knowledge and Data Engineering, 12(3):372--390, May-June 2000.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. M. J. Zaki and C.-J. Hsiao. CHARM: An efficient algorithm for closed itemset mining. In 2nd SIAM Int'l Conf. on Data Mining, April 2002.]]Google ScholarGoogle Scholar
  24. M. J. Zaki, S. Parthasarathy, M. Ogihara, and W. Li. New algorithms for fast discovery of association rules. In 3rd Intl. Conf. on Knowledge Discovery and Data Mining, August 1997.]]Google ScholarGoogle Scholar

Index Terms

  1. Fast vertical mining using diffsets

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in
    • Published in

      cover image ACM Conferences
      KDD '03: Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
      August 2003
      736 pages
      ISBN:1581137370
      DOI:10.1145/956750

      Copyright © 2003 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 24 August 2003

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • Article

      Acceptance Rates

      KDD '03 Paper Acceptance Rate46of298submissions,15%Overall Acceptance Rate1,133of8,635submissions,13%

      Upcoming Conference

      KDD '24

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader