skip to main content
article

Constrained frequent pattern mining: a pattern-growth view

Authors Info & Claims
Published:01 June 2002Publication History
Skip Abstract Section

Abstract

It has been well recognized that frequent pattern mining plays an essential role in many important data mining tasks. However, frequent pattern mining often generates a very large number of patterns and rules, which reduces not only the efficiency but also the effectiveness of mining. Recent work has highlighted the importance of the constraint-based mining paradigm in the context of mining frequent itemsets, associations, correlations, sequential patterns, and many other interesting patterns in large databases.Recently, we developed efficient pattern-growth methods for frequent pattern mining. Interestingly, pattern-growth methods are not only efficient but also effective in mining with various constraints. Many tough constraints which cannot be handled by previous methods can be pushed deep into the pattern-growth mining process. In this paper, we overview the principles of pattern-growth methods for constrained frequent pattern mining and sequential pattern mining. Moreover, we explore the power of pattern-growth methods towards mining with tough constraints and highlight some interesting open problems.

References

  1. R. Agarwal, C. Aggarwal, and V. V. V. Prasad. Depth-first generation of large itemsets for association rules. In IBM Technical Report RC21538, July 1999.Google ScholarGoogle Scholar
  2. R. Agrawal and R. Srikant. Fast algorithms for mining association rules. In Proc. 1994 Int. Conf. Very Large Data Bases (VLDB'94), pages 487-499, Santiago, Chile, Sept. 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. R. Agrawal and R. Srikant. Mining sequential patterns. In Proc. 1995 Int. Conf. Data Engineering (ICDE'95), pages 3-14, Taipei, Taiwan, Mar. 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. R. J. Bayardo. Efficiently mining long patterns from databases. In Proc. 1998 ACM-SIGMOD Int. Conf. Management of Data (SIGMOD'98), pages 85-93, Seattle, WA, June 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. K. Beyer and R. Ramakrishnan. Bottom-up computation of sparse and iceberg cubes. In Proc. 1999 ACM-SIGMOD Int. Conf. Management of Data (SIGMOD'99), pages 359-370, Philadelphia, PA, June 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. S. Brin, R. Motwani, and C. Silverstein. Beyond market basket: Generalizing association rules to correlations. In Proc. 1997 ACM-SIGMOD Int. Conf. Management of Data (SIGMOD'97), pages 265-276, Tucson, Arizona, May 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. D. Burdick, M. Calimlim, and J. Gehrke. MAFIA: A maximal frequent itemset algorithm for transactional databases. In Proc. 2001 Int. Conf. Data Engineering (ICDE'01), pages 443-452, Heidelberg, Germany, April 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. G. Dong and J. Li. Efficient mining of emerging patterns: Discovering trends and differences. In Proc. 1999 Int. Conf. Knowledge Discovery and Data Mining (KDD'99), pages 43-52, San Diego, CA, Aug. 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. G. Grahne, L. Lakshmanan, and X. Wang. Efficient mining of constrained correlated sets. In Proc. 2000 Int. Conf. Data Engineering (ICDE'00), pages 512-521, San Diego, CA, Feb. 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. S. Guha, R. Rastogi, and K. Shim. Rock: A robust clustering algorithm for categorical attributes. In Proc. 1999 Int. Conf. Data Engineering (ICDE'99), pages 512-521, Sydney, Australia, Mar. 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. J. Han, G. Dong, and Y. Yin. Efficient mining of partial periodic patterns in time series database. In Proc. 1999 Int. Conf. Data Engineering (ICDE'99), pages 106-115, Sydney, Australia, April 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. J. Han and J. Pei. Mining frequent patterns by pattern-growth: Methodology and implications. SIGKDD Explorations (Special Issue on Scalable Data Mining Algorithms), 2:14-20, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. J. Han, J. Pei, B. Mortazavi-Asl, Q. Chen, U. Dayal, and M.-C. Hsu. FreeSpan: Frequent pattern-projected sequential pattern mining. In Proc. 2000 ACM SIGKDD Int. Conf. Knowledge Discovery in Databases (KDD'00), pages 355-359, Boston, MA, Aug. 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. J. Han, J. Pei, and Y. Yin. Mining frequent patterns without candidate generation. In Proc. 2000 ACM-SIGMOD Int. Conf. Management of Data (SIGMOD'00), pages 1-12, Dallas, TX, May 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. M. Kamber, J. Han, and J. Y. Chiang. Metarule-guided mining of multi-dimensional association rules using data cubes. In Proc. 1997 Int. Conf. Knowledge Discovery and Data Mining (KDD'97), pages 207-210, Newport Beach, CA, Aug. 1997.Google ScholarGoogle Scholar
  16. M. Klemettinen, H. Mannila, P. Ronkainen, H. Toivonen, and A. Verkamo. Finding interesting rules from large sets of discovered association rules. In Proc. 3rd Int. Conf. Information and Knowledge Management, pages 401-408, Gaithersburg, Maryland, Nov. 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. L. V. S. Lakshmanan, R. Ng, J. Han, and A. Pang. Optimization of constrained frequent set queries with 2-variable constraints. In Proc. 1999 ACM-SIGMOD Int. Conf. Management of Data (SIGMOD'99), pages 157-168, Philadelphia, PA, June 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. B. Lent, A. Swami, and J. Widom. Clustering association rules. In Proc. 1997 Int. Conf. Data Engineering (ICDE'97), pages 220-231, Birmingham, England, April 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. B. Liu, W. Hsu, and Y. Ma. Integrating classification and association rule mining. In Proc. 1998 Int. Conf. Knowledge Discovery and Data Mining (KDD'98), pages 80-86, New York, NY, Aug. 1998.Google ScholarGoogle Scholar
  20. H. Mannila, H. Toivonen, and A. I. Verkamo. Discovery of frequent episodes in event sequences. Data Mining and Knowledge Discovery, 1:259-289, 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. R. Ng, L. V. S. Lakshmanan, J. Han, and A. Pang. Exploratory mining and pruning optimizations of constrained associations rules. In Proc. 1998 ACM-SIGMOD Int. Conf. Management of Data (SIGMOD'98), pages 13-24, Seattle, WA, June 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. J. Park, M. Chen, and P. Yu. An effective hash-based algorithm for mining association rules. In Proc. 1995 ACM-SIGMOD Int. Conf. Management of Data (SIGMOD '95), pages 175-186, San Jose, CA, May 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. J. Pei and J. Han. Can we push more constraints into frequent pattern mining? In Proc. 2000 ACM SIGKDD Int. Conf. Knowledge Discovery in Databases (KDD'00), pages 350-354, Boston, MA, Aug. 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. J. Pei, J. Han, and L. V. S. Lakshmanan. Mining frequent itemsets with convertible constraints. In Proc. 2001 Int. Conf. Data Engineering (ICDE'01), pages 433-332, Heidelberg, Germany, April 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. J. Pei, J. Han, H. Lu, S. Nishio, S. Tang, and D. Yang. H-Mine: Hyper-structure mining of frequent patterns in large databases. In Proc. 2001 Int. Conf. Data Mining (ICDM'01), pages 441-448, San Jose, CA, Nov. 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. J. Pei, J. Han, and R. Mao. CLOSET: An efficient algorithm for mining frequent closed itemsets. In Proc. 2000 ACM-SIGMOD Int. Workshop Data Mining and Knowledge Discovery (DMKD'00), pages 11-20, Dallas, TX, May 2000.Google ScholarGoogle Scholar
  27. J. Pei, J. Han, B. Mortazavi-Asl, H. Pinto, Q. Chen, U. Dayal, and M.-C. Hsu. PrefixSpan: Mining sequential patterns efficiently by prefix-projected pattern growth. In Proc. 2001 Int. Conf. Data Engineering (ICDE'01), pages 215-224, Heidelberg, Germany, April 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. J. Pei, J. Han, and W. Wang. Constraint-based sequential pattern mining in large databases. In Submitted for publication, May 2002.Google ScholarGoogle Scholar
  29. S. Sarawagi, S. Thomas, and R. Agrawal. Integrating association rule mining with relational database systems: Alternatives and implications. In Proc. 1998 ACM-SIGMOD Int. Conf. Management of Data (SIGMOD'98), pages 343-354, Seattle, WA, June 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. A. Savasere, E. Omiecinski, and S. Navathe. An efficient algorithm for mining association rules in large databases. In Proc. 1995 Int. Conf. Very Large Data Bases (VLDB'95), pages 432-443, Zurich, Switzerland, Sept. 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. C. Silverstein, S. Brin, R. Motwani, and J. Ullman. Scalable techniques for mining causal structures. In Proc. 1998 Int. Conf. Very Large Data Bases (VLDB'98), pages 594-605, New York, NY, Aug. 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. R. Srikant, Q. Vu, and R. Agrawal. Mining association rules with item constraints. In Proc. 1997 Int. Conf. Knowledge Discovery and Data Mining (KDD'97), pages 67-73, Newport Beach, CA, Aug. 1997.Google ScholarGoogle Scholar
  33. M. J. Zaki and C. J. Hsiao. CHARM: An efficient algorithm for closed itemset mining. In Proc. 2002 SIAM Int. Conf. Data Mining, pages 457-473, Arlington, VA, April 2002.Google ScholarGoogle ScholarCross RefCross Ref

Recommendations

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Sign in

Full Access

  • Published in

    cover image ACM SIGKDD Explorations Newsletter
    ACM SIGKDD Explorations Newsletter  Volume 4, Issue 1
    June 2002
    75 pages
    ISSN:1931-0145
    EISSN:1931-0153
    DOI:10.1145/568574
    Issue’s Table of Contents

    Copyright © 2002 Authors

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    • Published: 1 June 2002

    Check for updates

    Qualifiers

    • article

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader