ABSTRACT
A number of vertical mining algorithms have been proposed recently for association mining, which have shown to be very effective and usually outperform horizontal approaches. The main advantage of the vertical format is support for fast frequency counting via intersection operations on transaction ids (tids) and automatic pruning of irrelevant data. The main problem with these approaches is when intermediate results of vertical tid lists become too large for memory, thus affecting the algorithm scalability.In this paper we present a novel vertical data representation called Diffset, that only keeps track of differences in the tids of a candidate pattern from its generating frequent patterns. We show that diffsets drastically cut down the size of memory required to store intermediate results. We show how diffsets, when incorporated into previous vertical mining methods, increase the performance significantly.
- R. Agrawal, et al. Fast discovery of association rules. In U. Fayyad and et al (eds.), Advances in Knowledge Discovery and Data Mining, AAAI Press, 1996.]] Google ScholarDigital Library
- Ramesh Agrawal, Charu Aggarwal, and V. V. V. Prasad. Depth First Generation of Long Patterns. In 7th Int'l Conference on Knowledge Discovery and Data Mining, August 2000.]] Google ScholarDigital Library
- Jay Ayres, J. E. Gehrke, Tomi Yiu, and Jason Flannick. Sequential pattern mining using bitmaps. In SIGKDD Int'l Conf. on Knowledge Discovery and Data Mining, July 2002.]] Google ScholarDigital Library
- R. J. Bayardo. Efficiently mining long patterns from databases. In ACM SIGMOD Conf. Management of Data, June 1998.]] Google ScholarDigital Library
- S. Brin, R. Motwani, J. Ullman, and S. Tsur. Dynamic itemset counting and implication rules for market basket data. In ACM SIGMOD Conf. Management of Data, May 1997.]] Google ScholarDigital Library
- D. Burdick, M. Calimlim, and J. Gehrke. MAFIA: a maximal frequent itemset algorithm for transactional databases. In Intl. Conf. on Data Engineering, April 2001.]] Google ScholarDigital Library
- B. Dunkel and N. Soparkar. Data organization and access for efficient data mining. In 15th IEEE Intl. Conf. on Data Engineering, March 1999.]] Google ScholarDigital Library
- K. Gouda and M. J. Zaki. Efficiently mining maximal frequent itemsets. In 1st IEEE Int'l Conf. on Data Mining, November 2001.]] Google ScholarDigital Library
- D. Gunopulos, H. Mannila, and S. Saluja. Discovering all the most specific sentences by randomized algorithms. In Intl. Conf. on Database Theory, January 1997.]] Google ScholarDigital Library
- J. Han and M. Kamber. Data Minng: Concepts and Techniuqes. Morgan kaufmann Publishers, San Francisco, CA, 2001.]] Google ScholarDigital Library
- J. Han, J. Pei, and Y. Yin. Mining frequent patterns without candidate generation. In ACM SIGMOD Conf. Management of Data, May 2000.]] Google ScholarDigital Library
- D-I. Lin and Z. M. Kedem. Pincer-search: A new algorithm for discovering the maximum frequent set. In 6th Intl. Conf. Extending Database Technology, March 1998.]] Google ScholarDigital Library
- J-L. Lin and M. H. Dunham. Mining association rules: Anti-skew algorithms. In 14th Intl. Conf. on Data Engineering, February 1998.]] Google ScholarDigital Library
- J. S. Park, M. Chen, and P. S. Yu. An effective hash based algorithm for mining association rules. In Intl. Conf. Management of Data, May 1995.]] Google ScholarDigital Library
- N. Pasquier, Y. Bastide, R. Taouil, and L. Lakhal. Discovering frequent closed itemsets for association rules. In 7th Intl. Conf. on Database Theory, January 1999.]] Google ScholarDigital Library
- J. Pei, J. Han, and R. Mao. Closet: An efficient algorithm for mining frequent closed itemsets. In SIGMOD Int'l Workshop on Data Mining and Knowledge Discovery, May 2000.]]Google Scholar
- S. Sarawagi, S. Thomas, and R. Agrawal. Integrating association rule mining with databases: alternatives and implications. In ACM Intl. Conf. Management of Data, June 1998.]] Google ScholarDigital Library
- A. Savasere, E. Omiecinski, and S. Navathe. An efficient algorithm for mining association rules in large databases. In 21st VLDB Conf., 1995.]] Google ScholarDigital Library
- J. Shafer, R. Agrawal, and M. Mehta. Sprint: A scalable parallel classifier for data mining. In 22nd VLDB Conference, March 1996.]] Google ScholarDigital Library
- P. Shenoy, et al. Turbo-charging vertical mining of large databases. In Intl. Conf. Management of Data, May 2000.]] Google ScholarDigital Library
- M. J. Zaki. Generating non-redundant association rules. In Int'l Conf. Knowledge Discovery and Data Mining, August 2000.]] Google ScholarDigital Library
- M. J. Zaki. Scalable algorithms for association mining. IEEE Transactions on Knowledge and Data Engineering, 12(3):372--390, May-June 2000.]] Google ScholarDigital Library
- M. J. Zaki and C.-J. Hsiao. CHARM: An efficient algorithm for closed itemset mining. In 2nd SIAM Int'l Conf. on Data Mining, April 2002.]]Google Scholar
- M. J. Zaki, S. Parthasarathy, M. Ogihara, and W. Li. New algorithms for fast discovery of association rules. In 3rd Intl. Conf. on Knowledge Discovery and Data Mining, August 1997.]]Google Scholar
Index Terms
- Fast vertical mining using diffsets
Recommendations
New approach in data stream association rule mining based on graph structure
ICDM'10: Proceedings of the 10th industrial conference on Advances in data mining: applications and theoretical aspectsDiscovery of useful information and valuable knowledge from transactions has attracted many researchers due to increasing use of very large databases and data warehouses. Furthermore most of proposed methods are designed to work on traditional databases ...
A time- and memory-efficient frequent itemset discovering algorithm for association rule mining
Frequent itemset discovering is a highly researched area in the field of data mining. The algorithms dealing with this problem have several advantages and disadvantages regarding their time complexity, I/O cost and memory requirement. There are ...
Finding frequent itemsets by transaction mapping
SAC '05: Proceedings of the 2005 ACM symposium on Applied computingIn this paper, we present a novel algorithm for mining complete frequent itemsets. This algorithm is referred to as the TM algorithm from hereon. In this algorithm, we employ the vertical representation of a database. Transaction ids of each itemset are ...
Comments