Article

Fast vertical mining using diffsets

Authors:
Mohammed J. Zaki, Rensselaer Polytechnic Institute, Troy, NY
Karam Gouda, Faculty of Science, Benha, Egypt

KDD '03: Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, August 2003, Pages 326–335, https://doi.org/10.1145/956750.956788

Published: 24 August 2003

ABSTRACT

A number of vertical mining algorithms have recently been proposed for association mining; they have been shown to be very effective and usually outperform horizontal approaches. The main advantage of the vertical format is its support for fast frequency counting via intersection operations on transaction ids (tids), along with automatic pruning of irrelevant data. The main problem with these approaches arises when the intermediate results of vertical tid lists become too large for memory, which limits the scalability of the algorithm.

In this paper we present a novel vertical data representation called Diffset, which keeps track only of the differences in the tids of a candidate pattern from its generating frequent patterns. We show that diffsets drastically cut down the size of memory required to store intermediate results, and that, when incorporated into previous vertical mining methods, they increase performance significantly.
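To make the contrast between tid-list intersection and diffsets concrete, here is a minimal Python sketch. It is an illustration under stated assumptions and not the authors' implementation: the toy database, the item names, and the helper functions `support_by_tidset`, `support_by_diffset`, and `extend_diffset` are all hypothetical. It relies on the identities that the diffset of PXY with respect to PX is t(PX) − t(PY), that the support of PXY equals the support of PX minus the size of that diffset, and that a longer pattern's diffset can be derived from its parents' diffsets as d(PXY) = d(PY) − d(PX).

```python
# Toy vertical database: item -> set of transaction ids (tids) containing it.
# The data is invented purely for illustration.
tidsets = {
    "A": {1, 3, 4, 5},
    "C": {1, 2, 3, 4, 5, 6},
    "D": {2, 4, 5, 6},
}


def support_by_tidset(t_px, t_py):
    """Classic vertical counting: intersect the two tid lists."""
    t_pxy = t_px & t_py
    return t_pxy, len(t_pxy)


def support_by_diffset(t_px, t_py):
    """Diffset counting: keep only the tids of PX that are missing from PY."""
    d_pxy = t_px - t_py
    return d_pxy, len(t_px) - len(d_pxy)


def extend_diffset(d_px, d_py, support_px):
    """Derive the diffset and support of PXY from its parents' diffsets."""
    d_pxy = d_py - d_px
    return d_pxy, support_px - len(d_pxy)


# Level 2: itemsets AC and AD, each computed both ways.
t_ac, sup_ac_tid = support_by_tidset(tidsets["A"], tidsets["C"])
d_ac, sup_ac = support_by_diffset(tidsets["A"], tidsets["C"])
d_ad, sup_ad = support_by_diffset(tidsets["A"], tidsets["D"])

# Level 3: itemset ACD, derived from diffsets alone (no tid lists needed).
d_acd, sup_acd = extend_diffset(d_ac, d_ad, sup_ac)

print("AC  tidset :", sorted(t_ac), "support", sup_ac_tid)
print("AC  diffset:", sorted(d_ac), "support", sup_ac)
print("ACD diffset:", sorted(d_acd), "support", sup_acd)
```

In this toy example the tid list for AC holds four tids while its diffset is empty, which is the kind of memory saving the abstract describes; on sparse data the difference can be smaller or even reversed.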

Index Terms

Fast vertical mining using diffsets
1. Information systems
  1. Information systems applications

Published in
KDD '03: Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
August 2003
736 pages
ISBN:1581137370
DOI:10.1145/956750
Conference Chair: Lise Getoor, University of Maryland, College Park
General Chair: Ted Senator, DARPA
Program Chairs: Pedro Domingos, University of Washington; Christos Faloutsos, Carnegie Mellon University
Copyright © 2003 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
association rule mining
diffsets
frequent itemsets
Qualifiers
- Article
Conference

Acceptance Rates
KDD '03 Paper Acceptance Rate: 46 of 298 submissions, 15%
Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%