research-article

Incremental update on probabilistic frequent itemsets in uncertain databases

Authors:
Ming-Yen Lin

Feng Chia University, Taichung, Taiwan

Feng Chia University, Taichung, Taiwan
View Profile

,
Cheng-Tai Fu

Feng Chia University, Taichung, Taiwan

Feng Chia University, Taichung, Taiwan
View Profile

,
Sue-Chen Hsueh

Chaoyang University of Technology, Taichung, Taiwan

Chaoyang University of Technology, Taichung, Taiwan
View Profile

ICUIMC '12: Proceedings of the 6th International Conference on Ubiquitous Information Management and CommunicationFebruary 2012Article No.: 75Pages 1–8https://doi.org/10.1145/2184751.2184841

Published:20 February 2012Publication History

ICUIMC '12: Proceedings of the 6th International Conference on Ubiquitous Information Management and Communication

Pages 1–8

ABSTRACT

Mining frequent itemsets in an uncertain database is a highly complicated problem. Most algorithms focus on improving the mining efficiency with the assumption that the database is static. Uncertain databases, however, are constantly updated with newly appended transactions like certain databases. Some patterns may become obsolete and new ones may emerge due to updates. Remining the whole uncertain database from scratch is very time-consuming owing to the frequentness probabilities computations. To tackle this maintenance problem, we propose an algorithm called p-FUP for efficient incremental update of patterns in an uncertain database. The p-FUP algorithm, inspired by a threshold-based PFI-testing technique and the FUP algorithm, uses approximations to incrementally update and discovers frequent itemsets in the uncertain database. Comprehensive experiments using both real and synthetic datasets show that p-FUP outperforms the re-mining based algorithm of 2.8 times faster in average, and has good linear scalability.

References

Aggarwal, C. C., Li, Y., Wang, J., and Wang, J.: Frequent pattern mining with uncertain data. In Proceedings of the 15^th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 29--38, June 2009 Google ScholarDigital Library
Aggarwal, C. C. and Yu, P. S.: A survey of uncertain data algorithms and applications. In IEEE Transactions on Knowledge and Data Engineering, 21(5):609--623, 2009 Google ScholarDigital Library
Agrawal, R. and Srikant, R.: Fast algorithms for mining association rules in large databases. In Proceedings of the 20^th International Conference on Very Large Data Bases, pp. 487--499, September 1994 Google ScholarDigital Library
Bernecker, T., Kriegel, H.-P., Renz, M., Verhein, F., and Zuefle, A.: Probabilistic frequent itemset mining in uncertain databases. In Proceedings of the 15^th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 119--128, June 2009 Google ScholarDigital Library
Cam, L. L.: An approximation theorem for the Poisson binomial distribution. In Pacific Journal of Mathematics, 10(4):1181--1197, 1960Google ScholarCross Ref
Cheng, R., Kalashnikov, D. V., and Prabhakar, S.: Evaluating probabilistic queries over imprecise data. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 551--562, June 2003 Google ScholarDigital Library
Cheung, D. W., Han, J., Ng, V. T., and Wong, C. Y.: Maintenance of discovered association rules in large databases: an incremental updating technique. In Proceedings of the 12^th International Conference on Data Engineering, pp. 106--114, February 1996 Google ScholarDigital Library
Cheung, D. W., Lee, S. D., and Kao, B.: A general incremental technique for maintaining discovered association rules. In Proceedings of the 5^th International Conference on Database Systems for Advanced Applications, pp. 185--194, April 1997 Google ScholarDigital Library
Chui, C.-K. and Kao, B.: A decremental approach for mining frequent itemsets from uncertain data. In Proceedings of the 12^th Pacific-Asia Conference on Knowledge Discovery and Data Mining, pp. 64--75, May 2008 Google ScholarDigital Library
Chui, C.-K., Kao, B., and Hung, E.: Mining frequent itemsets from uncertain data. In Proceedings of the 11^th Pacific-Asia Conference on Knowledge Discovery and Data Mining, pp. 47--58, May 2007 Google ScholarDigital Library
Cormode, G., and Garofalakis, M.: Sketching probabilistic data streams. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 281--292, June 2007 Google ScholarDigital Library
Dalvi, N. and Suciu, D.: Efficient query evaluation on probabilistic databases. In The International Journal on Very Large Data Bases, 16(4):523--544, 2007 Google ScholarDigital Library
Han, J., Pei, J., Yin, Y., and Mao, R.: Mining frequent patterns without candidate generation: a frequent-pattern tree approach. In Data Mining and Knowledge Discovery, 8(1): 53--87, 2004 Google ScholarDigital Library
Huang, J., Antova, L., Koch, C., and Olteanu, D.: MayBMS: a probabilistic database management system. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 1071--1074, June 2009 Google ScholarDigital Library
Khoussainova, N., Balazinska, M., and Suciu, D.: Towards correcting input data errors probabilistically using integrity constraints. In Proceedings of the 5^th ACM International Workshop on Data Engineering for Wireless and Mobile Access, pp. 43--50, June 2006 Google ScholarDigital Library
Leung, C. K.-S, Mateo, M. A. F., and Brajczuk, D. A.: A tree-based approach for frequent pattern mining from uncertain data. In Proceedings of the 12^th Pacific-Asia Conference on Knowledge Discovery and Data Mining, pp. 653--661, May 2008 Google ScholarDigital Library
Mutsuzaki, M. et al.: Trio-one: Layering uncertainty and lineage on a conventional dbms. In Proceedings of the 3^rd Biennial Conference on Innovative Data Systems Research, pp. 269--274, January 2007Google Scholar
Park, J. S., Chen, M.-S., and Yu., P. S.: An effective hash-based algorithm for mining association rules. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 175--186, May 1995 Google ScholarDigital Library
Sun, L., Cheng, R., Cheung, D. W., and Cheng, J.: Mining uncertain data with probabilistic guarantees. In Proceedings of the 16^th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 273--282, July 2010 Google ScholarDigital Library
Tang, P. and Peterson E. A.: Mining probabilistic frequent closed itemsets in uncertain databases. In Proceedings of the 49^th ACM Annual Southeast Regional Conference, pp. 86--91, March 2011 Google ScholarDigital Library
Wang, L., Cheng, R., Lee, S. D., and Cheung, D. W.: Accelerating probabilistic frequent itemset mining: a model-based approach. In Proceedings of the 19^th ACM Conference on Information and Knowledge Management, pp. 429--438, October 2010 Google ScholarDigital Library
Zaki, M. J., Parthasarathy, S., Ogihara M., and Li W.: New algorithms for fast discovery of association rules. In Proceedings of the 3^rd International Conference on Knowledge Discovery and Data Mining, pp. 283--286, August 1997Google ScholarDigital Library
Zhang, Q., Li, F., Yi, K.: Finding frequent items in probabilistic data. In Proceedings of the SIGMOD International Conference on Management of Data, pp. 819--832, June 2008 Google ScholarDigital Library
Zheng, Z., Kohavi, R., and Mason L.: Real world performance of association rule algorithms. In Proceedings of the 7^th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 401--406, August 2001 Google ScholarDigital Library

Index Terms

Incremental update on probabilistic frequent itemsets in uncertain databases
1. Information systems
  1. Information systems applications
    1. Data mining

Recommendations

Probabilistic frequent itemset mining in uncertain databases
KDD '09: Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining

Probabilistic frequent itemset mining in uncertain transaction databases semantically and computationally differs from traditional techniques applied to standard "certain" transaction databases. The consideration of existential uncertainty of item(sets),...
Read More
Efficient algorithms for mining high-utility itemsets in uncertain databases

High-utility itemset mining (HUIM) is a useful set of techniques for discovering patterns in transaction databases, which considers both quantity and profit of items. However, most algorithms for mining high-utility itemsets (HUIs) assume that the ...
Read More
Mining probabilistic frequent closed itemsets in uncertain databases
ACM-SE '11: Proceedings of the 49th Annual Southeast Regional Conference

This paper defines probabilistic support and probabilistic frequent closed itemsets in uncertain databases for the first time. It also proposes a probabilistic frequent closed itemset mining (PFCIM) algorithm to mine probabilistic frequent closed ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
ICUIMC '12: Proceedings of the 6th International Conference on Ubiquitous Information Management and Communication
February 2012
852 pages
ISBN:9781450311724
DOI:10.1145/2184751
Conference Chairs:
Suk-Han Lee
Sungkyunkwan University, Korea
,
Lajos Hanzo
University of Southampton, UK
,
Roslan Ismail
Universiti Kuala Lumpur, Malaysia
,
Program Chairs:
Dongsoo S. Kim
Indiana University
,
Min Young Chung
Sungkyunkwan University, Korea
,
Sang-Won Lee
Sungkyunkwan University, Korea
Copyright © 2012 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 20 February 2012
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
frequent itemset mining
incremental update
probabilistic data
probabilistic frequent itemset
uncertain database
Qualifiers
- research-article
Conference

Acceptance Rates
Overall Acceptance Rate251of941submissions,27%
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 2
  Total Citations
  View Citations
- 129
  Total Downloads
- Downloads (Last 12 months)0
- Downloads (Last 6 weeks)0
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Incremental update on probabilistic frequent itemsets in uncertain databases

ICUIMC '12: Proceedings of the 6th International Conference on Ubiquitous Information Management and Communication

ABSTRACT

References

Cited By

Index Terms

Recommendations

Probabilistic frequent itemset mining in uncertain databases

Efficient algorithms for mining high-utility itemsets in uncertain databases

Mining probabilistic frequent closed itemsets in uncertain databases

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

Incremental update on probabilistic frequent itemsets in uncertain databases

ICUIMC '12: Proceedings of the 6th International Conference on Ubiquitous Information Management and Communication

ABSTRACT

References

Cited By

Index Terms

Recommendations

Probabilistic frequent itemset mining in uncertain databases

Efficient algorithms for mining high-utility itemsets in uncertain databases

Mining probabilistic frequent closed itemsets in uncertain databases

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media