ABSTRACT
Mining frequent itemsets in an uncertain database is a highly complicated problem. Most algorithms focus on improving the mining efficiency with the assumption that the database is static. Uncertain databases, however, are constantly updated with newly appended transactions like certain databases. Some patterns may become obsolete and new ones may emerge due to updates. Remining the whole uncertain database from scratch is very time-consuming owing to the frequentness probabilities computations. To tackle this maintenance problem, we propose an algorithm called p-FUP for efficient incremental update of patterns in an uncertain database. The p-FUP algorithm, inspired by a threshold-based PFI-testing technique and the FUP algorithm, uses approximations to incrementally update and discovers frequent itemsets in the uncertain database. Comprehensive experiments using both real and synthetic datasets show that p-FUP outperforms the re-mining based algorithm of 2.8 times faster in average, and has good linear scalability.
- Aggarwal, C. C., Li, Y., Wang, J., and Wang, J.: Frequent pattern mining with uncertain data. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 29--38, June 2009 Google ScholarDigital Library
- Aggarwal, C. C. and Yu, P. S.: A survey of uncertain data algorithms and applications. In IEEE Transactions on Knowledge and Data Engineering, 21(5):609--623, 2009 Google ScholarDigital Library
- Agrawal, R. and Srikant, R.: Fast algorithms for mining association rules in large databases. In Proceedings of the 20th International Conference on Very Large Data Bases, pp. 487--499, September 1994 Google ScholarDigital Library
- Bernecker, T., Kriegel, H.-P., Renz, M., Verhein, F., and Zuefle, A.: Probabilistic frequent itemset mining in uncertain databases. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 119--128, June 2009 Google ScholarDigital Library
- Cam, L. L.: An approximation theorem for the Poisson binomial distribution. In Pacific Journal of Mathematics, 10(4):1181--1197, 1960Google ScholarCross Ref
- Cheng, R., Kalashnikov, D. V., and Prabhakar, S.: Evaluating probabilistic queries over imprecise data. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 551--562, June 2003 Google ScholarDigital Library
- Cheung, D. W., Han, J., Ng, V. T., and Wong, C. Y.: Maintenance of discovered association rules in large databases: an incremental updating technique. In Proceedings of the 12th International Conference on Data Engineering, pp. 106--114, February 1996 Google ScholarDigital Library
- Cheung, D. W., Lee, S. D., and Kao, B.: A general incremental technique for maintaining discovered association rules. In Proceedings of the 5th International Conference on Database Systems for Advanced Applications, pp. 185--194, April 1997 Google ScholarDigital Library
- Chui, C.-K. and Kao, B.: A decremental approach for mining frequent itemsets from uncertain data. In Proceedings of the 12th Pacific-Asia Conference on Knowledge Discovery and Data Mining, pp. 64--75, May 2008 Google ScholarDigital Library
- Chui, C.-K., Kao, B., and Hung, E.: Mining frequent itemsets from uncertain data. In Proceedings of the 11th Pacific-Asia Conference on Knowledge Discovery and Data Mining, pp. 47--58, May 2007 Google ScholarDigital Library
- Cormode, G., and Garofalakis, M.: Sketching probabilistic data streams. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 281--292, June 2007 Google ScholarDigital Library
- Dalvi, N. and Suciu, D.: Efficient query evaluation on probabilistic databases. In The International Journal on Very Large Data Bases, 16(4):523--544, 2007 Google ScholarDigital Library
- Han, J., Pei, J., Yin, Y., and Mao, R.: Mining frequent patterns without candidate generation: a frequent-pattern tree approach. In Data Mining and Knowledge Discovery, 8(1): 53--87, 2004 Google ScholarDigital Library
- Huang, J., Antova, L., Koch, C., and Olteanu, D.: MayBMS: a probabilistic database management system. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 1071--1074, June 2009 Google ScholarDigital Library
- Khoussainova, N., Balazinska, M., and Suciu, D.: Towards correcting input data errors probabilistically using integrity constraints. In Proceedings of the 5th ACM International Workshop on Data Engineering for Wireless and Mobile Access, pp. 43--50, June 2006 Google ScholarDigital Library
- Leung, C. K.-S, Mateo, M. A. F., and Brajczuk, D. A.: A tree-based approach for frequent pattern mining from uncertain data. In Proceedings of the 12th Pacific-Asia Conference on Knowledge Discovery and Data Mining, pp. 653--661, May 2008 Google ScholarDigital Library
- Mutsuzaki, M. et al.: Trio-one: Layering uncertainty and lineage on a conventional dbms. In Proceedings of the 3rd Biennial Conference on Innovative Data Systems Research, pp. 269--274, January 2007Google Scholar
- Park, J. S., Chen, M.-S., and Yu., P. S.: An effective hash-based algorithm for mining association rules. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 175--186, May 1995 Google ScholarDigital Library
- Sun, L., Cheng, R., Cheung, D. W., and Cheng, J.: Mining uncertain data with probabilistic guarantees. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 273--282, July 2010 Google ScholarDigital Library
- Tang, P. and Peterson E. A.: Mining probabilistic frequent closed itemsets in uncertain databases. In Proceedings of the 49th ACM Annual Southeast Regional Conference, pp. 86--91, March 2011 Google ScholarDigital Library
- Wang, L., Cheng, R., Lee, S. D., and Cheung, D. W.: Accelerating probabilistic frequent itemset mining: a model-based approach. In Proceedings of the 19th ACM Conference on Information and Knowledge Management, pp. 429--438, October 2010 Google ScholarDigital Library
- Zaki, M. J., Parthasarathy, S., Ogihara M., and Li W.: New algorithms for fast discovery of association rules. In Proceedings of the 3rd International Conference on Knowledge Discovery and Data Mining, pp. 283--286, August 1997Google ScholarDigital Library
- Zhang, Q., Li, F., Yi, K.: Finding frequent items in probabilistic data. In Proceedings of the SIGMOD International Conference on Management of Data, pp. 819--832, June 2008 Google ScholarDigital Library
- Zheng, Z., Kohavi, R., and Mason L.: Real world performance of association rule algorithms. In Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 401--406, August 2001 Google ScholarDigital Library
Index Terms
- Incremental update on probabilistic frequent itemsets in uncertain databases
Recommendations
Probabilistic frequent itemset mining in uncertain databases
KDD '09: Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data miningProbabilistic frequent itemset mining in uncertain transaction databases semantically and computationally differs from traditional techniques applied to standard "certain" transaction databases. The consideration of existential uncertainty of item(sets),...
Efficient algorithms for mining high-utility itemsets in uncertain databases
High-utility itemset mining (HUIM) is a useful set of techniques for discovering patterns in transaction databases, which considers both quantity and profit of items. However, most algorithms for mining high-utility itemsets (HUIs) assume that the ...
Mining probabilistic frequent closed itemsets in uncertain databases
ACM-SE '11: Proceedings of the 49th Annual Southeast Regional ConferenceThis paper defines probabilistic support and probabilistic frequent closed itemsets in uncertain databases for the first time. It also proposes a probabilistic frequent closed itemset mining (PFCIM) algorithm to mine probabilistic frequent closed ...
Comments