skip to main content
10.1145/2184751.2184841acmconferencesArticle/Chapter ViewAbstractPublication PagesicuimcConference Proceedingsconference-collections
research-article

Incremental update on probabilistic frequent itemsets in uncertain databases

Published:20 February 2012Publication History

ABSTRACT

Mining frequent itemsets in an uncertain database is a highly complicated problem. Most algorithms focus on improving the mining efficiency with the assumption that the database is static. Uncertain databases, however, are constantly updated with newly appended transactions like certain databases. Some patterns may become obsolete and new ones may emerge due to updates. Remining the whole uncertain database from scratch is very time-consuming owing to the frequentness probabilities computations. To tackle this maintenance problem, we propose an algorithm called p-FUP for efficient incremental update of patterns in an uncertain database. The p-FUP algorithm, inspired by a threshold-based PFI-testing technique and the FUP algorithm, uses approximations to incrementally update and discovers frequent itemsets in the uncertain database. Comprehensive experiments using both real and synthetic datasets show that p-FUP outperforms the re-mining based algorithm of 2.8 times faster in average, and has good linear scalability.

References

  1. Aggarwal, C. C., Li, Y., Wang, J., and Wang, J.: Frequent pattern mining with uncertain data. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 29--38, June 2009 Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Aggarwal, C. C. and Yu, P. S.: A survey of uncertain data algorithms and applications. In IEEE Transactions on Knowledge and Data Engineering, 21(5):609--623, 2009 Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Agrawal, R. and Srikant, R.: Fast algorithms for mining association rules in large databases. In Proceedings of the 20th International Conference on Very Large Data Bases, pp. 487--499, September 1994 Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Bernecker, T., Kriegel, H.-P., Renz, M., Verhein, F., and Zuefle, A.: Probabilistic frequent itemset mining in uncertain databases. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 119--128, June 2009 Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Cam, L. L.: An approximation theorem for the Poisson binomial distribution. In Pacific Journal of Mathematics, 10(4):1181--1197, 1960Google ScholarGoogle ScholarCross RefCross Ref
  6. Cheng, R., Kalashnikov, D. V., and Prabhakar, S.: Evaluating probabilistic queries over imprecise data. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 551--562, June 2003 Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Cheung, D. W., Han, J., Ng, V. T., and Wong, C. Y.: Maintenance of discovered association rules in large databases: an incremental updating technique. In Proceedings of the 12th International Conference on Data Engineering, pp. 106--114, February 1996 Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Cheung, D. W., Lee, S. D., and Kao, B.: A general incremental technique for maintaining discovered association rules. In Proceedings of the 5th International Conference on Database Systems for Advanced Applications, pp. 185--194, April 1997 Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Chui, C.-K. and Kao, B.: A decremental approach for mining frequent itemsets from uncertain data. In Proceedings of the 12th Pacific-Asia Conference on Knowledge Discovery and Data Mining, pp. 64--75, May 2008 Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Chui, C.-K., Kao, B., and Hung, E.: Mining frequent itemsets from uncertain data. In Proceedings of the 11th Pacific-Asia Conference on Knowledge Discovery and Data Mining, pp. 47--58, May 2007 Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Cormode, G., and Garofalakis, M.: Sketching probabilistic data streams. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 281--292, June 2007 Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Dalvi, N. and Suciu, D.: Efficient query evaluation on probabilistic databases. In The International Journal on Very Large Data Bases, 16(4):523--544, 2007 Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Han, J., Pei, J., Yin, Y., and Mao, R.: Mining frequent patterns without candidate generation: a frequent-pattern tree approach. In Data Mining and Knowledge Discovery, 8(1): 53--87, 2004 Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Huang, J., Antova, L., Koch, C., and Olteanu, D.: MayBMS: a probabilistic database management system. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 1071--1074, June 2009 Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Khoussainova, N., Balazinska, M., and Suciu, D.: Towards correcting input data errors probabilistically using integrity constraints. In Proceedings of the 5th ACM International Workshop on Data Engineering for Wireless and Mobile Access, pp. 43--50, June 2006 Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Leung, C. K.-S, Mateo, M. A. F., and Brajczuk, D. A.: A tree-based approach for frequent pattern mining from uncertain data. In Proceedings of the 12th Pacific-Asia Conference on Knowledge Discovery and Data Mining, pp. 653--661, May 2008 Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Mutsuzaki, M. et al.: Trio-one: Layering uncertainty and lineage on a conventional dbms. In Proceedings of the 3rd Biennial Conference on Innovative Data Systems Research, pp. 269--274, January 2007Google ScholarGoogle Scholar
  18. Park, J. S., Chen, M.-S., and Yu., P. S.: An effective hash-based algorithm for mining association rules. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 175--186, May 1995 Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Sun, L., Cheng, R., Cheung, D. W., and Cheng, J.: Mining uncertain data with probabilistic guarantees. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 273--282, July 2010 Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Tang, P. and Peterson E. A.: Mining probabilistic frequent closed itemsets in uncertain databases. In Proceedings of the 49th ACM Annual Southeast Regional Conference, pp. 86--91, March 2011 Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Wang, L., Cheng, R., Lee, S. D., and Cheung, D. W.: Accelerating probabilistic frequent itemset mining: a model-based approach. In Proceedings of the 19th ACM Conference on Information and Knowledge Management, pp. 429--438, October 2010 Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Zaki, M. J., Parthasarathy, S., Ogihara M., and Li W.: New algorithms for fast discovery of association rules. In Proceedings of the 3rd International Conference on Knowledge Discovery and Data Mining, pp. 283--286, August 1997Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Zhang, Q., Li, F., Yi, K.: Finding frequent items in probabilistic data. In Proceedings of the SIGMOD International Conference on Management of Data, pp. 819--832, June 2008 Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Zheng, Z., Kohavi, R., and Mason L.: Real world performance of association rule algorithms. In Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 401--406, August 2001 Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Incremental update on probabilistic frequent itemsets in uncertain databases

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in
    • Published in

      cover image ACM Conferences
      ICUIMC '12: Proceedings of the 6th International Conference on Ubiquitous Information Management and Communication
      February 2012
      852 pages
      ISBN:9781450311724
      DOI:10.1145/2184751

      Copyright © 2012 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 20 February 2012

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • research-article

      Acceptance Rates

      Overall Acceptance Rate251of941submissions,27%

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader