Skip to main content
Log in

Weighted frequent itemset mining over uncertain databases

  • Published:
Applied Intelligence Aims and scope Submit manuscript

Abstract

Frequent itemset mining (FIM) is a fundamental research topic, which consists of discovering useful and meaningful relationships between items in transaction databases. However, FIM suffers from two important limitations. First, it assumes that all items have the same importance. Second, it ignores the fact that data collected in a real-life environment is often inaccurate, imprecise, or incomplete. To address these issues and mine more useful and meaningful knowledge, the problems of weighted and uncertain itemset mining have been respectively proposed, where a user may respectively assign weights to items to specify their relative importance, and specify existential probabilities to represent uncertainty in transactions. However, no work has addressed both of these issues at the same time. In this paper, we address this important research problem by designing a new type of patterns named high expected weighted itemset (HEWI) and the HEWI-Uapriori algorithm to efficiently discover HEWIs. The HEWI-Uapriori finds HEWIs using an Apriori-like two-phase approach. The algorithm introduces a property named high upper-bound expected weighted downward closure (HUBEWDC) to early prune the search space and unpromising itemsets. Substantial experiments on real-life and synthetic datasets are conducted to evaluate the performance of the proposed algorithm in terms of runtime, memory consumption, and number of patterns found. Results show that the proposed algorithm has excellent performance and scalability compared with traditional methods for weighted-itemset mining and uncertain itemset mining.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6

Similar content being viewed by others

References

  1. (2012) Frequent itemset mining dataset repository. Available: http://fimi.ua.ac.be/data/

  2. Aggarwal CC, Li Y, Wang J, Wang J (2009) Frequent pattern mining with uncertain data, ACM SIGKDD international conference on knowledge discovery and data mining, pp 29–38

  3. Aggarwal CC, Yu PS (2009) A survey of uncertain data algorithms and applications. IEEE Trans Knowl Data Eng 21(5):609–623

    Article  Google Scholar 

  4. Aggarwal CC (2010) Managing and mining uncertain data, managing and mining uncertain data

  5. Agrawal R, Srikant R (1994) Fast algorithms for mining association rules in large databases, the international conference on very large data bases, pp 487–499

  6. Agrawal R, Srikant R (1994) Quest synthetic data generator. Available: http://www.Almaden.ibm.com/cs/quest/syndata.html

  7. Bernecker T, Kriegel HP, Renz M, Verhein F, Zuefl A (2009) Probabilistic frequent itemset mining in uncertain databases, ACM SIGKDD international conference on knowledge discovery and data mining, pp 119–128

  8. Cai CH, Fu AWC, Cheng CH, Kwong WW (1998) Mining association rules with weighted items, the international database engineering and applications symposium, pp 68–77

  9. Chen MS, Han J, Yu PS (1996) Data mining: An overview from a database perspective. IEEE Trans Knowl Data Eng 8(6):866– 883

    Article  Google Scholar 

  10. Chui CK, Kao B, Hung E (2007) Mining frequent itemsets from uncertain data, advances in knowledge discovery and data mining

  11. Geng L, Hamilton H J Interestingness measures for data mining: A survey. ACM Comput Surv 38(3):2006

  12. Han J, Pei J, Yin Y, Mao R (2004) Mining frequent patterns without candidate generation: A frequent-pattern tree approach. Data Min Knowl Disc 8(1):53–87

    Article  MathSciNet  Google Scholar 

  13. Lan GC, Hong TP, Lee HY, Lin C W (2013) Mining weighted frequent itemsets, the workshop on combinatorial mathematics and computation theory, pp 85–89

  14. Lan GC, Hong TP, Lee HY (2014) An efficient approach for finding weighted sequential patterns from sequence databases. Appl Intell 41(2):439–452

    Article  Google Scholar 

  15. Leung CKS, Mateo MAF, Brajczuk DA (2008) A tree-based approach for frequent pattern mining from uncertain data, advances in knowledge discovery and data mining, pp 653–661

  16. Leung CKS, Hao B (2009) Mining of frequent itemsets from streams of uncertain data, IEEE international conference on data engineering, pp 1663–1670

  17. Lin CW, Hong TP, Lu WH (2009) The Pre-FUFP algorithm for incremental mining. Expert Syst Appl 36(5):9498– 9505

    Article  Google Scholar 

  18. Lin CW, Hong TP (2011) Temporal data mining with up-to-date pattern trees. Expert Syst Appl 38 (12):15143–15150

    Article  Google Scholar 

  19. Lin CW, Hong TP (2012) A new mining approach for uncertain databases using cufp trees. Expert Syst Appl 39(4):4084– 4093

    Article  MathSciNet  Google Scholar 

  20. Pei J, Han J, Mortazavi-Asl B, Wang J, Pinto H, Chen Q, et al. (2004) Mining sequential patterns by pattern-growth: The prefixspan approach. IEEE Trans Knowl Data Eng 16(11):1424–1440

    Article  Google Scholar 

  21. Srikant R, Agrawal R (1996) Mining sequential patterns: Generalizations and performance improvements, the international conference on extending database technology: advances in database technology, pp 3–17

  22. Sun L, Cheng R, Cheung DW, Cheng J (2010) Mining uncertain data with probabilistic guarantees, the 16th ACM SIGKDD international conference on knowledge discovery and data mining, pp 273–282

  23. Tang P, Peterson E A (2011) Mining probabilistic frequent closed itemsets in uncertain databases. The Annual Southeast Regional Conference, pp 86–91

  24. Tao F, Murtagh F, Farid M (2003) Weighted association rule mining using weighted support and significance framework, ACM SIGKDD international conference on knowledge discovery and data mining, pp 661–666

  25. Tong Y, Chen L, Cheng Y, Yu PS (2012) Mining frequent itemsets over uncertain databases. The VLDB Endowment 5(11):1650–1661

    Article  Google Scholar 

  26. Vo B, Coenen F, Le B (2013) A new method for mining frequent weighted itemsets based on wit-trees. Expert Syst Appl 40(4):1256–1264

    Article  Google Scholar 

  27. Wang W, Yang J, Yu PS (2000) Efficient mining of weighted association rules (war), ACM SIGKDD international conference on knowledge discovery and data mining, pp 270–274

  28. Yun U, Leggett J (2005) WFIM: Weighted frequent itemset mining with a weight range and a minimum weight, SIAM international conference on data mining, pp 636–640

  29. Yun U, Leggett J (2006) WSpan: Weighted sequential pattern mining in large sequential database, IEEE international conference on intelligent systems, pp 512–517

  30. Yun U (2008) A new framework for detecting weighted sequential patterns in large sequence databases. Knowl-Based Syst 21(2):110–122

    Article  Google Scholar 

  31. Zaki M, Hsiao C (2002) CHARM: An efficient algorithm for closed itemset mining. SIAM Int Conf Data Min 2:457–473

    Google Scholar 

Download references

Acknowledgments

This research was partially supported by the Tencent Project under grant CCF-TencentRAGR20140114, by the Shenzhen Peacock Project, China, under grant KQC201109020055A, by the Natural Scientific Research Innovation Foundation in Harbin Institute of Technology under grant HIT.NSRIF.2014100, and by the Shenzhen Strategic Emerging Industries Program under grant ZDSY20120613125016389.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Jerry Chun-Wei Lin.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Lin, J.CW., Gan, W., Fournier-Viger, P. et al. Weighted frequent itemset mining over uncertain databases. Appl Intell 44, 232–250 (2016). https://doi.org/10.1007/s10489-015-0703-9

Download citation

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10489-015-0703-9

Keywords

Navigation