ABSTRACT
In recent years, mining frequent itemsets over uncertain data has attracted much attention in the data mining community. Unlike the corresponding problem over deterministic data, a frequent itemset over uncertain data has two different definitions: the expected support-based frequent itemset and the probabilistic frequent itemset. Most existing work focuses on only one of the definitions, and no comprehensive study has compared the two. Moreover, for lack of a uniform implementation platform, existing solutions for the same definition can even generate inconsistent results. In this demo, we present UFIMT (Uncertain Frequent Itemset Mining Toolbox), which not only discovers frequent itemsets over uncertain data but also compares the performance of different algorithms and demonstrates the relationship between the two definitions. First, we present important techniques and implementation skills for the mining problem. Second, we show the system architecture of UFIMT. Third, we report an empirical analysis on extensive real and synthetic benchmark data sets, which is used to compare different algorithms and to show the close relationship between the two frequent itemset definitions. Finally, we discuss some existing challenges and new findings.
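The abstract names the two definitions without spelling them out. The following is a minimal sketch of how each is commonly formulated in the uncertain-data mining literature, not UFIMT's implementation. It assumes an attribute-level uncertainty model in which every item in a transaction carries an independent existence probability. The toy database and the function names (`containment_probs`, `expected_support`, `frequent_probability`) are hypothetical and chosen only for illustration.

```python
from math import prod

# Hypothetical toy uncertain database (not from the paper):
# each transaction maps an item to its existence probability.
UNCERTAIN_DB = [
    {"a": 0.8, "b": 0.5},
    {"a": 0.6, "b": 0.9, "c": 0.4},
    {"a": 1.0, "c": 0.7},
    {"b": 0.3},
]


def containment_probs(db, itemset):
    """Probability that each transaction contains every item of `itemset`.

    Assumes items occur independently; a missing item has probability 0.
    """
    return [prod(t.get(item, 0.0) for item in itemset) for t in db]


def expected_support(db, itemset):
    """Expected support: the sum of the per-transaction containment probabilities."""
    return sum(containment_probs(db, itemset))


def frequent_probability(db, itemset, min_count):
    """Pr[support >= min_count], treating the support as a Poisson binomial variable.

    Computed exactly with a simple O(n^2) dynamic program over transactions.
    """
    dist = [1.0]  # dist[k] = Pr[support == k] over the transactions seen so far
    for p in containment_probs(db, itemset):
        nxt = [0.0] * (len(dist) + 1)
        for k, q in enumerate(dist):
            nxt[k] += q * (1.0 - p)   # transaction does not contain the itemset
            nxt[k + 1] += q * p       # transaction contains the itemset
        dist = nxt
    return sum(dist[min_count:])


if __name__ == "__main__":
    itemset = {"a"}
    print(expected_support(UNCERTAIN_DB, itemset))
    print(frequent_probability(UNCERTAIN_DB, itemset, min_count=2))
```

Under the first definition an itemset is reported as frequent when its expected support reaches a minimum threshold. Under the second it is reported when the probability of reaching a minimum support count exceeds a user-chosen probability threshold. Both quantities derive from the same per-transaction containment probabilities, which is one way to see why the two definitions are closely related.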
Index Terms
- UFIMT: an uncertain frequent itemset mining toolbox