ABSTRACT
In order to reduce the number of high-utility itemsets (HUIs), closed high-utility itemsets (CHUIs) have been proposed. However, most techniques for mining CHUIs require certain databases; i.e., there are no probabilities. However, in many real-world applications, an item or itemset may have a probability. Actual data can be affected by the use of noisy sensors. Many algorithms have been proposed to effectively mine frequent itemsets from uncertain databases; however, there are no algorithms for mining CHUIs from uncertain databases. This paper proposes an algorithm called CPHUI-List (closed potential high-utility itemset PEU-List-based mining algorithm) for mining closed potential high-utility itemsets (CPHUIs) from uncertain databases without generating candidates. CPHUI-List performs a depth-first search of the search space, and uses the downward closure property of high transaction-weighed probabilistic and utilization itemsets to prune non-closed potential high-utility itemsets. Experiments show that the runtime and memory consumption of CPHUI-List are lower than those of CHUI-Miner.
- R. Agrawal, R. Srikant, Fast algorithms for mining association rules in large databases, Proceedings of the 20th International Conference on Very Large Data Bases, pp. 487--499, 1994. Google ScholarDigital Library
- C.C. Aggarwal, P.S. Yu, A survey of uncertain data algorithms and applications, IEEE Trans. Knowl. Data Eng, vol. 21, no. 5, pp. 609--623, 2009. Google ScholarDigital Library
- T. Bernecker, H.P. Kriegel, M. Renz, F. Verhein, A. Zuefl, Probabilistic frequent itemset mining in uncertain databases, Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 119--128, 2009. Google ScholarDigital Library
- R. Chan, Q. Yang, Y.D. Shen, Mining high utility item-sets, Proceedings of the IEEE International Conference on Data Mining, pp. 19--26, 2003. Google ScholarDigital Library
- C.K. Chui, B. Kao, E. Hung, Mining frequent item-sets from uncertain data, Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining, pp. 47--58, 2007. Google ScholarDigital Library
- J. Han, J.Pei, Y.Yin, R. Mao, Mining frequentpatterns without candidate generation: a frequent-pattern tree approach, Data Min. Knowl. Discov., vol. 8, no. 1, pp. 53--87, 2004. Google ScholarDigital Library
- C.K.S. Leung, M.A.F. Mateo, D.A. Brajczuk, A tree-based approach for frequent pattern mining from uncertain data, Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining, pp. 653--661, 2008. Google ScholarDigital Library
- C.K. Leung, S.K. Tanbeer, PUF-tree: A compact tree structure for frequent pattern mining of uncertain data, Proceedings of the 17th Pacific-Asia Conference on Knowledge Discovery and Data Mining, Part I, pp. 13--25, 2013.Google ScholarCross Ref
- C.W. Lin, T.P. Hong, A new mining approach for uncertain databases using cufp trees, Expert Syst. Appl, vol 39, issue 4, pp. 4084--4093, 2012. Google ScholarDigital Library
- J. C. W. Lin, W. Gan, P. Fournier-Viger, T.P. Hong, and V.S. Tseng, Efficient algorithms for Mining High-Utility Itemsets with Uncertain Databases, Knowledge-Based Systems, vol. 96, pp. 171--187, 2016. Google ScholarDigital Library
- M. Liu, J. Qu, Mining high utility itemsets without candidate generation, Proc.22nd ACM Intern. Conf. Info. and Know. Management, pp. 55--64, 2012. Google ScholarDigital Library
- D. Nilesh, S. Dan, Efficient query evaluation on probabilistic databases, VLDB J., vol. 16, no. 4, pp. 523--544, 2007. Google ScholarDigital Library
- L. Sun, R. Cheng, D. W. Cheung, J. Cheng, Mining uncertain data with probabilistic guarantees, The 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 273--282, 2010. Google ScholarDigital Library
- Y. Tong, L. Chen, Y. Cheng, P.S. Yu, Mining frequent itemsets over uncertain databases, VLDB Endow. 5, no. 11, pp. 1650--1661, 2012. Google ScholarDigital Library
- V. S. Tseng, C.W. Wu, B.E. Shie, and P. S. Yu, UP-Growth: an efficient algorithm for high utility itemset mining, Proc. of ACM SIGKDD, pp. 253--262, 2010. Google ScholarDigital Library
- V. S. Tseng, B.-E. Shie, C.-W. Wu, and P. S. Yu, Efficient algorithms for mining high utility itemsets from transactional databases, IEEE Transactions on Knowledge and Data Engineering, vol. 25, issue 8, pp.1772--1786, 2013. Google ScholarDigital Library
- V. S. Tseng, C.W. Wu, P. Fournier Viger and P. S. Yu, Efficient Algorithms for Mining the Concise and Lossless Representation of High Utility Itemsets, IEEE Transactions on Knowledge and Data Engineering, vol. 27, issue 3, pp. 726--739, 2015.Google ScholarDigital Library
- C.W. Wu, P.F. Viger, J.Y. Gu, V.S. Tseng, Mining closed+ high utility itemsets without candidate generation, IEEE 2015 Conference on Technologies and Applications of Artificial Intelligence (TAAI), pp. 187--194, 2015.Google ScholarCross Ref
- S. Zida, P.F. Viger, J.C.W. Lin, C.W. Wu, V.S. Tseng, EFIM: A fast and memory efficient algorithm for high-utility itemset mining, Knowledge and Information Systems (Accepted, 2016).Google Scholar
Index Terms
- Mining closed high utility itemsets in uncertain databases
Recommendations
Efficient algorithms for mining high-utility itemsets in uncertain databases
High-utility itemset mining (HUIM) is a useful set of techniques for discovering patterns in transaction databases, which considers both quantity and profit of items. However, most algorithms for mining high-utility itemsets (HUIs) assume that the ...
Efficiently mining uncertain high-utility itemsets
Data mining consists of deriving implicit, potentially meaningful and useful knowledge from databases such as information about the most profitable items. High-utility itemset mining (HUIM) has thus emerged as an important research topic in data mining. ...
Mining Potential High-Utility Itemsets over Uncertain Databases
ASE BD&SI '15: Proceedings of the ASE BigData & SocialInformatics 2015Traditional high-utility itemsets mining (HUIM) incorporates the concept of utility (e.g., profit) over certain databases. However, an item or itemset is not only present or absent in the transactions but also associated with an existing probability ...
Comments