An efficient structure for fast mining high utility itemsets

Deng, Zhi-Hong

doi:10.1007/s10489-017-1130-x

An efficient structure for fast mining high utility itemsets

Published: 08 February 2018

Volume 48, pages 3161–3177, (2018)
Cite this article

Applied Intelligence Aims and scope Submit manuscript

Zhi-Hong Deng¹

402 Accesses
18 Citations
Explore all metrics

Abstract

High utility itemset mining has emerged to be an important research issue in data mining since it has a wide range of real life applications. Although a number of algorithms have been proposed in recent years, the mining efficiency is still a big challenge since these algorithms suffer from either the problem of low efficiency of calculating candidates’ utilities or the problem of generating huge number of candidates. In this paper, we propose a novel data structure named PUN-list (PU-tree-Node list), which maintains both the utility information about an itemset and utility upper bound for facilitating the processing of mining high utility itemsets. Based on PUN-lists, we present a method, named MIP (Mining high utility Itemset using PUN-Lists), for efficiently mining high utility itemsets. The efficiency of MIP is achieved with three techniques. First, itemsets are represented by a highly condensed data structure, named PUN-list, which avoids costly and repeated utility computation. Second, the utility of an itemset can be efficiently calculated by scanning the PUN-list of the itemset and the PUN-lists of long itemsets can be efficiently constructed by the PUN-lists of short itemsets. Third, by employing the utility upper bound lying in the PUN-lists as the pruning strategy, MIP directly discovers high utility itemsets from the search space, named set-enumeration tree, without generating numerous candidates. Extensive experiments on various synthetic and real datasets show that MIP is very efficient since it is much faster than HUI-Miner, d2HUP, and UP-Growth + , especially on dense datasets.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

An efficient join operations for utility list-based high-utility mining approaches using hybrid search technique

Article 12 April 2024

A survey of density based clustering algorithms

Article 29 September 2020

Data distribution tailoring revisited: cost-efficient integration of representative data

Article 12 April 2024

References

Agrawal R, Imielinski T, Swami A (1993) Mining association rules between sets of items in large databases. In: SIGMOD, vol 1993, pp 207–216
Agrawal R, Srikant R (1994) Fast algorithm for mining association rules. In: VLDB, vol 1994, pp 487–499
Ahmed CF, Tanbeer SK, Jeong B (2010) Mining high utility web access sequences in dynamic web log data. In: SNPD, vol 2010, pp 76–81
Ahmed CF, Tanbeer SK, Jeong BS, Lee YK (2009) Efficient tree structures for high utility pattern mining in incremental databases. IEEE Trans Knowl Data Eng 21(12):1708–1721
Article Google Scholar
Ahmed CF, Tanbeer SK, Jeong BS, Lee YK (2011) HUC-Prune: an efficient candidate pruning technique to mine high utility patterns. Appl Intell 34(2):181–198
Article Google Scholar
Chan R, Yang Q, Shen Y (2003) Mining high utility itemsets. In: ICDM, vol 2003, pp 19–26
Dawar S, Goyal V (2015) UP-Hist tree: an efficient data structure for mining high utility patterns from transaction databases. In: IDEAS, vol 2015, pp 56–61
Deng ZH (2016) Diffnodesets: an efficient structure for fast mining frequent itemsets. Appl Soft Comput 41:214–223
Article Google Scholar
Deng ZH, Lv SL (2014) Fast mining frequent itemsets using Nodesets. Expert Syst Appl 41(10):4505–4512
Article Google Scholar
Deng ZH, Lv SL (2015) PrePost + : an efficient N-lists-based algorithm for mining frequent itemsets via children-parent equivalence pruning. Expert Syst Appl 42(13):5424–5432
Article Google Scholar
Deng ZH, Wang ZH (2010) A new fast vertical method for mining frequent itemsets. Intern J Comput Intell Syst 3(6):733–744
Article Google Scholar
Deng ZH, Wang ZH, Jiang JJ (2012) A new algorithm for fast mining frequent itemsets using N-lists. Sci China Inform Sci 55(9):2008–2030
Article MathSciNet MATH Google Scholar
Erwin A, Gopalan RP, Achuthan NR (2008) Efficient mining of high utility itemsets from large datasets. In: PAKDD, vol 2008, pp 554–561
Grahne G, Zhu J (2005) Fast algorithms for frequent itemset mining using FP-trees. IEEE Trans Knowl Data Eng 17(10):1347–1362
Article Google Scholar
Han J, Pei J, Yin Y (2000) Mining frequent itemsets without candidate generation. In: SIGMOD, vol 2000, pp 1–12
Krishnamoorthy S (2015) Pruning strategies for mining high utility itemsets. Expert Syst Appl 42(5):2371–2381
Article Google Scholar
Lan GC, Hong TP, Tseng VS (2014) An efficient projection-based indexing approach for mining high utility itemsets. Knowl Inform Syst 38(1):85–107
Article Google Scholar
Li HF, Huang HY, Chen YC, Liu YJ, Lee SY (2008) Fast and memory efficient mining of high utility itemsets in data streams. In: ICDM, vol 2008, pp 881–886
Li YC, Yeh JS, Chang CC (2008) Isolated items discarding strategy for discovering high utility itemsets. Data Knowl Eng 64(1):198–217
Article Google Scholar
Liu Y, Liao WK, Choudhary AN (2005) A two-phase algorithm for fast discovery of high utility itemsets, vol 2005, pp 689–695
Liu M, Qu JF (2012) Mining high utility itemsets without candidate generation. In: CIKM, vol 2012, pp 55–64
Liu J, Wang K, Fung BCM (2012) Direct discovery of high utility itemsets without candidate generation. In: ICDM, vol 2012, pp 984–989
Pei J, Han J, Lu H, Nishio S, Tang S, Yang D (2001) H-mine: hyper-structure mining of frequent itemsets in large databases. In: ICDM, vol 2001, pp 441–448
Rymon R (1992) Search through systematic set enumeration. In: KR, vol 1992, pp 539–550
Tseng VS, Shie BE, Wu CW, Yu PS (2013) Efficient algorithms for mining high utility itemsets from transactional databases. IEEE Trans Knowl Data Eng 25(8):1772–1786
Article Google Scholar
Tseng VS, Wu CW, Shie BE, Yu PS (2010) Up-rowth: an efficient algorithm for high utility itemset mining. In: SIGKDD, vol 2010, pp 253–262
Tseng VS, Wu CW, Fournier-Viger P, Yu PS (2015) Efficient algorithms for mining the concise and lossless representation of high utility itemsets. IEEE Trans Knowl Data Eng 27(3):726–739
Article Google Scholar
Vo B, Coenen F, Le T, Hong TP (2013) A hybrid approach for mining frequent itemsets. In: SMC, vol 2013, pp 4647–4651
Vo B, Le T, Coenen F, Hong TP (2016) Mining frequent itemsets using the N-list and subsume concepts. Intern J Mach Learn Cybern 7:253–265
Article Google Scholar
Wu CW, Shie BE, Tseng VS, Yu PS (2012) Mining top-K high utility itemsets. In: SIGKDD, vol 2012, pp 78–86
Yao H, Hamilton HJ, Butz CJ (2004) A foundational approach to mining itemset utilities from databases. In: SDM, vol 2004, pp 482–486
Yen SJ, Lee YS (2007) Mining high utility quantitative association rules. In: Dawak, vol 2007, pp 283–292
Yun U, Ryang H, Ryu KH (2014) High utility itemset mining with tech-niques for reducing overestimated utilities and pruning candidates. Expert Syst Appl 41(8):3861–3878
Article Google Scholar
Yun U, Ryang H (2015) Incremental high utility pattern mining with static and dynamic databases. Appl Intell 42(2):323–352
Article Google Scholar
Zaki MJ, Gouda K (2003) Fast vertical mining using diffsets. In: SIGKDD, vol 2003, pp 326–335

Download references

Acknowledgments

This work is partially supported by the National Natural Science Foundation of China (Grant No. 61170091).

Author information

Authors and Affiliations

Key Laboratory of Machine Perception (Ministry of Education), School of Electronics Engineering and Computer Science, Peking University, Beijing, 100871, China
Zhi-Hong Deng

Authors

Zhi-Hong Deng
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Zhi-Hong Deng.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Deng, ZH. An efficient structure for fast mining high utility itemsets. Appl Intell 48, 3161–3177 (2018). https://doi.org/10.1007/s10489-017-1130-x

Download citation

Published: 08 February 2018
Issue Date: September 2018
DOI: https://doi.org/10.1007/s10489-017-1130-x

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

An efficient structure for fast mining high utility itemsets

Abstract

Access this article

Similar content being viewed by others

An efficient join operations for utility list-based high-utility mining approaches using hybrid search technique

A survey of density based clustering algorithms

Data distribution tailoring revisited: cost-efficient integration of representative data

References

Acknowledgments

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Keywords

Navigation

An efficient structure for fast mining high utility itemsets

Abstract

Access this article

Similar content being viewed by others

An efficient join operations for utility list-based high-utility mining approaches using hybrid search technique

A survey of density based clustering algorithms

Data distribution tailoring revisited: cost-efficient integration of representative data

References

Acknowledgments

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation