Skip to main content
Top
Published in: Soft Computing 11/2017

02-05-2016 | Focus

Efficiently mining uncertain high-utility itemsets

Authors: Jerry Chun-Wei Lin, Wensheng Gan, Philippe Fournier-Viger, Tzung-Pei Hong, Vincent S. Tseng

Published in: Soft Computing | Issue 11/2017

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

Data mining consists of deriving implicit, potentially meaningful and useful knowledge from databases such as information about the most profitable items. High-utility itemset mining (HUIM) has thus emerged as an important research topic in data mining. But most HUIM algorithms can only handle precise data, although big data collected in real-life applications using experimental measurements or noisy sensors is often uncertain. In this paper, an efficient algorithm, named Mining Uncertain High-Utility Itemsets (MUHUI), is proposed to efficiently discover potential high-utility itemsets (PHUIs) in uncertain data. Based on the probability-utility-list (PU-list) structure, the MUHUI algorithm directly mines PHUIs without generating candidates, and can avoid constructing PU-lists for numerous unpromising itemsets by applying several efficient pruning strategies, which greatly improve its performance. Extensive experiments conducted on both real-life and synthetic datasets show that the proposed algorithm significantly outperforms the state-of-the-art PHUI-List algorithm in terms of efficiency and scalability, and that the proposed MUHUI algorithm scales well when mining PHUIs in large-scale uncertain datasets.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Literature
go back to reference Aggarwal CC (2010) Managing and mining uncertain data, managing and mining uncertain data Aggarwal CC (2010) Managing and mining uncertain data, managing and mining uncertain data
go back to reference Aggarwal CC, Li Y, Wang J, Wang J (2009) Frequent pattern mining with uncertain data. In: The 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp 29–38 Aggarwal CC, Li Y, Wang J, Wang J (2009) Frequent pattern mining with uncertain data. In: The 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp 29–38
go back to reference Aggarwal CC, Yu PS (2009) A survey of uncertain data algorithms and applications. IEEE Trans Knowl Data Eng 21(5):609–623CrossRef Aggarwal CC, Yu PS (2009) A survey of uncertain data algorithms and applications. IEEE Trans Knowl Data Eng 21(5):609–623CrossRef
go back to reference Agrawal R, Imielinski T, Swami A (1993) Database mining: a performance perspective. IEEE Trans Knowl Data Eng 5(6):914–925CrossRef Agrawal R, Imielinski T, Swami A (1993) Database mining: a performance perspective. IEEE Trans Knowl Data Eng 5(6):914–925CrossRef
go back to reference Agrawal R, Imielinski T, Swami A (1993) Mining association rules between sets of items in large database. In: The ACM SIGMOD International Conference on Management of Data, pp 207–216 Agrawal R, Imielinski T, Swami A (1993) Mining association rules between sets of items in large database. In: The ACM SIGMOD International Conference on Management of Data, pp 207–216
go back to reference Agrawal R, Srikant R (1994) Fast algorithms for mining association rules in large databases. In: International Conference on Very Large Data Bases, pp 487–499 Agrawal R, Srikant R (1994) Fast algorithms for mining association rules in large databases. In: International Conference on Very Large Data Bases, pp 487–499
go back to reference Ahmed CF, Tanbeer SK, Jeong BS, Le YK (2009) Efficient tree structures for high utility pattern mining in incremental databases. IEEE Trans Knowl Data Eng 21(12):1708–1721CrossRef Ahmed CF, Tanbeer SK, Jeong BS, Le YK (2009) Efficient tree structures for high utility pattern mining in incremental databases. IEEE Trans Knowl Data Eng 21(12):1708–1721CrossRef
go back to reference Bernecker T, Kriegel HP, Renz M, Verhein F, Zuefl A (2009) Probabilistic frequent itemset mining in uncertain databases. In: The 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp 119–128 Bernecker T, Kriegel HP, Renz M, Verhein F, Zuefl A (2009) Probabilistic frequent itemset mining in uncertain databases. In: The 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp 119–128
go back to reference Chan R, Yang Q, Shen YD (2003) Mining high utility itemsets. In: IEEE International Conference on Data Mining, pp 19–26 Chan R, Yang Q, Shen YD (2003) Mining high utility itemsets. In: IEEE International Conference on Data Mining, pp 19–26
go back to reference Chen MS, Han J, Yu PS (1996) Data mining: an overview from a database perspective. IEEE Trans Knowl Data Eng 8(6):866–883CrossRef Chen MS, Han J, Yu PS (1996) Data mining: an overview from a database perspective. IEEE Trans Knowl Data Eng 8(6):866–883CrossRef
go back to reference Chui CK, Kao B, Hung E (2007) Mining frequent itemsets from uncertain data. In: Advances in Knowledge Discovery and Data Mining, pp 47–58 Chui CK, Kao B, Hung E (2007) Mining frequent itemsets from uncertain data. In: Advances in Knowledge Discovery and Data Mining, pp 47–58
go back to reference Evfimievski A, Srikant R, Agrawal R, Gehrke J (2002) Privacy preserving mining of association rules. In: ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp 217–228 Evfimievski A, Srikant R, Agrawal R, Gehrke J (2002) Privacy preserving mining of association rules. In: ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp 217–228
go back to reference Fournier-Viger P, Wu CW, Zida S, Tseng VS (2014) FHM: Faster high-utility itemset mining using estimated utility co-occurrence pruning. Found Intell Syst 8502:83–92 Fournier-Viger P, Wu CW, Zida S, Tseng VS (2014) FHM: Faster high-utility itemset mining using estimated utility co-occurrence pruning. Found Intell Syst 8502:83–92
go back to reference Fournier-Viger P, Zida S (2016) FOSHU: Faster on-shelf high utility itemset mining—with or without negative unit profit. In: The 30th Symposium on Applied Computing, pp 857–864 Fournier-Viger P, Zida S (2016) FOSHU: Faster on-shelf high utility itemset mining—with or without negative unit profit. In: The 30th Symposium on Applied Computing, pp 857–864
go back to reference Geng L, Hamilton HJ (2006) Interestingness measures for data mining: a survey. ACM Comput Surv 38(3):9 (Article 9) Geng L, Hamilton HJ (2006) Interestingness measures for data mining: a survey. ACM Comput Surv 38(3):9 (Article 9)
go back to reference Han J, Pei J, Yin Y, Mao R (2004) Mining frequent patterns without candidate generation: A frequent-pattern tree approach. Data Min Knowl Disc 8(1):53–87MathSciNetCrossRef Han J, Pei J, Yin Y, Mao R (2004) Mining frequent patterns without candidate generation: A frequent-pattern tree approach. Data Min Knowl Disc 8(1):53–87MathSciNetCrossRef
go back to reference Lan GC, Hong TP, Tseng VS (2011) Discovery of high utility itemsets from on-shelf time periods of products. Expert Syst Appl 38(5):5851–5857CrossRef Lan GC, Hong TP, Tseng VS (2011) Discovery of high utility itemsets from on-shelf time periods of products. Expert Syst Appl 38(5):5851–5857CrossRef
go back to reference Lan GC, Hong TP, Huang JP, Tseng VS (2014) On-shelf utility mining with negative item values. Expert Syst Appl 41(7):3450–3459CrossRef Lan GC, Hong TP, Huang JP, Tseng VS (2014) On-shelf utility mining with negative item values. Expert Syst Appl 41(7):3450–3459CrossRef
go back to reference Leung CKS, Mateo MAF, Brajczuk DA (2008) A tree-based approach for frequent pattern mining from uncertain data. In: Advances in Knowledge Discovery and Data Mining, pp 653–661 Leung CKS, Mateo MAF, Brajczuk DA (2008) A tree-based approach for frequent pattern mining from uncertain data. In: Advances in Knowledge Discovery and Data Mining, pp 653–661
go back to reference Lin JCW, Gan W, Fournier-Viger P, Hong TP (2015) Mining high-utility itemsets with multiple minimum utility thresholds. In: ACM International C* Conference on Computer Science & Software Engineering, pp 9–17 Lin JCW, Gan W, Fournier-Viger P, Hong TP (2015) Mining high-utility itemsets with multiple minimum utility thresholds. In: ACM International C* Conference on Computer Science & Software Engineering, pp 9–17
go back to reference Lin JCW, Gan W, Fournier-Viger P, Hong TP, Tseng VS (2015) Mining potential high-utility itemsets over uncertain databases. In: ACM 5th ASE BigData & SocialInformatics, pp 25 Lin JCW, Gan W, Fournier-Viger P, Hong TP, Tseng VS (2015) Mining potential high-utility itemsets over uncertain databases. In: ACM 5th ASE BigData & SocialInformatics, pp 25
go back to reference Lin JCW, Gan W, Hong TP, Zhang B (2015) An incremental high-utility mining algorithm with transaction insertion. Sci World J Lin JCW, Gan W, Hong TP, Zhang B (2015) An incremental high-utility mining algorithm with transaction insertion. Sci World J
go back to reference Lin CW, Hong TP, Lu WH (2011) An effective tree structure for mining high utility itemsets. Expert Syst Appl 38(6):7419–7424 Lin CW, Hong TP, Lu WH (2011) An effective tree structure for mining high utility itemsets. Expert Syst Appl 38(6):7419–7424
go back to reference Lin CW, Hong TP, Lan GC, Wong JW, Lin WY (2015) Efficient updating of discovered high-utility itemsets for transaction deletion in dynamic databases. Adv Eng Inform 29(1):16–27CrossRef Lin CW, Hong TP, Lan GC, Wong JW, Lin WY (2015) Efficient updating of discovered high-utility itemsets for transaction deletion in dynamic databases. Adv Eng Inform 29(1):16–27CrossRef
go back to reference Lin JCW, Gan W, Hong TP (2015) A fast updated algorithm to maintain the discovered high-utility itemsets for transaction modification. Adv Eng Inform 29(3):562–574CrossRef Lin JCW, Gan W, Hong TP (2015) A fast updated algorithm to maintain the discovered high-utility itemsets for transaction modification. Adv Eng Inform 29(3):562–574CrossRef
go back to reference Lin JCW, Gan W, Hong TP, Tseng VS (2015) Efficient algorithms for mining up-to-date high-utility patterns. Adv Eng Inform 29(3):648–661CrossRef Lin JCW, Gan W, Hong TP, Tseng VS (2015) Efficient algorithms for mining up-to-date high-utility patterns. Adv Eng Inform 29(3):648–661CrossRef
go back to reference Lin CW, Hong TP (2012) A new mining approach for uncertain databases using CUFP trees. Expert Syst Appl 39(4):4084–4093CrossRef Lin CW, Hong TP (2012) A new mining approach for uncertain databases using CUFP trees. Expert Syst Appl 39(4):4084–4093CrossRef
go back to reference Liu C, Chen L, Zhang C (2013) Summarizing probabilistic frequent patterns: a fast approach. In: The 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp 527–535 Liu C, Chen L, Zhang C (2013) Summarizing probabilistic frequent patterns: a fast approach. In: The 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp 527–535
go back to reference Liu Y, Liao WK, Choudhary A (2005) A two-phase algorithm for fast discovery of high utility itemsets. In: Advances in Knowledge Discovery and Data Mining, pp 689–695 Liu Y, Liao WK, Choudhary A (2005) A two-phase algorithm for fast discovery of high utility itemsets. In: Advances in Knowledge Discovery and Data Mining, pp 689–695
go back to reference Liu M, Qu J (2012) Mining high utility itemsets without candidate generation. In: ACM International Conference on Information and Knowledge Management, pp 55–64 Liu M, Qu J (2012) Mining high utility itemsets without candidate generation. In: ACM International Conference on Information and Knowledge Management, pp 55–64
go back to reference Nilesh D, Dan S (2007) Efficient query evaluation on probabilistic databases. VLDB J 16(4):523–544CrossRef Nilesh D, Dan S (2007) Efficient query evaluation on probabilistic databases. VLDB J 16(4):523–544CrossRef
go back to reference Rymon R (1992) Search through systematic set enumeration. In: International Conference Principles of Knowledge Representation and Reasoning, pp 539–550 Rymon R (1992) Search through systematic set enumeration. In: International Conference Principles of Knowledge Representation and Reasoning, pp 539–550
go back to reference Sun L, Cheng R, Cheung DW, Cheng J (2010) Mining uncertain data with probabilistic guarantees. In: The 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp 273–282 Sun L, Cheng R, Cheung DW, Cheng J (2010) Mining uncertain data with probabilistic guarantees. In: The 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp 273–282
go back to reference Tong Y, Chen L, Cheng Y, Yu PS (2012) Mining frequent itemsets over uncertain databases. VLDB Endow 5(11):1650–1661CrossRef Tong Y, Chen L, Cheng Y, Yu PS (2012) Mining frequent itemsets over uncertain databases. VLDB Endow 5(11):1650–1661CrossRef
go back to reference Tseng VS, Wu CW, Shie BE, Yu PS (2010) UP-growth: an efficient algorithm for high utility itemset mining. In: The 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp 253–262 Tseng VS, Wu CW, Shie BE, Yu PS (2010) UP-growth: an efficient algorithm for high utility itemset mining. In: The 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp 253–262
go back to reference Tseng VS, Shie BE, Wu CW, Yu PS (2013) Efficient algorithms for mining high utility itemsets from transactional databases. IEEE Trans Knowl Data Eng 25(8):1772–1786CrossRef Tseng VS, Shie BE, Wu CW, Yu PS (2013) Efficient algorithms for mining high utility itemsets from transactional databases. IEEE Trans Knowl Data Eng 25(8):1772–1786CrossRef
go back to reference Wang L, Cheung DL, Cheng R, Lee SD, Yang XS (2012) Efficient mining of frequent item sets on large uncertain databases. IEEE Trans Knowl Data Eng 24(12):2170–2183CrossRef Wang L, Cheung DL, Cheng R, Lee SD, Yang XS (2012) Efficient mining of frequent item sets on large uncertain databases. IEEE Trans Knowl Data Eng 24(12):2170–2183CrossRef
go back to reference Wang L, Cheng R, Lee SD, Cheung D (2010) Accelerating probabilistic frequent itemset mining: a model-based approach. In: The 19th ACM International Conference on Information and Knowledge Managemen, pp 429–438 Wang L, Cheng R, Lee SD, Cheung D (2010) Accelerating probabilistic frequent itemset mining: a model-based approach. In: The 19th ACM International Conference on Information and Knowledge Managemen, pp 429–438
go back to reference Wu CW, Shie BE, Tseng VS, Yu PS (2012) Mining top-\(k\) high utility itemsets. In: The 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp 78–86 Wu CW, Shie BE, Tseng VS, Yu PS (2012) Mining top-\(k\) high utility itemsets. In: The 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp 78–86
go back to reference Yao H, Hamilton HJ, Butz CJ (2004) A foundational approach to mining itemset utilities from databases. In: The SIAM International Conference on Data Mining, pp 211–225 Yao H, Hamilton HJ, Butz CJ (2004) A foundational approach to mining itemset utilities from databases. In: The SIAM International Conference on Data Mining, pp 211–225
go back to reference Yao H, Hamilton HJ (2006) Mining itemset utilities from transaction databases. Data Knowl Eng 59(3):603–626CrossRef Yao H, Hamilton HJ (2006) Mining itemset utilities from transaction databases. Data Knowl Eng 59(3):603–626CrossRef
Metadata
Title
Efficiently mining uncertain high-utility itemsets
Authors
Jerry Chun-Wei Lin
Wensheng Gan
Philippe Fournier-Viger
Tzung-Pei Hong
Vincent S. Tseng
Publication date
02-05-2016
Publisher
Springer Berlin Heidelberg
Published in
Soft Computing / Issue 11/2017
Print ISSN: 1432-7643
Electronic ISSN: 1433-7479
DOI
https://doi.org/10.1007/s00500-016-2159-1

Other articles of this Issue 11/2017

Soft Computing 11/2017 Go to the issue

Premium Partner