Top

Knowledge and Information Systems

Published in:

02-09-2016 | Regular Paper

EFIM: a fast and memory efficient algorithm for high-utility itemset mining

Authors: Souleymane Zida, Philippe Fournier-Viger, Jerry Chun-Wei Lin, Cheng-Wei Wu, Vincent S. Tseng

Published in: Knowledge and Information Systems | Issue 2/2017

Activate our intelligent search to find suitable subject content or patents.

search-config

AI-assisted search

Off

Abstract

In recent years, high-utility itemset mining has emerged as an important data mining task. However, it remains computationally expensive both in terms of runtime and memory consumption. It is thus an important challenge to design more efficient algorithms for this task. In this paper, we address this issue by proposing a novel algorithm named EFIM (EFficient high-utility Itemset Mining), which introduces several new ideas to more efficiently discover high-utility itemsets. EFIM relies on two new upper bounds named revised sub-tree utility and local utility to more effectively prune the search space. It also introduces a novel array-based utility counting technique named Fast Utility Counting to calculate these upper bounds in linear time and space. Moreover, to reduce the cost of database scans, EFIM proposes efficient database projection and transaction merging techniques named High-utility Database Projection and High-utility Transaction Merging (HTM), also performed in linear time. An extensive experimental study on various datasets shows that EFIM is in general two to three orders of magnitude faster than the state-of-art algorithms \(\hbox {d}^2\)HUP, HUI-Miner, HUP-Miner, FHM and UP-Growth+ on dense datasets and performs quite well on sparse datasets. Moreover, a key advantage of EFIM is its low memory consumption.

previous article Measuring similarity of users with qualitative preferences for service selection

next article Indexed list-based high utility pattern mining with utility upper-bound reduction and pattern combination techniques

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

über 102.000 Bücher
über 537 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Finance + Banking
Management + Führung
Marketing + Vertrieb
Maschinenbau + Werkstoffe
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

inform now

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 340 Zeitschriften

aus folgenden Fachgebieten:

Bauwesen + Immobilien
Business IT + Informatik
Finance + Banking
Management + Führung
Marketing + Vertrieb
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

inform now

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 390 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Maschinenbau + Werkstoffe

Jetzt Wissensvorsprung sichern!

inform now

Agrawal R, Srikant R (1994) Fast algorithms for mining association rules in large databases. In: Proceedings of the 20th international conference on very large databases, Morgan Kaufmann, Santiago de Chile, Chile, September 1994, pp 487–499

Ahmed CF, Tanbeer SK, Jeong BS, Lee YK (2009) Efficient tree structures for high-utility pattern mining in incremental databases. IEEE Trans Knowl Data Eng 21(12):1708–1721CrossRef

Ahmed CF, Tanbeer SK, Jeong B (2010) Mining high utility web access sequences in dynamic web log data. In: Proceedings of the international conference on software engineering artificial intelligence networking and parallel/distributed computing, IEEE, London, UK, June 2010, pp 76–81

Fournier-Viger P, Gomariz A, Gueniche T, Soltani A, Wu CW, Tseng VS (2014) SPMF: a Java open-source pattern mining library. J Mach Learn Res 15:3389–3393MATH

Fournier-Viger P, Wu CW, Tseng VS (2014) Novel concise representations of high utility itemsets using generator patterns. In: Proceedings of the 10th international conference on advanced data mining and applications, Guilin, China, December 2014. Lecture Notes in Artificial Intelligence, vol 8933. Springer, Berlin, pp 30–43

Fournier-Viger P, Wu CW, Zida S, Tseng VS (2014) FHM: faster high-utility itemset mining using estimated utility co-occurrence pruning. in: Proceedings of the 21st international symposium on methodologies for intelligent systems, Roskilde, Denmark, June 2014. Lecture Notes in Artificial Intelligence, vol 9384. Springer, Berlin, pp 83–92

Fournier-Viger P, Lin JCW, Duong QH, Dam TL (2016) PHM: mining periodic high-utility itemsets. In: Proceedings of the 16th industrial conference on data mining, New York, USA, July 2016. Lecture Notes in Artificial Intelligence, vol 9728, Springer, Berlin, pp 64–79

Fournier-Viger P, Zida S (2015) FOSHU: faster on-shelf high utility itemset mining with or without negative unit profit. In: Proceedings of the 30th symposium on applied computing, ACM, Salamanca, Spain, April 2015, pp 857–864

Han J, Pei J, Yin Y, Mao R (2004) Mining frequent patterns without candidate generation: a frequent-pattern tree approach. Data Min Knowl Discov 8(1):53–87MathSciNetCrossRef

10.

Pei J, Han J, Lu H, Nishio S, Tang S, Yang D (2001) H-Mine: hyper-structure mining of frequent patterns in large databases. In: Proceedings of the 2001 IEEE international conference on data mining, IEEE, San Jose, CA, November 2001, pp 441–448

11.

Krishnamoorthy S (2015) Pruning strategies for mining high utility itemsets. Expert Syst Appl 42(5):2371–2381CrossRef

12.

Lan GC, Hong TP, Tseng VS (2014) An efficient projection-based indexing approach for mining high utility itemsets. Knowl Inf Syst 38(1):85–107CrossRef

13.

Lin JCW, Hong TP, Lan GC, Wong JW, Lin WY (2015) Efficient updating of discovered high-utility itemsets for transaction deletion in dynamic databases. Adv Eng Inform 29(1):16–27CrossRef

14.

Lin YC, Wu CW, Tseng VS (2015) Mining high utility itemsets in big data. In: Proceedings of the 9th Pacific-Asia conference on knowledge discovery and data mining, Ho Chi Minh City, Vietnam, May 2015, Lecture Notes in Artificial Intelligence, vol 9077. Springer, Berlin, pp 649–661

15.

Liu J, Wang K, Fung B (2012) Direct discovery of high utility itemsets without candidate generation. In: Proceedings of the 12th IEEE international conference on data mining, IEEE, Brussels, Belgium, December 2012, pp 984–989

16.

Liu M, Qu J (2012) Mining high utility itemsets without candidate generation. In: Proceedings of the 22nd ACM international conference on information and knowledge management, ACM, Maui, HI, October 2012, pp 55–64

17.

Liu Y, Cheng C, Tseng VS (2013) Mining differential top-k co-expression patterns from time course comparative gene expression datasets. BMC Bioinform 14(230)

18.

Liu Y, Liao W, Choudhary A (2005) A two-phase algorithm for fast discovery of high utility itemsets. In: Proceedings of the 9th Pacific-Asia conference on knowledge discovery and data mining, Hanoi, Vietnam, May 2005, Lecture Notes in Artificial Intelligence, vol 3518. Springer, Berlin, pp 689–695

19.

Lu T, Liu Y, Wang L (2014) An algorithm of top-k high utility itemsets mining over data stream. J Softw 9(9):2342–2347CrossRef

20.

Pei J, Han J, Mortazavi-Asl B, Wang J, Pinto H, Chen Q, Dayal U, Hsu M (2004) Mining sequential patterns by pattern-growth: the PrefixSpan approach. IEEE Trans Knowl Data Eng 16(11):1424–1440CrossRef

21.

Ryang H, Yun U (2015) Top-k high utility pattern mining with effective threshold raising strategies. Knowl Based Syst 76:109–126CrossRef

22.

Rymon R (1992) Search through systematic set enumeration. In: Proceedings of the third international conference on principles of knowledge representation and reasoning, Morgan Kaufmann, Cambridge, MA, October 1992, pp 539–50

23.

Sahoo J, Das AK, Goswami A (2015) An efficient approach for mining association rules from high utility itemsets. Expert Syst Appl 42(13):5754–5778CrossRef

24.

Song W, Liu Y, Li J (2014) BAHUI: fast and memory efficient mining of high utility itemsets based on bitmap. Proc Int J Data Wareh Min 10(1):1–15CrossRef

25.

Thilagu M, Nadarajan R (2012) Efficiently mining of effective web traversal patterns with average utility. In: Proceedings of the international conference on communication, computing, and security. CRC Press, Gurgaon, India, September 2016, pp 444–451

26.

Tseng VS, Shie BE, Wu CW, Yu PS (2013) Efficient algorithms for mining high utility itemsets from transactional databases. IEEE Trans Knowl Data Eng 25(8):1772–1786CrossRef

27.

Tseng VS, Wu CW, Fournier-Viger P, Yu P (2016) Efficient algorithms for mining top-k high utility itemsets. IEEE Trans Knowl Data Eng 28(1):54–67CrossRef

28.

Uno T, Kiyomi M, Arimura H (2004) LCM ver. 2: efficient mining algorithms for frequent/closed/maximal itemsets. In: Proceedings of the ICDM’04 Workshop on Frequent Itemset Mining Implementations, CEUR, Brighton, UK, November 2014

29.

Yao H, Hamilton HJ, Butz CJ (2004) A foundational approach to mining itemset utilities from databases. In: Proceedings of the 3rd SIAM international conference on data mining, SIAM, Lake Buena Vista, FL, USA, April 2004, pp 482–486

30.

Yin J, Zheng Z, Cao L, Song Y, Wei, W (2012) An efficient algorithm for mining high utility sequential patterns.In: Proceedings of the 18th ACM SIGKDD international conference on knowledge discovery and data mining, ACM, Beijing, China, August 2012, pp 660–668

31.

Yin J, Zheng Z, Cao L, Song Y, Wei W (2013) Efficiently mining top-k high utility sequential patterns. In: Proceedings of the 13th international conference on data mining, IEEE, Dallas, TX, USA, December 2013, pp 1259–1264

32.

Yun U, Ryang H, Ryu KH (2014) High utility itemset mining with techniques for reducing overestimated utilities and pruning candidates. Expert Syst Appl 41(8):3861–3878CrossRef

33.

Zaki MJ (2000) Scalable algorithms for association mining. IEEE Trans Knowl Data Eng 12(3):372–390CrossRef

34.

Zida S, Fournier-Viger P, Wu CW, Lin JCW, Tseng VS (2015) Efficient mining of high utility sequential rules. In: Proceedings of the 11th international conference on machine learning and data mining, Hamburg, Germany, July 2015, Lecture Notes in Artificial Intelligence vol 9166. Springer, Berlin, pp 1–15

35.

Zida S, Fournier-Viger P, Lin JCW, Wu CW, Tseng VS (2015) EFIM: a highly efficient algorithm for high-utility itemset mining. In: Proceedings of the 14th Mexican international conference on artificial intelligence, Cuernavaca, Mexico, October 2015. Lecture Notes in Artificial Intelligence, vol 9413. Springer, Berlin, pp 530–546

Title: EFIM: a fast and memory efficient algorithm for high-utility itemset mining
Authors: Souleymane Zida
Philippe Fournier-Viger
Jerry Chun-Wei Lin
Cheng-Wei Wu
Vincent S. Tseng
Publication date: 02-09-2016
Publisher: Springer London
Published in: Knowledge and Information Systems / Issue 2/2017
Print ISSN: 0219-1377
Electronic ISSN: 0219-3116
DOI: https://doi.org/10.1007/s10115-016-0986-0

Springer Professional

Abstract

Please log in to get access to your license.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Springer Professional "Wirtschaft"

Springer Professional "Technik"

Other articles of this Issue 2/2017

A distributed approach for accelerating sparse matrix arithmetic operations for high-dimensional feature selection

Domain-agnostic discovery of similarities and concepts at scale

A survey of methods for time series change point detection

Constrained recommendations for query visualizations

Measuring similarity of users with qualitative preferences for service selection

Exceptionally monotone models—the rank correlation model class for Exceptional Model Mining

Premium Partner