Abstract
Consumer behavior plays a very important role in economics and targeted marketing. However, understanding economic consumer behavior is quite challenging, such as finding credible and reliable information on product profitability. Different from frequent pattern mining, utility-oriented mining integrates utility theory and data mining. Utility mining is a useful tool for understanding economic consumer behavior. Traditional algorithms for mining high-utility patterns (HUPs) applies a single/uniform minimum utility threshold (minutil) to obtain the set of HUPs, but in some real-life circumstances, some specific products may bring lower utilities compared with others, but their profit may offer some vital information. If minutil is set high, the patterns with low minutil are missed; if minutil is set low, the number of patterns becomes unmanageable. In this article, an efficient one-phase utility-oriented pattern mining algorithm, called HIMU, is proposed for mining HUPs with varied item-specific minimum utility. A novel tree structure called a multiple item utility set-enumeration tree (MIU-tree) and the global sorted and the conditional downward closure properties are introduced in HIMU. In addition, we extended the compact utility-list structure to keep the necessary information, and thus this one-phase HIMU model greatly reduces the computational costs and memory requirements. Moreover, two pruning strategies are then extended to enhance the performance. We conducted extensive experiments in several synthetic and real-world datasets; the results indicate that the designed one-phase HIMU algorithm can address the “rare item problem” and has better performance than the state-of-the-art algorithms in terms of runtime, memory usage, and scalability. Furthermore, the enhanced algorithms outperform the non-optimized HIMU approach.
- Rakesh Agrawal, Tomasz Imieliński, and Arun Swami. 1993. Mining association rules between sets of items in large databases. In ACM SIGMOD Record, Vol. 22. ACM, 207--216.Google ScholarDigital Library
- Rakesh Agrawal and Ramakrishnan Srikant. 1995. Mining sequential patterns. In Proceedings of the 11th International Conference on Data Engineering. IEEE, 3--14.Google ScholarDigital Library
- Rakesh Agrawal, Ramakrishnan Srikant, et al. 1994. Fast algorithms for mining association rules. In Proceedings of the 20th International Conference on Very Large Data Bases, Vol. 1215. 487--499.Google ScholarDigital Library
- Rakesh Agrawal and Ramakrishnan. Quest synthetic data generator. 1994. Retrieved from http://www.Almaden.ibm.com/cs/quest/syndata.html.Google Scholar
- Chowdhury Farhan Ahmed, Syed Khairuzzaman Tanbeer, Byeong-Soo Jeong, and Young-Koo Lee. 2009. Efficient tree structures for high utility pattern mining in incremental databases. IEEE Trans. Knowl. Data Eng. 21, 12 (2009), 1708--1721.Google ScholarDigital Library
- Oznur Kirmemis Alkan and Pinar Karagoz. 2015. CROM and HuspExt: Improving efficiency of high utility sequential pattern extraction. IEEE Trans. Knowl. Data Eng. 27, 10 (2015), 2645--2657.Google ScholarDigital Library
- Raymond Chan, Qiang Yang, and Yi-Dong Shen. 2003. Mining high utility itemsets. In Proceedings of the 3rd IEEE International Conference on Data Mining. IEEE, 19--26.Google ScholarCross Ref
- Ming-Syan Chen, Jiawei Han, and Philip S. Yu. 1996. Data mining: An overview from a database perspective. IEEE Trans. Knowl. Data Eng. 8, 6 (1996), 866--883.Google ScholarDigital Library
- Microsoft. Example database foodmart of Microsoft analysis services. 2016. Retrieved from http://msdn.microsoft.com/en-us/library/aa217032(SQL.80).aspx.Google Scholar
- Philippe Fournier-Viger, Cheng-Wei Wu, Souleymane Zida, and Vincent S. Tseng. 2014. FHM: Faster high-utility itemset mining using estimated utility co-occurrence pruning. In Proceedings of the International Symposium on Methodologies for Intelligent Systems. Springer, 83--92.Google Scholar
- Wensheng Gan, Jerry Chun-Wei Lin, Han-Chieh Chao, Shyue-Liang Wang, and Philip S. Yu. 2018. Privacy preserving utility mining: A survey. In Proceedings of the IEEE International Conference on Big Data. IEEE, 2617--2626.Google Scholar
- Wensheng Gan, Jerry Chun-Wei Lin, Han-Chieh Chao, and Philip S. Yu. 2019. Utility-driven mining of high utility episodes. In Proceedings of the IEEE International Conference on Big Data. IEEE, 2644--2653.Google Scholar
- Wensheng Gan, Jerry Chun-Wei Lin, Han-Chieh Chao, and Justin Zhan. 2017. Data mining in distributed environment: A survey. Wiley Interdisc. Rev.: Data Mining Knowl. Discov. 7, 6 (2017).Google ScholarCross Ref
- Wensheng Gan, Jerry Chun-Wei Lin, Philippe Fournier-Viger, and Han-Chieh Chao. 2016. More efficient algorithms for mining high-utility itemsets with multiple minimum utility thresholds. In Proceedings of the International Conference on Database and Expert Systems Applications. Springer, 71--87.Google ScholarDigital Library
- Wensheng Gan, Jerry Chun-Wei Lin, Philippe Fournier-Viger, Han-Chieh Chao, Tzung-Pei Hong, and Hamido Fujita. 2018. A survey of incremental high-utility itemset mining. Wiley Interdisc. Rev.: Data Mining Knowl. Discov. 8, 2 (2018).Google ScholarDigital Library
- Wensheng Gan, Jerry Chun-Wei Lin, Philippe Fournier-Viger, Han-Chieh Chao, Vincent S. Tseng, and Philip S. Yu. 2019. A survey of utility-oriented pattern mining. IEEE Trans. Knowl. Data Eng. DOI:10.1109/TKDE.2019.2942594 (2019).Google Scholar
- Wensheng Gan, Jerry Chun-Wei Lin, Philippe Fournier-Viger, Han-Chieh Chao, and Philip S. Yu. 2020. HUOPM: High utility occupancy pattern mining. IEEE Trans. Cybern. 50, 3 (2020), 1195–1208. DOI:10.1109/TCYB.2019.2896267Google ScholarCross Ref
- Wensheng Gan, Jerry Chun-Wei Lin, Philippe Fournier-Viger, Han-Chieh Chao, and Justin Zhan. 2017. Mining of frequent patterns with multiple minimum supports. Eng. Applic. Artif. Intell. 60 (2017), 83--96.Google ScholarDigital Library
- Wensheng Gan, Jerry Chun-Wei Lin, Jiexiong Zhang, Han-Chieh Chao, Hamido Fujita, and Philip S. Yu. 2020. ProUM: Projection-based utility mining on sequence data. Inf. Sci. 513 (2020), 222--240.Google ScholarDigital Library
- Liqiang Geng and Howard J. Hamilton. 2006. Interestingness measures for data mining: A survey. Comput. Surv. 38, 3 (2006), 9.Google ScholarDigital Library
- Jiawei Han, Jian Pei, Yiwen Yin, and Runying Mao. 2004. Mining frequent patterns without candidate generation: A frequent-pattern tree approach. Data Mining Knowl. Discov. 8, 1 (2004), 53--87.Google ScholarDigital Library
- Ya-Han Hu and Yen-Liang Chen. 2006. Mining association rules with multiple minimum supports: A new mining algorithm and a support tuning mechanism. Decis. Supp. Syst. 42, 1 (2006), 1--24.Google ScholarDigital Library
- Ya-Han Hu, Fan Wu, and Yi-Jiun Liao. 2013. An efficient tree-based algorithm for mining sequential patterns with multiple minimum supports. J. Syst. Softw. 86, 5 (2013), 1224--1238.Google ScholarDigital Library
- Tony Cheng-Kui Huang. 2013. Discovery of fuzzy quantitative sequential patterns with multiple minimum supports and adjustable membership functions. Inf. Sci. 222 (2013), 126--146.Google ScholarDigital Library
- 2012. Frequent itemset mining dataset repository. Retrieved from http://fimi.ua.ac.be/data/.Google Scholar
- R. Uday Kiran and P. Krishna Reddy. 2011. Novel techniques to reduce search space in multiple minimum supports-based frequent pattern mining algorithms. In Proceedings of the 14th International Conference on Extending Database Technology. ACM, 11--20.Google Scholar
- Srikumar Krishnamoorthy. 2015. Pruning strategies for mining high utility itemsets. Exp. Syst. Applic. 42, 5 (2015), 2371--2381.Google ScholarDigital Library
- Srikumar Krishnamoorthy. 2018. Efficient mining of high utility itemsets with multiple minimum utility thresholds. Eng. Applic. Artif. Intell. 69 (2018), 112--126.Google ScholarCross Ref
- Guo-Cheng Lan, Tzung-Pei Hong, and Vincent S. Tseng. 2014. An efficient projection-based indexing approach for mining high utility itemsets. Knowl. Inf. Syst. 38, 1 (2014), 85--107.Google ScholarCross Ref
- Guo-Cheng Lan, Tzung-Pei Hong, Vincent S. Tseng, and Shyue-Liang Wang. 2014. Applying the maximum utility measure in high utility sequential pattern mining. Exp. Syst. Applic. 41, 11 (2014), 5071--5081.Google ScholarCross Ref
- Yeong-Chyi Lee, Tzung-Pei Hong, and Wen-Yang Lin. 2004. Mining fuzzy association rules with multiple minimum supports using maximum constraints. In Proceedings of the International Conference on Knowledge-based and Intelligent Information and Engineering Systems. Springer, 1283--1290.Google ScholarCross Ref
- Beibei Li, Anindya Ghose, and Panagiotis G. Ipeirotis. 2011. Towards a theory model for product search. In Proceedings of the 20th International Conference on World Wide Web. ACM, 327--336.Google Scholar
- Chun-Wei Lin, Tzung-Pei Hong, and Wen-Hsiang Lu. 2011. An effective tree structure for mining high utility itemsets. Exp. Syst. Applic. 38, 6 (2011), 7419--7424.Google ScholarDigital Library
- Chun-Wei Lin, Guo-Cheng Lan, and Tzung-Pei Hong. 2015. Mining high utility itemsets for transaction deletion in a dynamic database. Intell. Data Anal. 19, 1 (2015), 43--55.Google ScholarDigital Library
- Jerry Chun-Wei Lin, Philippe Fournier-Viger, and Wensheng Gan. 2016. FHN: An efficient algorithm for mining high-utility itemsets with negative unit profits. Knowl.-based Syst. 111 (2016), 283--298.Google Scholar
- Jerry Chun-Wei Lin, Wensheng Gan, Philippe Fournier-Viger, Tzung-Pei Hong, and Vincent S. Tseng. 2016. Efficient algorithms for mining high-utility itemsets in uncertain databases. Knowl.-based Syst. 96 (2016), 171--187.Google Scholar
- Jerry Chun-Wei Lin, Wensheng Gan, Philippe Fournier-Viger, Tzung-Pei Hong, and Vincent S. Tseng. 2016. Fast algorithms for mining high-utility itemsets with various discount strategies. Adv. Eng. Inform. 30, 2 (2016), 109--126.Google ScholarDigital Library
- Jerry Chun-Wei Lin, Wensheng Gan, Philippe Fournier-Viger, Tzung-Pei Hong, and Justin Zhan. 2016. Efficient mining of high-utility itemsets using multiple minimum utility thresholds. Knowl.-based Syst. 113 (2016), 100--115.Google Scholar
- Jerry Chun-Wei Lin, Wensheng Gan, Tzung-Pei Hong, and Vincent S. Tseng. 2015. Efficient algorithms for mining up-to-date high-utility patterns. Adv. Eng. Inform. 29, 3 (2015), 648--661.Google ScholarDigital Library
- Jerry Chun-Wei Lin, Jiexiong Zhang, and Philippe Fournier-Viger. 2017. High-utility sequential pattern mining with multiple minimum utility thresholds. In Proceedings of the Asia-Pacific Web and Web-age Information Management Joint Conference on Web and Big Data. Springer, 215--229.Google Scholar
- Bing Liu. 2007. Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data. Springer Science 8 Business Media.Google ScholarDigital Library
- Bing Liu, Wynne Hsu, and Yiming Ma. 1999. Mining association rules with multiple minimum supports. In Proceedings of the 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 337--341.Google ScholarDigital Library
- Junqiang Liu, Ke Wang, and Benjamin C. M. Fung. 2012. Direct discovery of high utility itemsets without candidate generation. In Proceedings of the IEEE 12th International Conference on Data Mining. IEEE, 984--989.Google Scholar
- Mengchi Liu and Junfeng Qu. 2012. Mining high utility itemsets without candidate generation. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management. ACM, 55--64.Google ScholarDigital Library
- Ying Liu, Wei-keng Liao, and Alok Choudhary. 2005. A two-phase algorithm for fast discovery of high utility itemsets. In Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining. Springer, 689--695.Google ScholarDigital Library
- Yu-Cheng Liu, Chun-Pei Cheng, and Vincent S. Tseng. 2011. Discovering relational-based association rules with multiple minimum supports on microarray datasets. Bioinformatics 27, 22 (2011), 3142--3148.Google ScholarDigital Library
- Thang Mai, Bay Vo, and Loan T. T. Nguyen. 2017. A lattice-based approach for mining high utility association rules. Inf. Sci. 399 (2017), 81--97.Google ScholarDigital Library
- A. Marshall. 1926. In Principles of Economics (8th ed.). Macmillan and Co., London, UK.Google Scholar
- Nicolas Pasquier, Yves Bastide, Rafik Taouil, and Lotfi Lakhal. 1998. Pruning closed itemset lattices for association rules. In Proceedings of the International Conference on Advanced Databases. 177--196.Google Scholar
- Jian Pei and Jiawei Han. 2002. Constrained frequent pattern mining: A pattern-growth view. ACM SIGKDD Explor. Newslett. 4, 1 (2002), 31--39.Google ScholarDigital Library
- Jian Pei, Jiawei Han, and Laks V. S. Lakshmanan. 2001. Mining frequent itemsets with convertible constraints. In Proceedings of the 17th International Conference on Data Engineering. IEEE, 433--442.Google Scholar
- Heungmo Ryang, Unil Yun, and Keun Ho Ryu. 2014. Discovering high utility itemsets with multiple minimum supports. Intell. Data Anal. 18, 6 (2014), 1027--1047.Google ScholarDigital Library
- Ron Rymon. 1992. Search through systematic set enumeration. In Proceedings of the 3rd International Conference on Principles of Knowledge Representation and Reasoning. 539--550.Google Scholar
- Wei Song, Zihan Zhang, and Jinhong Li. 2016. A high utility itemset mining algorithm based on subsume index. Knowl. Inf. Syst. 49, 1 (2016), 315--340.Google ScholarDigital Library
- Ramakrishnan Srikant and Rakesh Agrawal. 1996. Mining sequential patterns: Generalizations and performance improvements. In Proceedings of the International Conference on Extending Database Technology. Springer, 1--17.Google ScholarCross Ref
- Vincent S. Tseng, Bai-En Shie, Cheng-Wei Wu, and Philip S. Yu. 2013. Efficient algorithms for mining high utility itemsets from transactional databases. IEEE Trans. Knowl. Data Eng. 25, 8 (2013), 1772--1786.Google ScholarDigital Library
- Vincent S. Tseng, Cheng-Wei Wu, Philippe Fournier-Viger, and Philip S. Yu. 2016. Efficient algorithms for mining top- high utility itemsets. IEEE Trans. Knowl. Data Eng. 28, 1 (2016), 54--67.Google ScholarDigital Library
- Vincent S. Tseng, Cheng-Wei Wu, Bai-En Shie, and Philip S. Yu. 2010. UP-Growth: An efficient algorithm for high utility itemset mining. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 253--262.Google Scholar
- Bay Vo, Tzung-Pei Hong, and Bac Le. 2013. A lattice-based approach for mining most generalization association rules. Knowl.-based Syst. 45 (2013), 20--30.Google Scholar
- Bay Vo, Tuong Le, Tzung-Pei Hong, and Bac Le. 2015. Fast updated frequent-itemset lattice for transaction deletion. Data Knowl. Eng. 96 (2015), 78--89.Google ScholarDigital Library
- Jun-Zhe Wang and Jiun-Long Huang. 2018. On incremental high utility sequential pattern mining. ACM Trans. Intell. Syst. Technol. 9, 5 (2018), 55.Google ScholarDigital Library
- Jun-Zhe Wang, Jiun-Long Huang, and Yi-Cheng Chen. 2016. On efficiently mining high utility sequential patterns. Knowl. Inf. Syst. 49, 2 (2016), 597--627.Google ScholarDigital Library
- Gary M. Weiss. 2004. Mining with rarity: A unifying framework. ACM SIGKDD Explor. Newslett. 6, 1 (2004), 7--19.Google ScholarDigital Library
- Kung-Jiuan Yang, Tzung-Pei Hong, Guo-Cheng Lan, and Yuh-Min Chen. 2014. Efficient mining of partial periodic patterns with individual event support thresholds using minimum constraints. Int. J. Uncert. Fuzz. Knowl.-based Syst. 22, 6 (2014), 793--814.Google ScholarCross Ref
- Hong Yao, Howard J. Hamilton, and Cory J. Butz. 2004. A foundational approach to mining itemset utilities from databases. In Proceedings of the SIAM International Conference on Data Mining. SIAM, 482--486.Google Scholar
- Junfu Yin, Zhigang Zheng, and Longbing Cao. 2012. USpan: An efficient algorithm for mining high utility sequential patterns. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 660--668.Google ScholarDigital Library
- Unil Yun, Heungmo Ryang, Gangin Lee, and Hamido Fujita. 2017. An efficient algorithm for mining high utility patterns from incremental databases with one database scan. Knowl.-based Syst. 124 (2017), 188--206.Google Scholar
- Souleymane Zida, Philippe Fournier-Viger, Jerry Chun-Wei Lin, Cheng-Wei Wu, and Vincent S. Tseng. 2015. EFIM: A highly efficient algorithm for high-utility itemset mining. In Proceedings of the Mexican International Conference on Artificial Intelligence. Springer, 530--546.Google Scholar
Index Terms
- Beyond Frequency: Utility Mining with Varied Item-specific Minimum Utility
Recommendations
Utility-Driven Mining of Trend Information for Intelligent System
Special Section on WITS 2018 and Regular ArticlesUseful knowledge, embedded in a database, is likely to change over time. Identifying the recent changes in temporal data can provide valuable up-to-date information to decision makers. Nevertheless, techniques for mining high-utility patterns (HUPs) ...
More Efficient Algorithms for Mining High-Utility Itemsets with Multiple Minimum Utility Thresholds
DEXA 2016: Proceedings, Part I, 27th International Conference on Database and Expert Systems Applications - Volume 9827Mining high-utility itemsets HUIs is a popular data mining task, which consists of discovering sets of items that yield a high profit in a transaction database. Although HUI mining has numerous applications, a key limitation is that a single minimum ...
FDHUP: Fast algorithm for mining discriminative high utility patterns
Recently, high utility pattern mining (HUPM) has been extensively studied. Many approaches for HUPM have been proposed in recent years, but most of them aim at mining HUPs without any consideration for their frequency. This has the major drawback that ...
Comments