skip to main content
research-article
Public Access

Beyond Frequency: Utility Mining with Varied Item-specific Minimum Utility

Published:05 January 2021Publication History
Skip Abstract Section

Abstract

Consumer behavior plays a very important role in economics and targeted marketing. However, understanding economic consumer behavior is quite challenging, such as finding credible and reliable information on product profitability. Different from frequent pattern mining, utility-oriented mining integrates utility theory and data mining. Utility mining is a useful tool for understanding economic consumer behavior. Traditional algorithms for mining high-utility patterns (HUPs) applies a single/uniform minimum utility threshold (minutil) to obtain the set of HUPs, but in some real-life circumstances, some specific products may bring lower utilities compared with others, but their profit may offer some vital information. If minutil is set high, the patterns with low minutil are missed; if minutil is set low, the number of patterns becomes unmanageable. In this article, an efficient one-phase utility-oriented pattern mining algorithm, called HIMU, is proposed for mining HUPs with varied item-specific minimum utility. A novel tree structure called a multiple item utility set-enumeration tree (MIU-tree) and the global sorted and the conditional downward closure properties are introduced in HIMU. In addition, we extended the compact utility-list structure to keep the necessary information, and thus this one-phase HIMU model greatly reduces the computational costs and memory requirements. Moreover, two pruning strategies are then extended to enhance the performance. We conducted extensive experiments in several synthetic and real-world datasets; the results indicate that the designed one-phase HIMU algorithm can address the “rare item problem” and has better performance than the state-of-the-art algorithms in terms of runtime, memory usage, and scalability. Furthermore, the enhanced algorithms outperform the non-optimized HIMU approach.

References

  1. Rakesh Agrawal, Tomasz Imieliński, and Arun Swami. 1993. Mining association rules between sets of items in large databases. In ACM SIGMOD Record, Vol. 22. ACM, 207--216.Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Rakesh Agrawal and Ramakrishnan Srikant. 1995. Mining sequential patterns. In Proceedings of the 11th International Conference on Data Engineering. IEEE, 3--14.Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Rakesh Agrawal, Ramakrishnan Srikant, et al. 1994. Fast algorithms for mining association rules. In Proceedings of the 20th International Conference on Very Large Data Bases, Vol. 1215. 487--499.Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Rakesh Agrawal and Ramakrishnan. Quest synthetic data generator. 1994. Retrieved from http://www.Almaden.ibm.com/cs/quest/syndata.html.Google ScholarGoogle Scholar
  5. Chowdhury Farhan Ahmed, Syed Khairuzzaman Tanbeer, Byeong-Soo Jeong, and Young-Koo Lee. 2009. Efficient tree structures for high utility pattern mining in incremental databases. IEEE Trans. Knowl. Data Eng. 21, 12 (2009), 1708--1721.Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Oznur Kirmemis Alkan and Pinar Karagoz. 2015. CROM and HuspExt: Improving efficiency of high utility sequential pattern extraction. IEEE Trans. Knowl. Data Eng. 27, 10 (2015), 2645--2657.Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Raymond Chan, Qiang Yang, and Yi-Dong Shen. 2003. Mining high utility itemsets. In Proceedings of the 3rd IEEE International Conference on Data Mining. IEEE, 19--26.Google ScholarGoogle ScholarCross RefCross Ref
  8. Ming-Syan Chen, Jiawei Han, and Philip S. Yu. 1996. Data mining: An overview from a database perspective. IEEE Trans. Knowl. Data Eng. 8, 6 (1996), 866--883.Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Microsoft. Example database foodmart of Microsoft analysis services. 2016. Retrieved from http://msdn.microsoft.com/en-us/library/aa217032(SQL.80).aspx.Google ScholarGoogle Scholar
  10. Philippe Fournier-Viger, Cheng-Wei Wu, Souleymane Zida, and Vincent S. Tseng. 2014. FHM: Faster high-utility itemset mining using estimated utility co-occurrence pruning. In Proceedings of the International Symposium on Methodologies for Intelligent Systems. Springer, 83--92.Google ScholarGoogle Scholar
  11. Wensheng Gan, Jerry Chun-Wei Lin, Han-Chieh Chao, Shyue-Liang Wang, and Philip S. Yu. 2018. Privacy preserving utility mining: A survey. In Proceedings of the IEEE International Conference on Big Data. IEEE, 2617--2626.Google ScholarGoogle Scholar
  12. Wensheng Gan, Jerry Chun-Wei Lin, Han-Chieh Chao, and Philip S. Yu. 2019. Utility-driven mining of high utility episodes. In Proceedings of the IEEE International Conference on Big Data. IEEE, 2644--2653.Google ScholarGoogle Scholar
  13. Wensheng Gan, Jerry Chun-Wei Lin, Han-Chieh Chao, and Justin Zhan. 2017. Data mining in distributed environment: A survey. Wiley Interdisc. Rev.: Data Mining Knowl. Discov. 7, 6 (2017).Google ScholarGoogle ScholarCross RefCross Ref
  14. Wensheng Gan, Jerry Chun-Wei Lin, Philippe Fournier-Viger, and Han-Chieh Chao. 2016. More efficient algorithms for mining high-utility itemsets with multiple minimum utility thresholds. In Proceedings of the International Conference on Database and Expert Systems Applications. Springer, 71--87.Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Wensheng Gan, Jerry Chun-Wei Lin, Philippe Fournier-Viger, Han-Chieh Chao, Tzung-Pei Hong, and Hamido Fujita. 2018. A survey of incremental high-utility itemset mining. Wiley Interdisc. Rev.: Data Mining Knowl. Discov. 8, 2 (2018).Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Wensheng Gan, Jerry Chun-Wei Lin, Philippe Fournier-Viger, Han-Chieh Chao, Vincent S. Tseng, and Philip S. Yu. 2019. A survey of utility-oriented pattern mining. IEEE Trans. Knowl. Data Eng. DOI:10.1109/TKDE.2019.2942594 (2019).Google ScholarGoogle Scholar
  17. Wensheng Gan, Jerry Chun-Wei Lin, Philippe Fournier-Viger, Han-Chieh Chao, and Philip S. Yu. 2020. HUOPM: High utility occupancy pattern mining. IEEE Trans. Cybern. 50, 3 (2020), 1195–1208. DOI:10.1109/TCYB.2019.2896267Google ScholarGoogle ScholarCross RefCross Ref
  18. Wensheng Gan, Jerry Chun-Wei Lin, Philippe Fournier-Viger, Han-Chieh Chao, and Justin Zhan. 2017. Mining of frequent patterns with multiple minimum supports. Eng. Applic. Artif. Intell. 60 (2017), 83--96.Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Wensheng Gan, Jerry Chun-Wei Lin, Jiexiong Zhang, Han-Chieh Chao, Hamido Fujita, and Philip S. Yu. 2020. ProUM: Projection-based utility mining on sequence data. Inf. Sci. 513 (2020), 222--240.Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Liqiang Geng and Howard J. Hamilton. 2006. Interestingness measures for data mining: A survey. Comput. Surv. 38, 3 (2006), 9.Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Jiawei Han, Jian Pei, Yiwen Yin, and Runying Mao. 2004. Mining frequent patterns without candidate generation: A frequent-pattern tree approach. Data Mining Knowl. Discov. 8, 1 (2004), 53--87.Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Ya-Han Hu and Yen-Liang Chen. 2006. Mining association rules with multiple minimum supports: A new mining algorithm and a support tuning mechanism. Decis. Supp. Syst. 42, 1 (2006), 1--24.Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Ya-Han Hu, Fan Wu, and Yi-Jiun Liao. 2013. An efficient tree-based algorithm for mining sequential patterns with multiple minimum supports. J. Syst. Softw. 86, 5 (2013), 1224--1238.Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Tony Cheng-Kui Huang. 2013. Discovery of fuzzy quantitative sequential patterns with multiple minimum supports and adjustable membership functions. Inf. Sci. 222 (2013), 126--146.Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. 2012. Frequent itemset mining dataset repository. Retrieved from http://fimi.ua.ac.be/data/.Google ScholarGoogle Scholar
  26. R. Uday Kiran and P. Krishna Reddy. 2011. Novel techniques to reduce search space in multiple minimum supports-based frequent pattern mining algorithms. In Proceedings of the 14th International Conference on Extending Database Technology. ACM, 11--20.Google ScholarGoogle Scholar
  27. Srikumar Krishnamoorthy. 2015. Pruning strategies for mining high utility itemsets. Exp. Syst. Applic. 42, 5 (2015), 2371--2381.Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Srikumar Krishnamoorthy. 2018. Efficient mining of high utility itemsets with multiple minimum utility thresholds. Eng. Applic. Artif. Intell. 69 (2018), 112--126.Google ScholarGoogle ScholarCross RefCross Ref
  29. Guo-Cheng Lan, Tzung-Pei Hong, and Vincent S. Tseng. 2014. An efficient projection-based indexing approach for mining high utility itemsets. Knowl. Inf. Syst. 38, 1 (2014), 85--107.Google ScholarGoogle ScholarCross RefCross Ref
  30. Guo-Cheng Lan, Tzung-Pei Hong, Vincent S. Tseng, and Shyue-Liang Wang. 2014. Applying the maximum utility measure in high utility sequential pattern mining. Exp. Syst. Applic. 41, 11 (2014), 5071--5081.Google ScholarGoogle ScholarCross RefCross Ref
  31. Yeong-Chyi Lee, Tzung-Pei Hong, and Wen-Yang Lin. 2004. Mining fuzzy association rules with multiple minimum supports using maximum constraints. In Proceedings of the International Conference on Knowledge-based and Intelligent Information and Engineering Systems. Springer, 1283--1290.Google ScholarGoogle ScholarCross RefCross Ref
  32. Beibei Li, Anindya Ghose, and Panagiotis G. Ipeirotis. 2011. Towards a theory model for product search. In Proceedings of the 20th International Conference on World Wide Web. ACM, 327--336.Google ScholarGoogle Scholar
  33. Chun-Wei Lin, Tzung-Pei Hong, and Wen-Hsiang Lu. 2011. An effective tree structure for mining high utility itemsets. Exp. Syst. Applic. 38, 6 (2011), 7419--7424.Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. Chun-Wei Lin, Guo-Cheng Lan, and Tzung-Pei Hong. 2015. Mining high utility itemsets for transaction deletion in a dynamic database. Intell. Data Anal. 19, 1 (2015), 43--55.Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. Jerry Chun-Wei Lin, Philippe Fournier-Viger, and Wensheng Gan. 2016. FHN: An efficient algorithm for mining high-utility itemsets with negative unit profits. Knowl.-based Syst. 111 (2016), 283--298.Google ScholarGoogle Scholar
  36. Jerry Chun-Wei Lin, Wensheng Gan, Philippe Fournier-Viger, Tzung-Pei Hong, and Vincent S. Tseng. 2016. Efficient algorithms for mining high-utility itemsets in uncertain databases. Knowl.-based Syst. 96 (2016), 171--187.Google ScholarGoogle Scholar
  37. Jerry Chun-Wei Lin, Wensheng Gan, Philippe Fournier-Viger, Tzung-Pei Hong, and Vincent S. Tseng. 2016. Fast algorithms for mining high-utility itemsets with various discount strategies. Adv. Eng. Inform. 30, 2 (2016), 109--126.Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. Jerry Chun-Wei Lin, Wensheng Gan, Philippe Fournier-Viger, Tzung-Pei Hong, and Justin Zhan. 2016. Efficient mining of high-utility itemsets using multiple minimum utility thresholds. Knowl.-based Syst. 113 (2016), 100--115.Google ScholarGoogle Scholar
  39. Jerry Chun-Wei Lin, Wensheng Gan, Tzung-Pei Hong, and Vincent S. Tseng. 2015. Efficient algorithms for mining up-to-date high-utility patterns. Adv. Eng. Inform. 29, 3 (2015), 648--661.Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. Jerry Chun-Wei Lin, Jiexiong Zhang, and Philippe Fournier-Viger. 2017. High-utility sequential pattern mining with multiple minimum utility thresholds. In Proceedings of the Asia-Pacific Web and Web-age Information Management Joint Conference on Web and Big Data. Springer, 215--229.Google ScholarGoogle Scholar
  41. Bing Liu. 2007. Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data. Springer Science 8 Business Media.Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. Bing Liu, Wynne Hsu, and Yiming Ma. 1999. Mining association rules with multiple minimum supports. In Proceedings of the 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 337--341.Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. Junqiang Liu, Ke Wang, and Benjamin C. M. Fung. 2012. Direct discovery of high utility itemsets without candidate generation. In Proceedings of the IEEE 12th International Conference on Data Mining. IEEE, 984--989.Google ScholarGoogle Scholar
  44. Mengchi Liu and Junfeng Qu. 2012. Mining high utility itemsets without candidate generation. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management. ACM, 55--64.Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. Ying Liu, Wei-keng Liao, and Alok Choudhary. 2005. A two-phase algorithm for fast discovery of high utility itemsets. In Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining. Springer, 689--695.Google ScholarGoogle ScholarDigital LibraryDigital Library
  46. Yu-Cheng Liu, Chun-Pei Cheng, and Vincent S. Tseng. 2011. Discovering relational-based association rules with multiple minimum supports on microarray datasets. Bioinformatics 27, 22 (2011), 3142--3148.Google ScholarGoogle ScholarDigital LibraryDigital Library
  47. Thang Mai, Bay Vo, and Loan T. T. Nguyen. 2017. A lattice-based approach for mining high utility association rules. Inf. Sci. 399 (2017), 81--97.Google ScholarGoogle ScholarDigital LibraryDigital Library
  48. A. Marshall. 1926. In Principles of Economics (8th ed.). Macmillan and Co., London, UK.Google ScholarGoogle Scholar
  49. Nicolas Pasquier, Yves Bastide, Rafik Taouil, and Lotfi Lakhal. 1998. Pruning closed itemset lattices for association rules. In Proceedings of the International Conference on Advanced Databases. 177--196.Google ScholarGoogle Scholar
  50. Jian Pei and Jiawei Han. 2002. Constrained frequent pattern mining: A pattern-growth view. ACM SIGKDD Explor. Newslett. 4, 1 (2002), 31--39.Google ScholarGoogle ScholarDigital LibraryDigital Library
  51. Jian Pei, Jiawei Han, and Laks V. S. Lakshmanan. 2001. Mining frequent itemsets with convertible constraints. In Proceedings of the 17th International Conference on Data Engineering. IEEE, 433--442.Google ScholarGoogle Scholar
  52. Heungmo Ryang, Unil Yun, and Keun Ho Ryu. 2014. Discovering high utility itemsets with multiple minimum supports. Intell. Data Anal. 18, 6 (2014), 1027--1047.Google ScholarGoogle ScholarDigital LibraryDigital Library
  53. Ron Rymon. 1992. Search through systematic set enumeration. In Proceedings of the 3rd International Conference on Principles of Knowledge Representation and Reasoning. 539--550.Google ScholarGoogle Scholar
  54. Wei Song, Zihan Zhang, and Jinhong Li. 2016. A high utility itemset mining algorithm based on subsume index. Knowl. Inf. Syst. 49, 1 (2016), 315--340.Google ScholarGoogle ScholarDigital LibraryDigital Library
  55. Ramakrishnan Srikant and Rakesh Agrawal. 1996. Mining sequential patterns: Generalizations and performance improvements. In Proceedings of the International Conference on Extending Database Technology. Springer, 1--17.Google ScholarGoogle ScholarCross RefCross Ref
  56. Vincent S. Tseng, Bai-En Shie, Cheng-Wei Wu, and Philip S. Yu. 2013. Efficient algorithms for mining high utility itemsets from transactional databases. IEEE Trans. Knowl. Data Eng. 25, 8 (2013), 1772--1786.Google ScholarGoogle ScholarDigital LibraryDigital Library
  57. Vincent S. Tseng, Cheng-Wei Wu, Philippe Fournier-Viger, and Philip S. Yu. 2016. Efficient algorithms for mining top- high utility itemsets. IEEE Trans. Knowl. Data Eng. 28, 1 (2016), 54--67.Google ScholarGoogle ScholarDigital LibraryDigital Library
  58. Vincent S. Tseng, Cheng-Wei Wu, Bai-En Shie, and Philip S. Yu. 2010. UP-Growth: An efficient algorithm for high utility itemset mining. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 253--262.Google ScholarGoogle Scholar
  59. Bay Vo, Tzung-Pei Hong, and Bac Le. 2013. A lattice-based approach for mining most generalization association rules. Knowl.-based Syst. 45 (2013), 20--30.Google ScholarGoogle Scholar
  60. Bay Vo, Tuong Le, Tzung-Pei Hong, and Bac Le. 2015. Fast updated frequent-itemset lattice for transaction deletion. Data Knowl. Eng. 96 (2015), 78--89.Google ScholarGoogle ScholarDigital LibraryDigital Library
  61. Jun-Zhe Wang and Jiun-Long Huang. 2018. On incremental high utility sequential pattern mining. ACM Trans. Intell. Syst. Technol. 9, 5 (2018), 55.Google ScholarGoogle ScholarDigital LibraryDigital Library
  62. Jun-Zhe Wang, Jiun-Long Huang, and Yi-Cheng Chen. 2016. On efficiently mining high utility sequential patterns. Knowl. Inf. Syst. 49, 2 (2016), 597--627.Google ScholarGoogle ScholarDigital LibraryDigital Library
  63. Gary M. Weiss. 2004. Mining with rarity: A unifying framework. ACM SIGKDD Explor. Newslett. 6, 1 (2004), 7--19.Google ScholarGoogle ScholarDigital LibraryDigital Library
  64. Kung-Jiuan Yang, Tzung-Pei Hong, Guo-Cheng Lan, and Yuh-Min Chen. 2014. Efficient mining of partial periodic patterns with individual event support thresholds using minimum constraints. Int. J. Uncert. Fuzz. Knowl.-based Syst. 22, 6 (2014), 793--814.Google ScholarGoogle ScholarCross RefCross Ref
  65. Hong Yao, Howard J. Hamilton, and Cory J. Butz. 2004. A foundational approach to mining itemset utilities from databases. In Proceedings of the SIAM International Conference on Data Mining. SIAM, 482--486.Google ScholarGoogle Scholar
  66. Junfu Yin, Zhigang Zheng, and Longbing Cao. 2012. USpan: An efficient algorithm for mining high utility sequential patterns. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 660--668.Google ScholarGoogle ScholarDigital LibraryDigital Library
  67. Unil Yun, Heungmo Ryang, Gangin Lee, and Hamido Fujita. 2017. An efficient algorithm for mining high utility patterns from incremental databases with one database scan. Knowl.-based Syst. 124 (2017), 188--206.Google ScholarGoogle Scholar
  68. Souleymane Zida, Philippe Fournier-Viger, Jerry Chun-Wei Lin, Cheng-Wei Wu, and Vincent S. Tseng. 2015. EFIM: A highly efficient algorithm for high-utility itemset mining. In Proceedings of the Mexican International Conference on Artificial Intelligence. Springer, 530--546.Google ScholarGoogle Scholar

Index Terms

  1. Beyond Frequency: Utility Mining with Varied Item-specific Minimum Utility

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in

      Full Access

      • Published in

        cover image ACM Transactions on Internet Technology
        ACM Transactions on Internet Technology  Volume 21, Issue 1
        Visions Paper, Regular Papers, SI: Blockchain in E-Commerce, and SI: Human-Centered Security, Privacy, and Trust in the Internet of Things
        February 2021
        534 pages
        ISSN:1533-5399
        EISSN:1557-6051
        DOI:10.1145/3441681
        • Editor:
        • Ling Liu
        Issue’s Table of Contents

        Copyright © 2021 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 5 January 2021
        • Accepted: 1 September 2020
        • Revised: 1 August 2020
        • Received: 1 October 2019
        Published in toit Volume 21, Issue 1

        Permissions

        Request permissions about this article.

        Request Permissions

        Check for updates

        Qualifiers

        • research-article
        • Research
        • Refereed

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader

      HTML Format

      View this article in HTML Format .

      View HTML Format