An efficient algorithm for mining high utility itemsets with negative item values in large databases

https://doi.org/10.1016/j.amc.2009.05.066Get rights and content

Abstract

Utility itemsets typically consist of items with different values such as utilities, and the aim of utility mining is to identify the itemsets with highest utilities. In the past studies on utility mining, the values of utility itemsets were considered as positive. In some applications, however, an itemset may be associated with negative item values. Hence, discovery of high utility itemsets with negative item values is important for mining interesting patterns like association rules. In this paper, we propose a novel method, namely HUINIV (High Utility Itemsets with Negative Item Values)-Mine, for efficiently and effectively mining high utility itemsets from large databases with consideration of negative item values. To the best of our knowledge, this is the first work that considers the concept of negative item values in utility mining. The novel contribution of HUINIV-Mine is that it can effectively identify high utility itemsets by generating fewer high transaction-weighted utilization itemsets such that the execution time can be reduced substantially in mining the high utility itemsets. In this way, the process of discovering all high utility itemsets with consideration of negative item values can be accomplished effectively with less requirements on memory space and CPU I/O. This meets the critical requirements of temporal and spatial efficiency for mining high utility itemsets with negative item values. Through experimental evaluation, it is shown that HUINIV-Mine outperforms other methods substantially by generating much less candidate itemsets under different experimental conditions.

Introduction

Mining of association rules in large databases is a well studied technique in the field of data mining with typical methods like Apriori [1], [2]. The problem surrounding association rules mining can be decomposed into two steps. The first step involves finding all frequent itemsets (or large itemsets) in a database. Once the frequent itemsets are found, generating association rules is straightforward and can be accomplished in linear time.

Most methods in finding frequent itemsets are designed for traditional databases. However, the frequency of an itemset may not be a sufficient indicator of significance, because frequency reflects only the number of transactions in the database that contain that itemset. It does not reveal the utility of an itemset, which can be measured in terms of cost, profit, or other expressions of user preference. On the other hand, frequent itemsets may only contribute a small portion of the overall profit, whereas non-frequent itemsets may contribute a large portion of the profit. In reality, a retail business may be most interested in identifying its most valuable customers (customers who contribute a major fraction of the profits to the company). Hence, frequency is not sufficient to answer questions such as whether an itemset is highly profitable, or whether an itemset has a strong impact.

Utility mining is thus useful in a wide range of practical applications and was recently studied in [7], [14], [20], [22]. However, a retail business may sale item with negative value. For example, many super markets may promote certain items to attract customers. In this scenario customers may buy specific items and then receive free goods. Free goods result in negative value for super markets. However, supermarkets may earn higher profits from other items that are cross-promoted with these free items. This practice is common. For example, if a customer bought three of item A, he would then receive one free item B as a promotion from the supermarket. Suppose the supermarket gets five dollars of profit from each unit of item A sold, and loses two dollars for each unit of item B given away. Although giving away a unit of item B results in a loss of two dollars for supermarkets, they could possibly earn 15 dollars from the three units of item A that are cross-promoted with item B. The supermarket thus may have a net gain of 13 dollars from this promotion. This example demonstrates why we propose the concept of mining for negative item values in utility mining. This also motivates our research in developing a new scheme for finding high utility itemsets with negative item values (HUINIV) from large databases.

Recently, a utility mining model has been defined in [22]. Utility is a measure of how “useful” (i.e. “profitable”) an itemset is. The definition of the utility of an itemset X, u(X), states that it is equal to the sum of the utilities of X of all the transactions containing X. The goal of utility mining is to identify high utility itemsets, which drive a large portion of the total utility. Traditional association rules of mining models assume that the utility of each item is always 1 and that the quantity of sales is either 0 or 1; thus it is only a special case of utility mining in which the utility or the quantity of sales of each item can be any number. If u(X) is greater than a specified utility threshold, X is a high utility itemset; otherwise, it is a low utility itemset. Table 1 is an example of utility mining in a transaction database. The number associated with each transaction in Table 1a is the sales volume of each item, and the utility of each item is listed in Table 1b. For example, u({A, D}) = (1 × 5 + 2 × 6) + (3 × 5+5 × 6) + (3 × 5 + 1 × 6) = 83. {A, D} is a high utility itemset if the utility threshold is set at 80.

However, a high utility itemset may consist of low utility items. Another possibility is to adopt the level-wise searching schema that exists in fast algorithms, such as Apriori [3]. This algorithm does not apply to the utility mining model. For example, u(A) = 55 < 80, A is a low utility item, but its superset {A, D} is a high utility itemset. If Apriori is used to find high utility itemsets, all combinations of all items must be generated. Moreover, in order to discover a long pattern, the number of candidates is prohibitively large. The cost in terms of either computation time or memory is intolerable, regardless of the method utilized. The challenge of utility mining is not only in restricting the size of the candidate set but also in simplifying the computation used to calculate its utility. Another challenge of utility mining is finding high utility itemsets with negative item values from large databases.

In this paper, we explore the issue of efficiently mining high utility itemsets with negative item values in large databases. We propose an algorithm named HUINIV (High Utility Itemsets with Negative Item Values)-Mine that can discover high utility itemsets with negative item values from large databases both efficiently and effectively. The underlying idea behind the HUINIV-Mine algorithm is based on the principle of the Two-Phase algorithm [14] and augments with negative item value for mining high utility itemsets efficiently. The novel contribution of HUINIV-Mine is that it can efficiently identify the utility of itemsets in large database so that the execution time for producing high utility itemsets with negative item values can be substantially reduced. That is, HUINIV-Mine can discover high utility itemsets with negative item values using limited memory and comparatively less computation time by the candidate itemsets filter method. In this way, the process of discovering all high utility itemsets in which all transactions are negative can be achieved effectively with limited memory, less candidate itemsets, and CPU I/O. This meets the critical requirements of time and spatial efficiency for mining large databases. Through experimental evaluation, HUINIV-Mine is shown to produce fewer candidate itemsets in the process of finding high utility itemsets with negative item values, so it outperforms other methods in terms of efficiency. We found that the average improvement of HUINIV-Mine compared to the MEU algorithm is about 99.2%. Moreover, it also achieves high scalability in dealing with large databases. To the best of our knowledge, this is the first work to propose a negative item concept in utility mining and the first work on mining high utility itemsets with negative item values from large database.

The rest of this paper is organized as follows: Section 2 provides an overview of related work. Section 3 describes the proposed approach, HUINIV-Mine, for finding the high utility itemsets with negative item values. In Section 4, we describe our experimental results for evaluating the proposed method. The conclusion of the paper is provided in Section 5.

Section snippets

Related work

Apriori [3], DHP [16], Partition [13], and Anti-Skew Algorithms [17] have been proposed to find frequent itemsets using association rules mining. Many important applications have established the need for incremental mining. This is due to the increasing use of record-based databases to which data are being added continuously. Many algorithms like FUP [8], FUP2 [9] and UWEP [4], [5] have been proposed to find frequent itemsets in incremental databases. The FUP algorithm updates the association

Proposed method: HUINIV-Mine

In this section, we present the HUINIV-Mine method. Section 3.1 describes the basic concept of HUINIV-Mine. Section 3.2 gives an example of mining temporal high utility itemsets. The procedure of the HUINIV-Mine algorithm is provided in Section 3.3.

Experimental evaluation

To evaluate the performance of HUINIV-Mine, we conducted experiments using synthetic datasets generated via a randomized dataset generator provided by IBM Quest [3]. However, the IBM Quest data generator only generates quantities of 0 or 1 for each item in a transaction. In order to fit databases into the scenario of utility mining, we randomly generate the quantity of each item in each transaction, ranging from 1 to 5; much like the model used in [14], [20]. Utility tables are also

Conclusions

In this paper, we addressed the problem of discovering high utility itemsets with negative item values in large databases, i.e., the itemsets containing negative item values that are larger than threshold in large databases. We propose a new approach, namely HUINIV-Mine, which can identify high utility itemsets with negative item values in large databases both efficiently and effectively. The novel contribution of HUINIV-Mine is that it can effectively identify high utility itemsets with

Acknowledgement

This research was supported by National Science Council, Taiwan, ROC under Grant No. NSC 97-2631-H-006-001 and NSC 96-2221-E-006-143-MY3.

References (22)

  • R. Agrawal, T. Imielinski, A. Swami, Mining association rules between sets of items in large databases, in Proceedings...
  • R. Agrawal et al.

    Fast discovery of association rules

  • R. Agrawal, R. Srikant, Mining sequential patterns, in: Proceedings of the 11th International Conference on Data...
  • N.F. Ayn, A.U. Tansel, E. Arun, An efficient algorithm to update large itemsets with early pruning, Technical Report...
  • N.F. Ayn, A.U. Tansel, E. Arun, An efficient algorithm to update large itemsets with early pruning, in: Proceedings of...
  • C. Bettini, X.S. Wang, S. Jajodia, Testing complex temporal relationships involving multiple granularities and its...
  • R. Chan, Q. Yang, Y. Shen, Mining high utility itemsets, in: Proceedings of IEEE ICDM, Florida,...
  • D. Cheung, J. Han, V. Ng, C.Y. Wong, Maintenance of discovered association rules in large databases: an incremental...
  • D. Cheung, S.D. Lee, B. Kao, A general incremental technique for updating discovered association rules, in: Proceedings...
  • Y. Chi, H. Wang, P.S. Yu, R. Richard, Muntz: moment: maintaining closed frequent itemsets over a stream sliding window,...
  • G. Das, K.I. Lin, H. Mannila, G. Renganathan, P. Smyth, Rule discovery from time series, in: Proceedings of the 4th ACM...
  • Cited by (77)

    • TKN: An efficient approach for discovering top-k high utility itemsets with positive or negative profits

      2022, Information Sciences
      Citation Excerpt :

      Yet, these algorithms can not handle items with negative utilities. In 2009, HUINIV-Mine [5] was proposed based on the Two-Phase paradigm to process items with negative unit profits. This method presented a modified twu pruning property to diminish the itemsets tree and boost the mining process.

    View all citing articles on Scopus
    View full text