Skip to main content
Top
Published in: Vietnam Journal of Computer Science 2/2016

Open Access 01-05-2016 | Regular Paper

Structure of frequent itemsets with extended double constraints

Authors: Truong Chi Tin, Duong Van Hai, Hoang Nguyen Thuy Ngan

Published in: Vietnam Journal of Computer Science | Issue 2/2016

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

Frequent itemset discovering has been one essential task in data mining. In the worst case, the cardinality of the class of all frequent itemsets is of exponent which leads to many difficulties for users. Therefore, a model of constraint-based mining is necessary when their needs and interests are the top priority. This paper aims to find a structure of frequent itemsets that satisfy the following conditions: they include a subset \(C_{10}\), contain no items of a subset \(C^{\prime }_{11} \), and have at least an item belonging to subset \(C^{\prime }_{21}\). The first new point of the paper is the proposed theoretical result that is the generalization of our former researches (Hai et al. in Adv Comput Methods Knowl Eng Sci 479:367–378, 2013). Second, based on new sufficient and necessary conditions discovered just for closed itemsets and their generators in association with the methods of creating borders and eliminating branches and nodes on the lattice, we can effectively and quickly eliminate not only a class of frequent itemsets but also one or more branches of equivalence classes of which elements are insatiate the constraints. Third, a structure and a unique representation of frequent itemsets with extended double constraints are shown by representative closed itemsets and their generators. Finally, all theoretical results in this paper are proven to be reliable and they are firm bases to guarantee the correctness and efficiency of a new algorithm, MFS-EDC, which is used to effectively mine all constrained frequent itemsets. Experiments show the outstanding efficiency of this new algorithm compared to modified post-processing algorithms on benchmark datasets.

1 Introduction

One of the most basic tasks in Data Mining is to discover the groups of items, products, symptoms and so on, that appear together in the given datasets. For this work, mining frequent itemset, researched first by Agrawal et al. [1] in 1993, has become more and more important and many new algorithms or improvements have been proposed to solve the problem more efficiently, such as Eclat [36], FP-Growth [18], FP-Growth* [14], BitTable-FI [12] and Index-BitTableFI [29].
A main difficulty of frequent itemset mining is that the cardinality of the solution set in the worst case is of exponent (\(O(2^{m}\)), where \(m=\vert A\vert \) and A is a set of items appearing in transactions) that can lead to the quite high computational and memory costs of mining algorithms. In fact, users can only care about a small number of them satisfying some given constraints. A model of constraint-based mining has thus been developed [5, 24]. Constraints help to focus on interesting knowledge and to reduce the number of patterns extracted to those of potential interest. In addition, they are used for decreasing the search space and enhancing the mining efficiency. There are many different kinds of constraints such as knowledge-based constraints, data constraints, dimensional constraints, interestingness constraints and rule formation constraints [23]. In relation to the properties of constraints, two important types have been studied, namely anti-monotone constraints [24], denoted as \({{\mathcal C}}_{am}\), and monotone constraints [28], denoted as \({{\mathcal C}}_{m}\). An itemset satisfies a constraint \({\mathcal C}_{am}\) (or \({\mathcal C}_{m}\)) if its arbitrary subset (or superset) also satisfies the constraint. \({\mathcal C}_{am}\) is simple and suitable with Apriori-like algorithms, so it is often integrated into them to prune candidates. On the contrary, \({\mathcal C}_{m}\) is more complicated to exploit and less effective for pruning the search space.
Most previous approaches mine frequent itemsets with either \({\mathcal C}_{am}\) or \({\mathcal C}_{m}\). Mining frequent itemsets with both \({\mathcal C}_{am}\) and \({\mathcal C}_{m}\) is of interest because, in fact, to come closer with users’ true needs, quite many various kinds are used. This can be accomplished by first mining frequent itemsets that satisfy \({\mathcal C}_{am}\) using algorithms, such as Apriori [1, 22], Eclat [34], FP-growth [27], and then filtering the ones matching \({\mathcal C}_{m}\) in a post-processing step. This approach is inefficient because it often has to test a large number of itemsets. A more complicated solution is to integrate both \({\mathcal C}_{am}\) and \({\mathcal C}_{m}\) into the algorithm to find all frequent itemsets satisfying them. However, authors in [20] showed that the integration of \({\mathcal C}_{m}\) can lead to a reduction in the pruning of \({\mathcal C}_{am}\) since their properties are opposite. Therefore, many authors have found difficulties when facing to a quite complicated conjunction of \({\mathcal C}_{am}\) and \({\mathcal C}_{m}\). An impressed approach is to combine between constraint properties and the condensed representation of frequent itemsets, such as maximal ones [11, 21], closed ones or generators [7, 8, 26, 32, 33]. Instead of mining all frequent itemsets, only a small number of the condensed ones are extracted. Condensed representation has three primary advantages. First, it is easier to store because the number of condensed ones is much smaller than the size of the class of all frequent ones, especially on dense datasets. Second, we exploit it only once even when the constraints are changed. And last, the condensed representation can be used to generate all frequent ones and this generation can be performed without any access to the original dataset. In [9], the authors proposed a generic algorithm to exploit frequent itemset with both \({\mathcal C}_{am}\) and \({\mathcal C}_{m}\) using the minimal itemsets (like generators). They claimed that there is a tradeoff in using two of these kinds of constraints concurrently, and thus it is sometimes better to use a ‘generate and test’ strategy. In [10], Bucila et al. pushed both \({\mathcal C}_{am}\) or \({\mathcal C}_{m}\) into algorithm DualMiner and used the concept of positive border as a condensed representation. Unfortunately, it has to scan the dataset many times as well as perform a huge number of useless tests on long itemsets, especially when the minimum support is low. An Apriori-like algorithm, called ExAMiner [6], uses both of these constraints to reduce not only the input data but also the search space. However, its main difficulty is to be executed again whenever the constraints are changed. Thus, the system is hard to immediately return solution sets to users.
In this paper, we are interested in a problem that includes a conjunction of \({\mathcal C}_{am}\) and \({\mathcal C}_{m}\), and each comprises of different specific constraints. Then, using closed itemsets and generators as condensed representations, we propose a new model to deal with the problem presented.

1.1 Problem statement

Before formally describing the current problem, let us present practical examples that motivate us to study and propose the new results in this paper. Let us consider searching documents on the Internet where the finding needs of users are very diverse. The datasets of information regarding documents are usually saved into the tables. Each row in a table can contain keywords, appeared in a document, the author names of the document, the type of document (Article, Book, Sort Survey, and so on) and the research area. It is common that online users usually take interest in looking for documents that comprise of a given set of keywords. Expected documents also have to be related to a specific topic A (including one or more keywords belonging to A), but they are not involved in other topic B (not having any keywords belonging to B). A specific example of this problem is as follows. An online user wants to look for research results from websites or search engines such as CiteSeerX, Springer and ScienceDirect. His/her need is to find the results related to keywords in the set \(C^{\prime }_{21} =\) {‘sequential patterns’, ‘frequent sequence’, ‘web usage mining’, ‘sequential rules’}, but they are not of authors in the list \(C^{\prime }_{11}\) = {‘Peter’, ‘Chan’, ‘Carmona’, ‘Matthews’}. In addition, the found results are also ‘Article’ and belong to the area of ‘Computer Science’. Specifically, the purpose of this user is to find articles in the field of computer science (the results have to contain both words in the set \(C_{10}=\) {‘Article’, ‘Computer Science’}) that comprise of at least a keyword of \(C^{\prime }_{21} \) and contain no authors in the list \(C^{\prime }_{11} \). It is clear that the need above is practical and the interest of many researchers nowadays. Other example for the current problem is when we want to build a filter to allow children searching interesting movies on the internet. Then, a compulsory key to obtain desired results is one in the set \(C_{10}\) = {movie}. Here, we only allow them to watch movies that belong to kinds in the list, \(C^{\prime }_{21}=\) {‘Animated’, ‘Cartoons’, ‘Comedy’, ‘ Fiction’, or ‘Documentary’}, but they are forbidden to see types of movies in the list \(C^{\prime }_{11}=\) {‘Action’, ‘Horror’, ‘Violated’, ‘Secxual’, ‘Thriller’ and ‘Porn’}. In other words, the aim is to allow children to find all enjoyable and good videos (that are pulling in large audience) that are ‘movie’ and follow one or more kinds in \(C^{\prime }_{21}\) but are not any types in \(C^{\prime }_{11}\). In fact, it is able to see more different significant examples.
A formal statement of the problem in our current research is presented as follows.
Formal problem statement Let T be a dataset, \({\mathcal A}\) be the set of all attributes or items in T. An itemset A is a non-empty subset of \({\mathcal A}\) and a transaction in T is a set of items \(t \in 2^{\mathcal A}\). The support of A, denoted as supp(A), is the number of transactions that are superset of A. Given two threshold values, \(s_0\) and \(s_1\), and three subsets, \(C_{10}\), \(C_{11}\) and \(C_{21}\), of \({\mathcal A}\) such that \(0<s_0 \leqslant s_1 \leqslant 1\), \(C_{10} \subseteq C_{11} \not \subseteq C_{21}\). A is called frequent if \(supp(A)\in [s_0 , s_1]\). The task is to find the class \({\mathcal F}{\mathcal S}_{{C_{10}\subseteq C_{11},\nsubseteq C_{21}}} (s_0 ,s_1 )\) of all the frequent itemsets that (1) includes a subset \(C_{10}\), (2) contains no items of a subset \(C^{\prime }_{11} = {\mathcal A} \backslash C_{11}\) and (3) have at least an item belonging to subset \(C^{\prime }_{21}= {\mathcal A} \backslash C_{21}\). In our second example above, \(C_{10}=\) movies, \(C^{\prime }_{11}=L_{1}\) and \(C^{\prime }_{21} =L_{2}\). In other words, the goal is to discover all elements A of \({\mathcal F}{\mathcal S}_{C_{10} \subseteq C_{11},\nsubseteq C_{21}} (s_0 ,s_1)\) that can be stated formally as below:
$$\begin{aligned}&{\mathcal F}{\mathcal S}_{\varvec{C}_{\varvec{10}} \subseteq \varvec{C}_{\varvec{11}} ,\nsubseteq \varvec{C}_{\varvec{21}}} (s_0 ,s_1 ) \\&\quad \buildrel \mathrm{def} \over = {\{}L^{\prime }\in {\mathcal F}{\mathcal S}: C_{10} \subseteq L^{\prime } \subseteq C_{11} \quad \text {and} \quad L^{\prime } \not \subseteq C_{21} {\}}. \end{aligned}$$
Note that the current problem is an extension of many of our formerly considered problems. If \(s_1 = 1\), \({\mathcal F}{\mathcal S}\) is the class of all frequent itemsets in the traditional meaning. When \(s_1<\) 1, we desire to consider frequent itemsets with supports that are not too high because they sometimes are valuable. For instance, they can help discover association rules with high confidences from abnormal phenomena appearing in frequent itemsets of which frequency is not necessary to be quite high (such as new, unusual rules for both positive and negative aspects in the field of network security or for finding out the falsehood on the figure of socioeconomic field). In addition, when the constraints are given special values, we obtain frequent itemsets without any constraint or with single constraints in simple forms, \({\mathcal F}{\mathcal S}(s_0)\) and \({\mathcal F}{\mathcal S}_{C_{11}} (s_0)\) in [2] or \({\mathcal F}{\mathcal S}_{\not \subseteq {C_{21}}} (s_0)\) in [3] or \({\mathcal F}{\mathcal S}_{\supseteq C_{10}}(s_0)\) in [16], or with double constraints presented in \({\mathcal F}{\mathcal S}_{C_{10} \subseteq C_{11}} ({s_0 ,s_1})\) [17]. So, we can find that the extended double constraint presented in this paper is more general than that, \({\mathcal F}{\mathcal S}_{C_{10} \subseteq C_{11}} ({s_0 ,s_1})\), shown in [17], by extending a new kind of constraint set, \(C_{21}\), which is an arbitrary subset of \({\mathcal A}\). Indeed, when we assign \(C_{21}\) = \(\varnothing \), we immediately obtain the problem \({\mathcal F}{\mathcal S}_{C_{10} \subseteq C_{11}}({s_0 ,s_1})\). The extension of the new constraint \(C_{21}\) or \(C^{\prime }_{21} \) has multiple practical meanings as shown in the examples above. A quite naïve thought for solving the problem \({\mathcal F}{\mathcal S}_{\varvec{C}_{\varvec{10}} \subseteq \varvec{C}_{\varvec{11}} ,\nsubseteq \varvec{C}_{\varvec{21}}} (s_0 ,s_1)\) is to filter from the results of \({\mathcal F}{\mathcal S}_{C_{10} \subseteq C_{11}} ({s_0 ,s_1})\) those that satisfy the constraint \(C_{21}\) in a post-processing step. However, this will do so many useless tests, and as a result, it will take much mining time. Thus, using the lattice of closed itemsets and their generators which is also used in [2, 3, 1517], we study and propose a new method to effectively mine frequent itemsets with above constraints, called extended double constraints (EDC). EDC can be categorized into two kinds, \({\mathcal C}_{m}(supp(L^{\prime }) \le s_1,C_{10} \subseteq L^{\prime }\) and \(L^{\prime }\not \subseteq C_{21})\) and \({\mathcal C}_{am}(supp(L^{\prime })\ge s_0\) and \(L^{\prime } \subseteq C_{11})\). Below are my contributions for the method to effectively discover frequent itemsets with EDC.

1.2 Contributions

The contributions of this paper are as follows. First, the result of the paper is the generalization of our former problems which are to find frequent itemsets without constraints or with simpler constraints. Particularly, it is an extension of the problem \({\mathcal F}{\mathcal S}_{C_{10} \subseteq C_{11}} ({s_0,s_1})\) [17] of which the result has been published in a good international journal by further considering a significant constraint, \(C_{21}\) or \(C^{\prime }_{21}\).
Second, we showed sufficient and necessary conditions for the non-emptiness of the solution set. The conditions allow us to turn checking the constraints on the very large number of frequent itemsets into testing them on representative closed itemsets of equivalence classes with quite small amount. Our lattice-based approach becomes more effective when the conditions are combined with the techniques of creating upper and under borders to quickly eliminate branches or nodes on the lattice. It also has a high sustainability in face of regular changes of the constraints following user’s need. Third, we show a structure and a unique representation of frequent itemsets with EDC that help us test the conditions on the quite small number and size of generators. This representation also allows us to integrate the constraints into the process of generating constrained frequent itemsets without checks in a post-processing step. Finally, in practice, based on these theoretical results, we propose a new algorithm, MFS-EDC, to completely and distinctly exploit all frequent itemsets with EDC. The advantages of MFS-EDC are that: it quickly discovers all frequent itemsets that satisfy opposite constraints, \({\varvec{\mathcal C}}_{\varvec{am}}\) and \({\varvec{\mathcal C}}_{\varvec{m}}\), concurrently by pushing the constraints into MFS-EDC without direct checks on them (post-processing or naïve approaches can do this by directly testing the output results of \({\mathcal F}{\mathcal S}_{C_{10} \subseteq C_{11}} ({s_0,s_1})\) on \(C_{21}\) or the ones of \({\mathcal F}{\mathcal S}(s_0)\) on all constraints of EDC); it is easy to be turned into parallel algorithms to obtain real time in mining process; it only needs to access the original dataset once, even if the constraints are changed regularly. This considerably enhances mining performance.
The rest of this paper is organized as follows. Some preliminary concepts related to the problem are reviewed in Sect. 2. Approaches to deal with the current problem are also considered in this section. Section 3 presents a rough partition and then a stricter partition of the solution set. In Sect. 4, we propose a structure of the solution set based on a necessary and sufficient condition of closed itemsets and their generators for the emptiness of \({\mathcal F}{\mathcal S}_{C_{10} \subseteq C_{11}\nsubseteq , C_{21}}(s_0,s_1)\), and a unique representation of frequent itemsets with extended double constraint in each equivalence class based on closed itemsets and their generators. We also propose an efficient algorithm MFS_EDC to exploit all frequent ones with extended double constraint. Experimental results will be discussed in Sect. 5. The conclusions and future work is presented in Sect. 6. Finally, for easier to read, the proof of the theoretical results in this study is moved to “Appendix”.

2 Preliminary concepts and some approaches to the problem

2.1 Preliminary concepts

For a binary dataset according to discovered data context \({\mathcal T}\buildrel \mathrm{def} \over = ({\mathcal O},{\mathcal A},{\mathcal R})\), where \({\mathcal O}\) is a non-empty set of transactions, \({\mathcal A}\) is the set of all items appearing in those transactions and \({\mathcal R}\) is a binary relation on \({\mathcal O}x{\mathcal A}\). A set of items is called an itemset. Consider two Galois connection operators \(\lambda : 2^{\mathcal O} \rightarrow 2^{\mathcal A}\) and \(\rho :2^{\mathcal A} \rightarrow 2^{\mathcal O}\) defined as follows: \(\forall O, A: \varnothing {\,\,\ne O}\subseteq {\mathcal O}, \varnothing \ne A \subseteq {\mathcal A}\), \(\lambda (O)\buildrel \mathrm{def} \over = \{a\in {\mathcal A}{\vert (o, a)}\in {\mathcal R}, \forall o\in O\}, \rho (A) \buildrel \mathrm{def} \over = \{o\in {\mathcal O}{\vert (o, a)\in \,\, }{\mathcal R}, \forall a\in A \}\) and, as convention, \(\lambda (\varnothing ) \buildrel \mathrm{def} \over = {\mathcal A}, \rho (\varnothing ) \buildrel \mathrm{def} \over = {\mathcal O}\). We denote h(A) \(\buildrel \mathrm{def} \over = \lambda (\rho (\mathrm{A}))\) as the closure of A (h is called the closure operation in \(2^{\mathcal A})\). An itemset A is called closed itemset iff1 \(h(A) = A\) [25]. The support of \(A\subseteq {\mathcal A}\), denoted as supp(A), is the ratio of cardinality \({\vert }\rho {(A)}\vert \) to \({\vert } {\mathcal O} {\vert }\), i.e. \(supp({A}) \buildrel \mathrm{def} \over = \vert \rho {(A)}\vert / \vert {\mathcal O} {\vert }\). The minimum and maximum support thresholds are denoted as \(s_0\) and \(s_1\), respectively, where \(0< 1/n \le s_0 \le s_1 \le 1\) and \({n} \buildrel \mathrm{def} \over = \vert {\mathcal O} {\vert }\). We only consider non-trivial items in \({\mathcal A}^{\mathcal F} \,\,{\buildrel \mathrm{def} \over = }\,\, {{\{}a} \in {\mathcal A}{\vert supp({\{}a{\}}) } \ge s_0 {{\}}}\). Let \({\mathcal C}{\mathcal S}\) be the class of all closed itemsets together with their supports. With normal order relation “\(\subseteq \)” over subsets of \({\mathcal A}\), \({\mathcal L}{\mathcal C} \buildrel \mathrm{def} \over =({\mathcal C}{\mathcal S},\subseteq \)) is the lattice of all closed itemsets organized by Hass diagram. A non-empty itemset A (subset of \({\mathcal A}^{\mathcal F})\) is called frequent iff \(s_0 \le \) supp(\(A) \le s_1 \). Note that if \(s_{1}\) is equal to 1, then the traditional frequent itemset concept is obtained. Briefly, \({\mathcal F} {\mathcal S} \,\,{\buildrel \mathrm{def} \over =}\,\,{\mathcal F}{\mathcal S}(s_0 ,s_1 )\,\buildrel \mathrm{def} \over = \{L^{\prime }: \varnothing \ne L^{\prime }\subseteq {\mathcal A},s_0 {\le supp(L^{\prime }) \le }s_1{{\}}}\) denotes the class of all frequent itemsets and \({\mathcal F}{\mathcal C}{\mathcal S} \buildrel \mathrm{def} \over = {\mathcal F}{\mathcal C}{\mathcal S}(s_0,s_1)\,\,\buildrel \mathrm{def} \over ={\mathcal F}{\mathcal S}(s_0 ,s_1 )\cap {\mathcal C}{\mathcal S}\) denotes the class of all frequent closed itemsets. For any two non-empty sets \(G, A: \varnothing \ne G \subseteq A \subseteq {\mathcal A}, G\) is called a generator [25] of A iff \(h(G) = h(A)\) and \((h(G^{\prime }) \subset {h(G), \forall G^{\prime }:} \varnothing \ne G^{\prime }\subset G)\). Let \({\mathcal G}(A)\) be the class of all generators of A. Since \({\mathcal G}(A)\) is non-empty and finite [4], \({\vert } {\mathcal G}(A)\vert = k\), all generators of A are indexed: \({\mathcal G}(A) {= {\{}} A_1 , A_2 ,\ldots , A_k {{\}}}\). Let \({\mathcal L}{\mathcal C}{\mathcal G} {\buildrel \mathrm{def} \over = {\{} \langle L, supp(L), }{\mathcal G}(L){\rangle \vert }{L} \in {\mathcal L}{\mathcal C} {{\}}}\) be the lattice \({\mathcal L}{\mathcal C}\) of closed itemsets together with their generators and \({\mathcal L}{\mathcal F}{\mathcal C}{\mathcal G}(s_0 ,s_1 ) {\buildrel \mathrm{def} \over = {\{} \langle L, supp(L), }{\mathcal G}(L)\rangle \in {\mathcal L}{\mathcal C}{\mathcal G} {\vert L } \in {\mathcal F}{\mathcal C}{\mathcal S}(s_0 ,s_1 ){{\}}}\) be the lattice of frequent ones and the generators.
To present an effective approach for the current problem, based on the closure operator h, we need an equivalence relation on the class of itemsets to partition the solution set into disjoint equivalence sub-classes.
Definition 1
([31], Equivalence relation \(\sim _{\varvec{\mathcal A}}\) over \({\varvec{\mathcal {F}}{\varvec{\mathcal S}}}{({{\varvec{s}}_{{\varvec{0}}},{\varvec{s}}_{\varvec{1}}})}\)) Consider the following binary relation \(\sim _{\mathcal A} \) on \({\mathcal F}{\mathcal S}({s_0,s_1})\), \(\forall {A, B\in }\,\, {\mathcal F}{\mathcal S}({s_0,s_1})\):
$$\begin{aligned} A \sim _{\mathcal A} B \Leftrightarrow h(A) = h(B). \end{aligned}$$
Obviously, \(\sim _{\mathcal A} \) is an equivalence relation. For each \(A\in {\mathcal F}{\mathcal S}({s_0,s_1})\), we denote \([A]\buildrel \mathrm{def} \over = \{B\in {\mathcal F}{\mathcal S}( {s_0,s_1}): h(B)=h(A)\}\) as the equivalence class of all frequent itemsets having the same closure h(A) and for each \({L} \in {\mathcal F}{\mathcal C}{\mathcal S}({s_0,s_1})\), we have \({[L]}:= \{L^{\prime } \subseteq L: L^{\prime } \ne \varnothing , h(L^{\prime })=L\}\). Using this relation, we divide \({\mathcal F}{\mathcal S}({s_0,s_1})\) into the disjoint equivalence classes. We have the following proposition.
Proposition 1
([31], A partition of \({\varvec{\mathcal F}}{\varvec{\mathcal S}}{({{\varvec{s}}_{\varvec{0}} ,{\varvec{s}}_{\varvec{1}}})})\)
$$\begin{aligned} {\mathcal F}{\mathcal S}({s_0,s_1})=\sum \nolimits _{L\in {\mathcal F}{\mathcal C}{\mathcal S}({s_0,s_1})} [L]. \end{aligned}$$
For each \({L} \in {\mathcal F}{\mathcal C}{\mathcal S}({s_0 ,s_1})\), each equivalence class [L] contains frequent itemsets having the same closure L, \(\rho (L)\) and especially, the same support as supp(L). Moreover, this partition allows us to decrease the storage of the support of itemsets in each class, the production of duplicate candidates and the independent exploitation of each class by effective parallel algorithms in distributed environment. There are effective algorithms in the literature to mine the lattice \({\mathcal L}{\mathcal C}{\mathcal G}\) such as CHARM-L [35] and MinimalGenerators [34], Touch [30], GENCLOSE [4] and CHARM-L and GDP [19].

2.1.1 Some approaches to the problem

Two post-processing approaches For the first algorithm, MFS-PP-EDC1, we first find the class of all frequent itemsets \({A} \subseteq C_{11} \), \({A} \in {\mathcal F}{\mathcal S}_{\subseteq {C_{11}}}(s_0)\), by one of the well-known algorithms such as dEclat or FPGrowth with the consideration of only items belonging to \(C_{11} \). Then, the remaining constraints, \({supp(A)} \le s_1 , C_{10} {\subseteq A \ \mathrm{and} \ A \not \subseteq \,\,}C_{21}\), are checked to generate frequent itemsets satisfying EDC. For the second one, MFS-PP-EDC1, we additionally integrate an anti-monotonic constraint, A \(\not \subseteq C_{21}\), into one of the above algorithms to obtain \({\mathcal F}{\mathcal S}_{\subseteq {C_{11}},\nsubseteq C_{21}} (s_0) = \{A\in {\mathcal F}{\mathcal S}(s_0)\vert A \subseteq C_{11} \ \mathrm{and} \ A \not \subseteq C_{21}\}\) before testing the remaining monotonic constraints, supp \((A) \le s_1\), \(C_{10} {\subseteq A}\). Note that, in fact, we often use two dualistic constraints, \({A} \subseteq C_{11}\) and \({A} \not \subseteq C_{21}\) in the form of \({A} \cap C^{\prime }_{11} =\varnothing \) and \({A} \cap C^{\prime }_{21} {\ne }\,\,\varnothing \), where \(C^{\prime }_{11} ={\mathcal A} {\backslash }C_{11}\) and \(C^{\prime }_{21} ={\mathcal A}{\backslash }C_{21} \) with the quite small sizes of \(C^{\prime }_{11}\) and \(C^{\prime }_{21}\).
The drawbacks of two these approaches are taking a lot of time for mining \({\mathcal F}{\mathcal S}_{\subseteq {C_{11}}}(s_0)\) or \({\mathcal F}{\mathcal S}_{\subseteq {C_{11}, \nsubseteq C_{21}}} (s_0)\) again when EDC is changed, and for direct check the remaining constraints on the large number of generated frequent itemsets. If we keep all frequent itemsets in the memory with \(s_0 =1/\vert {\mathcal O}{\vert }\), then that may need enormous storage, especially when \(n= \vert {\mathcal O} {\vert }\) is quite large.
The approach of the paper Based on the partition (1) in the Proposition 2 (which divides the solution set into disjoint equivalence solution sub-classes), we first mine only once the lattice \({\mathcal L}{\mathcal C}{\mathcal G}\) containing closed itemsets and their generators from \({\mathcal T}\). Second, when constraints are changed, we quickly determine from \({\mathcal L}{\mathcal C}{\mathcal G}\) the class \({\mathcal F}{\mathcal C}{\mathcal S}_{C_{10} \subseteq C_{11} ,\nsubseteq C_{21}}(s_0,s_1)\) of all closed frequent itemsets and generators with EDC. Note that, with this partition, instead of checking the constraints on the so large number of frequent itemsets, we just need to do that on the quite small amount of closed frequent itemsets belonging to \({\mathcal L}{\mathcal C}{\mathcal G}\). In this step, based on monotone or anti-monotone properties of constraints, the parent–child relations in the lattice are used to quickly find the supersets or the subsets of a closed itemset. That helps to significantly reduce the search space when determining the elements of \({\mathcal F}{\mathcal C}{\mathcal S}_{C_{10} \subseteq C_{11},\nsubseteq C_{21}}(s_0,s_1)\). Moreover, another outstanding advantage of the partition is to allow us to design parallel algorithms for concurrently, independently mining each the sub-class. Finally, in each equivalence sub-class [L] with \({L} \in {\mathcal F}{\mathcal C}{\mathcal S}_{C_{10} \subseteq C_{11} ,\nsubseteq C_{21}}(s_0,s_1)\), we completely, quickly and distinctly generate all frequent itemsets with EDC which are represented uniquely through L and its generators \({\mathcal G}(L)\)—the essential information of the class [L]. From the theoretical results demonstrated to be reliable, we propose MFS-EDC, an efficient algorithm for mining frequent itemsets that satisfy EDC.
In next section, we first present an ineffective rough partition of the solution set, and then, based on above approach, we show a better strict partition for it.

3 Partitioning solution set by the equivalence relation

For each \({L} \in {\mathcal F}{\mathcal C}{\mathcal S}(s_0,s_1), A \subseteq B \subseteq {\mathcal A}, C \subseteq {\mathcal A}, L_B \buildrel \mathrm{def} \over = L\cap B,\) we denote: \({\mathcal F}{\mathcal S}_{A\subseteq L_B } \buildrel \mathrm{def} \over = \{L^{\prime }\in [L]\,\vert \,A \subseteq L^{\prime } \subseteq L_B \} = \{L^{\prime }\ne \varnothing \, \vert \,A \subseteq L^{\prime } \subseteq L_B , h(L^{\prime }) = L\}\) and \({\mathcal F}{\mathcal S}_{A\subseteq L_B ,\not \subseteq C} {\buildrel \mathrm{def} \over = {\{}L^{\prime } \in }{\mathcal F}{\mathcal S}_{A \subseteq L_B } {\,\vert \,L^{\prime } \not \subseteq C{\}}}\).

3.1 The rough partition of the solution set \({\varvec{\mathcal F}}{\varvec{\mathcal S}}_{{\varvec{C}}_{\varvec{10}} \subseteq {\varvec{C}}_{\varvec{11}},\not \subseteq {\varvec{C}}_{\varvec{21}},} ({\varvec{s}}_{\varvec{0}},{\varvec{s}}_{\varvec{1}})\)

From the partition in Proposition 1, we immediately obtain the following rough partition for \({\mathcal F}{\mathcal S}_{C_{10} \subseteq C_{11},\nsubseteq C_{21}}(s_0,s_1)\).
Proposition 2
(A rough partition of \({\varvec{\mathcal F}}{\varvec{\mathcal S}}_{{\varvec{C}}_{\varvec{10}} \subseteq {\varvec{C}}_{\varvec{11}} ,\nsubseteq {\varvec{C}}_{\varvec{21}},}{({\varvec{s}}_{\varvec{0}}, {\varvec{s}}_{\varvec{1}})})\)
$$\begin{aligned}&{\mathcal F}{\mathcal S}_{C_{10} \subseteq C_{11} ,\not \subseteq C_{21},} (s_0 ,s_1 )\nonumber \\&\quad =\sum \nolimits _{L\in {\mathcal F}{\mathcal C}{\mathcal S}(s_0 ,s_1 )} {\mathcal F}{\mathcal S}_{C_{10} \subseteq L_{C_{11} } ,\not \subseteq C_{21} } . \end{aligned}$$
(1)
From this partition, we can independently exploit all frequent itemsets with EDC in each equivalence class \({\mathcal F}{\mathcal S}_{C_{10} \subseteq L_{C_{11} },\nsubseteq C_{21}}\).
Example 1
(Illustrating the disadvantage of the above rough partition) The rest of this paper considers dataset \({\mathcal T}\) shown in Fig. 1a. Charm-L [35] and MinimalGenerators [34] are used to mine a lattice of all frequent itemsets and their generators.
The results are shown in Fig. 1b. Let us consider the constraints on supports \(s_0=2/7,s_1=1\). For briefness, we denote itemset \(\{a_1, a_2, \ldots , a_k\}\) as \(a_1 a_2 \ldots a_k\), for example, \({{\{}a, f, h{\}}}\) as afh. It is possible that there are many values of constraints and closed itemsets \(L \in {\mathcal F}{\mathcal C}{\mathcal S}(s_0,s_1)\) such that \({\mathcal F}{\mathcal S}_{C_{10} \subseteq L_{C_{11}},\nsubseteq C_{21},} = \varnothing \), even \({\mathcal F}{\mathcal S}_{C_{10} \subseteq C_{11} ,\nsubseteq C_{21},} (s_0,s_1) = \varnothing \). Indeed, we consider examples as follows:
1.
(i)
For \(C_{10} = a, C_{11} = adfh\) and \(C_{21} = adfh\), when using the algorithm of post-processing approach based on the partition in (1), with \({L = afh} \in {\mathcal F}{\mathcal C}{\mathcal S}(s_0 ,s_1 )\), \(supp(L) = 3/7\), we have \([L] = {\{}f, fa, fh, fha, h, ha{\}}\), but \(L_{C_{11} } = afh \subseteq C_{21} \), so \({\mathcal F}{\mathcal S}_{{C}_{10} \subseteq L_{C_{11} } ,\nsubseteq C_{21},} = \varnothing \). Moreover, after generating \(\vert {\mathcal F}{\mathcal S}(s_0,s_1)\vert =35\) itemsets and then directly testing constraints on them corresponding to all different closed frequent (CF) itemsets L in \({\mathcal F}{\mathcal C}{\mathcal S}(s_0,s_1)\), there is no itemset \(L^{\prime }\) in any of 9 CF classes [L] satisfying the constraints, so \({\mathcal F}{\mathcal S}_{{C}_{10} \subseteq {C_{11} } ,\nsubseteq C_{21}} =\varnothing \) and \({\mathcal F}{\mathcal S}_{C_{10} \subseteq L_{C_{11}} ,\nsubseteq C_{21}} (s_0,s_1) = \varnothing \).
 
(ii)
We obtain the similar result \({\mathcal F}{\mathcal S}_{C_{10} \subseteq C_{11},\nsubseteq C_{21}} (s_0,s_1) = \varnothing \) for other constraints, \(C_{10} = d, C_{11} = adf\) and \(C_{21}= adf\).
 
2.
With \(C_{10} = a, C_{11} = adfh\) and \(C_{21} = ah\), for all \({L} \in {\mathcal F}{\mathcal C}{\mathcal S}(s_0 ,s_1)\), we only have one class \([L = afh]\) satisfying the constraints, \({\mathcal F}{\mathcal S}_{{C}_{10} \subseteq L_{C_{11} } ,\not \subseteq C_{21}} = \{ahf, af\} \ne \varnothing \), since \(supp(L) = 3/7, [L] = {\{}f, fa, fh, fha, h, ha{\}}\), i.e. \({\mathcal F}{\mathcal S}_{C_{10} \subseteq C_{11} ,\nsubseteq C_{21}} (s_0 ,s_1 )={\mathcal F}{\mathcal S}_{{C}_{10} \subseteq L_{C_{11} } ,\nsubseteq C_{21}} {= {\{}ahf, af{\}}} \ne \varnothing \).
 
 
Then, the post-processing approach takes quite much time to generate all frequent itemsets in equivalence sub-classes [L], corresponding to closed frequent itemsets L, and then so many or even all sub-classes are eliminated because they do not satisfy the constraints. From this example, we find it is important to have sufficient and necessary conditions and then impose them on the constraints and closed itemsets L to narrow \({\mathcal F}{\mathcal C}{\mathcal S}({s}_0 ,{s}_1 )\) into \({\mathcal F}{\mathcal S}_{{C}_{10} \subseteq {C_{11}} ,\nsubseteq C_{21}} (s_0,s_1)\). Here, \({\mathcal F}{\mathcal S}_{{C}_{10} \subseteq {C_{11}} ,\nsubseteq C_{21}} (s_0 ,s_1 )\) includes closed itemsets which are not only frequent but they also satisfy constraints regarding sub-items \({{\{}}C_{10} ,C_{11} ,C_{21} {{\}}}\) so that corresponding solution subsets are not empty, i.e. \({\mathcal F}{\mathcal S}_{{C}_{10} \subseteq L_{C_{11}} ,\nsubseteq C_{21}} \ne \,\,\varnothing .\)

3.2 Strict partition of solution set \({\varvec{\mathcal F}}{\varvec{\mathcal S}}_{\varvec{C_{10}} \subseteq {\varvec{C_{11}}, \not \subseteq {\varvec{C_{21}}}}}(s_0,s_1)\)

To briefly present the remaining results, we consider the following lemma which will be used to prove Propositions 3 and 4 in the next section.
Lemma 1
Let \({A, B, L^{\prime }, L} \subseteq {\mathcal A} : L \ne \varnothing , B \ne \varnothing ,\)
(a)
(i)
\(L^{\prime } {\,\,\ne \,\,} \varnothing \Leftrightarrow {\mathcal G}(L^{\prime }){\,\,\ne \,\, }\varnothing \). If \(\varnothing \subset L^{\prime } \subseteq L\) and \(h(L^{\prime })= h(L),\) then \({\mathcal G}(L^{\prime }){\subseteq }{\mathcal G}(L).\)
 
(ii)
If \({\mathcal G}_B (L) \buildrel \mathrm{def} \over = \{L_i \in {\mathcal G}(L) \vert L_i \subseteq B\} \ne \,\, \varnothing ,\) then \({\mathcal G}_B (L) = {\mathcal G}(L_B).\)
 
(iii)
Assume that A \(\subseteq B\), \(\forall { L} \in {\mathcal F}{\mathcal C}{\mathcal S}_{A\subseteq B} (s_0, s_1)\buildrel \mathrm{def} \over = \{L\in FCS(s_0, s_1){\vert L \supseteq A,}{\mathcal G}_B(L)\ne \varnothing \}\). Then, \(\forall {L^{\prime }}\in {\mathcal F}{\mathcal S}_{ {\mathbf A}\subseteq {\mathbf L}_ {\mathbf B}} , \forall L_i {\in }{\mathcal G}(L),\) then \({\mathcal G}(L^{\prime }){\,\,\subseteq \,\,}{\mathcal G}_B (L)\) and \((h L_i)=h(L_B)=L.\)
 
(iv)
\(\forall U, U \in 2^{\mathcal A}: \varnothing \subset U \subseteq V\), we have \({U} \ \cap \ Minimal(V)\subseteq Minimal(U)\), i.e. if \(\exists M\in U \ \cap Minimal(V)\), then \({M} \in Minimal(U)\), where the set Minimal(U) consists of all minimal subsets of elements of U according to the normal order relation “\(\subseteq \)” on subsets.
 
(b) Let \(\mathrm{Cond}(L_i)\) be a logic condition expression related to \(L_i.\) Consider \(L^{\prime } \subseteq {\mathcal A},L^{\prime } \ne \varnothing ,L^{\prime } \supseteq A, h(L^{\prime })=L: \exists L_i \in \,\, {\mathcal G}(L^{\prime })\) and assume that Cond\((L_i)\) is true. Let \(U \buildrel \mathrm{def} \over = \{K_k \mathop =\limits ^\mathrm{def} L_k \backslash A \vert L_k \in {\mathcal G}(L^{\prime }), \mathrm{Cond}(L_k)\}, V \buildrel \mathrm{def} \over = \{K_k \mathop =\limits ^{\mathrm{def}} L_k \backslash A \vert L_k \in {\mathcal G}(L), \mathrm{Cond}(L_k)\}\)-be finite sets. Then, \(\varnothing \subset U \subseteq V, \varnothing \subset \) Minimal(U) \(\subseteq Minimal(V)\) and we can always assume that i is the minimum index such that \(K_i \in Minimal(U)\). Moreover, \(L^{\prime } \supseteq A + K_i\) and \(K_i \in Minimal(V)\).
(c) Let \(A \subseteq L_B \buildrel \mathrm{def} \over = L\cap B \ne \,\, \varnothing , K_\mathrm{min}\) be a non-empty class of subsets in \(L_B(\varnothing \subset K_i \subseteq L_B , \forall K_i \in K_\mathrm{min})\), \(K_U^i \buildrel \mathrm{def} \over =\) \(\left\{ {{\begin{array}{ll} {\mathop \bigcup \nolimits _{K_k \in K_\mathrm{min} :k\le i} K_k ,\mathrm{if} \ i\ge 1}\\ {\varnothing , if i=0}\\ \end{array}}} \right. , K_{U,i}\buildrel \mathrm{def} \over = K_U^{i-1} \backslash K_i\) and \(K_{-,i} \buildrel \mathrm{def} \over = L_B \backslash (A+K_U^{i})\), \(\forall i \ge 1\). If \(L^{\prime } = A+K_i +K_i^{\prime } +K_i^{\sim },\) where \(K_i \in K_\mathrm{min}, K_i^{\prime } \subseteq K_{U,i}, K_i^{\sim } \subseteq K_{-,i},\) then \(A \subseteq L^{\prime } \subseteq L_B\). In addition, if \(K_i {\buildrel \mathrm{def} \over =} L_i {\backslash A}\) with \(L_i {\in }{\mathcal G}_B (L),\) then \(h(L^{\prime })=h(L_B)= L\) and \({L^{\prime }} \in {\mathcal F}{\mathcal S}_{A \subseteq L_B }.\)
Note that in general case, the reverse of the a.(iv) above is not true. However, in the present special cases, if we add some corresponding conditions, then the reverse assertion in Lemma 1b is also true.
We denote \(C^+ \ {\buildrel \mathrm{def} \over = } \ C_{11} {\,\,\cap \,\,}C_{21} , C_- \ {\buildrel \mathrm{def} \over =} \ C_{10} {\,\,\cap \,\,}C_{21} , C^*\ {\buildrel \mathrm{def} \over = } C_{11} {\backslash } C_{21} \) and \({\mathcal F}{\mathcal C}{\mathcal S}_{C_{10} \subseteq C_{11} ,\nsubseteq C_{21}} (s_0 ,s_1 )\buildrel \mathrm{def} \over = \{L\in {\mathcal F}{\mathcal C}{\mathcal S}(s_0 ,s_1 ){\vert }C_{10} {\subseteq }L_{C_{11} } {\not \subseteq }C_{21} , {\mathcal G}_{C_{11} } (L){\,\,\ne \,\,}\varnothing \}\). It is obvious that \({\mathcal F}{\mathcal C}{\mathcal S}_{C_{10} \subseteq C_{11} ,\nsubseteq C_{21}} (s_0,s_1) \subseteq {\mathcal F}{\mathcal C}{\mathcal S}({s}_0, {s}_1 )\), \(C_- \subseteq \,\, C^+\) and for every \(L^{\prime } \in {\mathcal F}{\mathcal S}_{C_{10} \subseteq C_{11} ,\nsubseteq C_{21}} (s_0 ,s_1) \,\,\ne \,\, \varnothing ,\) then \(\varnothing \subset L^{\prime } \backslash C_{21} \subseteq L_{C_{11}} \backslash C_{21} \subseteq C_{11} \backslash C_{21} \subseteq C_{11},\) \(supp(C_{11}) \leqslant supp(L^{\prime }) \leqslant s_1\), supp\((C_{10}) \geqslant \) supp(\(L^{\prime }\)) \(\geqslant s_0.\) Thus, from now on, we always assume that the following hypothesis (\(H_1\)) is satisfied:
$$\begin{aligned}&0 < s_0 \le s_1 \le 1, \quad C_{10} \subseteq C_{11} \not \subseteq C_{21}, \\&\qquad supp(C_{11}) \le s_1 , \quad supp(C_{10}) \ge s_0 .\qquad \qquad (\mathrm{H}_1) \end{aligned}$$
Proposition 3
(Necessary and sufficient conditions of \(L \in {\varvec{\mathcal F}}{\varvec{\mathcal C}}{\varvec{\mathcal S}}{({\varvec{s}}_{\varvec{0}} ,{\varvec{s}}_{\varvec{1}})}\) for the emptiness of \({\varvec{\mathcal F}}{\varvec{\mathcal S}}_{{\varvec{C}}_{\varvec{10}} \subseteq {\varvec{C}}_{\varvec{11}} ,\nsubseteq {\varvec{C}}_{\varvec{21,}}}{({\varvec{s}}_{\varvec{0}} ,{\varvec{s}}_{\varvec{1}})}\) and \({\varvec{\mathcal F}}{\varvec{\mathcal S}}_{{\varvec{C}}_{\varvec{10}} \subseteq L_{{\varvec{C}}_{\varvec{11}} } ,\nsubseteq {\varvec{C}}_{\varvec{21}} } \), a better partition of \({\varvec{\mathcal F}}{\varvec{\mathcal S}}_{{\varvec{C}}_{\varvec{10}} \subseteq {\varvec{C}}_{{\mathbf 11}} ,\nsubseteq {\varvec{C}}_{{\mathbf 21},} } {({\varvec{s}}_{\varvec{0}} ,{\varvec{s}}_{\varvec{1}})})\) Assume that the above hypothesis \((\mathrm{H}_1)\) is satisfied. Then:
(a)
\({\mathcal F}{\mathcal S}_{C_{10} \subseteq C_{11} ,\nsubseteq C_{21}} (s_0,s_1) \ne \varnothing \Leftrightarrow {\mathcal F}{\mathcal C}{\mathcal S}_{C_{10} \subseteq C_{11} ,\nsubseteq C_{21}} (s_0,s_1) \ne \varnothing \).
 
(b)
We obtain a better partition of \({\mathcal F}{\mathcal S}_{C_{10} \subseteq C_{11} ,\nsubseteq C_{21}} (s_0 ,s_1 )\) as follows:
$$\begin{aligned}&{\mathcal F}{\mathcal S}_{C_{10} \subseteq C_{11} ,\nsubseteq C_{21}} (s_0 ,s_1)\nonumber \\&\quad =\sum \nolimits _{\text{ L }\in {\mathcal F}{\mathcal C}{\mathcal S}_{C_{10} \subseteq C_{11} ,\nsubseteq C_{21}} (s_0 ,s_1 )} {\mathcal F}{\mathcal S}_{{C}_{10} \subseteq L_{C_{11} } ,\nsubseteq C_{21} }. \end{aligned}$$
(2)
 
The following algorithm, MFCS-EDC, is to extract \({\varvec{\mathcal F}}{\varvec{\mathcal C}}{\mathcal S}_{{\varvec{C}}_{\varvec{10}} \subseteq {\varvec{C}}_{\varvec{11}} ,\nsubseteq {\varvec{C}}_{\varvec{21}} } ({\varvec{s}}_{\varvec{0}} ,{\varvec{s}}_{\varvec{1}})\) (the set of constrained frequent closed itemsets which is the output of the algorithm) from the input data which is the lattice \({\varvec{\mathcal L}}{\varvec{\mathcal C}}{\varvec{\mathcal G}}\) of all closed itemsets and their generators together with user constraints, \({\varvec{s}}_{\varvec{0}} ,{\varvec{s}}_{\varvec{1}}, {\varvec{C}}_{\varvec{10}}, {\varvec{C}}_{\varvec{11}}\) and \({\varvec{C}}_{\varvec{21}}\) (Fig. 2).
Remark 1
Methods to eliminate nodes and branches of nodes as well as to create monotonic upper borders and anti-monotonic under borders on the lattice \({\varvec{\mathcal L}}{\varvec{\mathcal C}}{\varvec{\mathcal G}}\) to quickly determine \({\varvec{\mathcal F}}{\varvec{\mathcal C}}{\varvec{\mathcal S}}_{{\varvec{C}}_{\varvec{10}} \subseteq {\varvec{C}}_{\varvec{11}} ,\nsubseteq {\varvec{C}}_{\varvec{21}} } {({\varvec{s}}_{\varvec{0}} ,{\varvec{s}}_{\varvec{1}})}\).
When implementing the algorithm MFCS-EDC, to quickly determine \({\mathcal F}{\mathcal C}{\mathcal S}_{{C}_{10} \subseteq {C}_{11} ,\nsubseteq {C}_{21}}({s}_\mathrm{0} ,{s}_\mathrm{1})\) from \({\mathcal L}{\mathcal C}{\mathcal G}\), we can use the methods of eliminating nodes and branches of nodes, or creating upper and under borders while travelling the lattice. We split the constraints in \({\mathcal F}{\mathcal C}{\mathcal S}_{{C}_{10} \subseteq {C}_{11} ,\nsubseteq {C}_{21}} ({s}_\mathrm{0} ,{s}_\mathrm{1})\) into the groups: monotonic \({\mathcal C}_{m} {\buildrel \mathrm{def} \over = (supp(L)}\leqslant s_1 \ and \ C_{10}{\subseteq \,\, L and\,\, L\,\,\cap \,\,}C_{11} {\not \subseteq }C_{21} )\), anti-monotonic \({\mathcal C}_{am} {\buildrel \mathrm{\mathrm{def}} \over = (supp(L)} \geqslant s_0 )\) and \({\mathcal C}_\mathrm{non} {\buildrel \mathrm{def} \over = (}{\mathcal G}_{C_{11}} (L)\ne \varnothing )\) (that is neither monotonic nor anti-monotonic. An itemset satisfies a constraint \({\mathcal C}_\mathrm{non}\) if it does not satisfy both \({\mathcal C}_{am}\) and \({\mathcal C}_{m})\).
To use the advantages of the properties of \({\mathcal C}_{m}\), when travelling in the bottom-up direction of the lattice, standing at one node L, if:
(i)
The constraint \({\mathcal C}_{m}\) is not satisfied, then we immediately wipe the branch with the root node L out of the lattice (i.e. we do not need to consider all of nodes in this branch since we know for certain that \({\mathcal C}_{m}\) will not meet its requirement on them), and go to other branch.
 
(ii)
\({\mathcal C}_{m}\) is satisfied and \({\mathcal C}_\mathrm{non}\) is not, then only L is eliminated from the lattice.
 
(iii)
Both \({\mathcal C}_{m}\) and \({\mathcal C}_\mathrm{non}\) are satisfied, then we still keep L on the lattice.
 
(iv)
The conditions of \({\mathcal C}_{am}\) or \({\mathcal C}_\mathrm{non}\) are not met, then L is cut out of the lattice.
 
(v)
Both \({\mathcal C}_{am}\) and \({\mathcal C}_\mathrm{non}\) are satisfied, then we put L on the list of anti-monotonic under borders. So, it does not need to check \({\mathcal C}_{am}\) on all of its predecessor nodes.
 
Similarly, we use five steps above when going in the top-down direction of the lattice to consider \({\mathcal C}_{am}\).
Note that we often pre-select one of two groups, \({\mathcal C}_{am}\) and \({\mathcal C}_{m}\), which is much more likely to be not satisfied, such as the group with more constraints in the form of AND than other. We also choose the bottom-up or the top-down first so that eliminating branches is done before borders are created. For the considering problem, it is more suitable to select \({\mathcal C}_{am}\) and go in the bottom-up direction of the lattice first.
Example 2
Illustrating the good effect of the sufficient and necessary conditions so that \({\varvec{\mathcal F}}{\varvec{\mathcal S}}_{{{\varvec{C}}_{{\varvec{10}}}} \subseteq {{\varvec{C}}_{\varvec{11}} ,\nsubseteq {{\varvec{C}}_{\varvec{21}}}}}{({\varvec{s}}_{\varvec{0}} ,{\varvec{s}}_{\varvec{1}})}=\varnothing \) or \({\varvec{\mathcal F}}{\varvec{\mathcal S}}_{{{\varvec{C}}_{\varvec{10}}} \subseteq \varvec{L}_{\varvec{C}_{11}} ,\nsubseteq {{\varvec{C}}_{21}}}=\varnothing \) when \(L\in {\varvec{\mathcal F}}{\varvec{\mathcal C}}{\varvec{\mathcal S}}({\varvec{s}}_{\varvec{0}} ,{\varvec{s}}_{\varvec{1}} ) \backslash {\varvec{\mathcal F}}{\varvec{\mathcal C}}{\varvec{\mathcal S}}_{{\varvec{C}}_{\varvec{10}} \subseteq {\varvec{C}}_{\varvec{11}} ,\nsubseteq {\varvec{C}}_{\varvec{21}} } ({\varvec{s}}_{\varvec{0}} ,{\varvec{s}}_{\varvec{1}})\) and the effectiveness of the methods of eliminating branches and nodes, and creating borders.
1.
In Example 1.1(i), we just need to find one of the conditions in (\(H_1\)) which is not met: \(adfh = C_{11} \not \subseteq C_{21} = adfh\), then we immediately result \({\mathcal F}{\mathcal S}_{C_{10} \subseteq C_{11} ,\nsubseteq C_{21}} (s_0,s_1) = \varnothing \) and \({\mathcal F}{\mathcal S}_{C_{10} \subseteq L_{C_{11}},\nsubseteq C_{21}} = \varnothing , \forall L \in {\mathcal F}{\mathcal C}{\mathcal S}({s}_0, {s}_1)\). In Example 1.1(ii), since other condition in (\(H_1\)), \(1/7 = supp(C_{10}) \ge s_0 = 2/7\), is not satisfied, we lead to the similar conclusion.
 
2.
Consider Example 1.2, the groups of constraints \({\mathcal C}_{am}\) and \({\mathcal C}_m\) for each \(L\in {\mathcal L}{\mathcal C}{\mathcal G},\) respectively, are \(\{supp(L) \geqslant 2/7\}\) and \(\{\mathrm{supp}(L) \leqslant 1, a\subseteq L, L\cap adfh \not \subseteq ah\}\). First, we go in the bottom-up direction of \({\mathcal L}{\mathcal C}{\mathcal G}\) in Fig. 1b. When using the properties of monotonic constraints, the branches started at bceg and aceg (circled in red by dashed lines) all are eliminated since the constraint \({\mathcal C}_m : a \not \subseteq bceg\) and \(a = aceg \cap adfh \subseteq ah\) are violated (case (i)). For \(L = acfh\), the constraint \({\mathcal C}_\mathrm{non}: {\mathcal G}_{C_{11} } (L)\,\,=\,\, \{L_i {\in \{cf, ch\}\vert }L_i {\,\,\subseteq \,\,}C_{11} = adfh\} = \varnothing \), is not satisfied (case (ii)), so \(L\notin {\mathcal F}{\mathcal C}{\mathcal S}_{C_{10} \subseteq C_{11}, \not \subseteq C_{21},}(s_0,s_1)\) and we cut L out of the lattice (circled by dotted, dashed lines in red); then, we only need to consider two remaining nodes on the lattice, afh and adfh. The node adfh (circled by dotted, dashed lines in blue) continues to be eliminated since \({\mathcal C}_{am}\): supp(adfh) \(=\) 1/7 \(<\) 2/7 is violated (case (iv)). Second, in \({\mathcal L}{\mathcal C}{\mathcal G}\) remains only one node, afh, and we travel in both the bottom-up and top-down directions started at \(L = \underline{afh}\) (circled by red solid lines). After that, L is put into the lists of under and upper borders since it satisfies all constraints \({\mathcal C}_m : C_{10}= a \subseteq L_{C_{11}} = L = afh \not \subseteq C_{21} = ah, {\mathcal C}_{am} {: supp(L) = 3/7 }\geqslant 2/7\) and \({\mathcal C}_\mathrm{non} : G_{C_{11} } (L) = G(L){= {\{}f, h{\}}} \ne \varnothing \). Finally, we have \({\mathcal F}{\mathcal C}{\mathcal S}_{C_{10} \subseteq C_{11}, \not \subseteq C_{21},} (s_0, s_1)= \{L = afh\}\ne \varnothing , [L] = \{f, fa, fh, fha, h, ha\}\), so \({\mathcal F}{\mathcal S}_{C_{10} \subseteq C_{11} ,\nsubseteq C_{21} ,} (s_0,s_1) = {\mathcal F}{\mathcal S}_{C_{10} \subseteq L_{C_{11} } ,\nsubseteq C_{21} }= \{ahf, af\} \ne \varnothing \).
 
It is able to be found from this example that, for post-processing approach, we have to take a lot of time to generate all \(\vert {\mathcal F}{\mathcal S}(s_0,s_1){\vert =35}\) frequent itemsets and then check them on the constraints about sub-items. But, we only obtain two of them, ahf and af, satisfying the constraints. Meanwhile, based on the condition \({L}\in {\mathcal F}{\mathcal C}{\mathcal S}_{C_{10} \subseteq C_{11}, \not \subseteq C_{21},}(s_0,s_1)\) and the methods of eliminating branches and nodes as well as creating borders on the lattice, we quickly wiped out eight of nine equivalence sub-classes (corresponding with 29 of 35 frequent itemsets), which did not meet the requirements of the constraints, and only need to check one node.
Note that, in the final sub-class \([L=afh],\) after generating six frequent itemsets, we have to check them on the constraints in the post-processing step which can still consume time a lot. In next section, we will show the way to partition \({\mathcal F}{\mathcal S}_{C_{10} \subseteq L_{C_{11}},\not \subseteq C_{21}}\) into two disjoint solution subsets based on dividing the generators in \({\mathcal G}_{C_{11} }(L)\) into two parts according to \(C_{21}\).

4 Structure of the solution set \({\mathcal F}{\mathcal S}_{{\mathbf{C}}_{\varvec{1}0} \subseteq {\mathbf{C}}_{\varvec{1}1} ,\nsubseteq {\mathbf{C}}_{\varvec{2}1}} (\mathrm{\mathbf{s}}_0,{\mathbf{s}}_1)\) and algorithm MFS-EDC

4.1 Partition and explicit structure of each equivalence class \({\mathcal F}{\mathcal S}_{{C_{10} \subseteq L_{C_{11}},\nsubseteq C_{21}}}\)

Going on the idea of partition, for \(\forall L \,\,\in {\mathcal F}{\mathcal C}{\mathcal S}_{C_{10} \subseteq C_{11} ,\nsubseteq C_{21}}(s_0,s_1)\), we divide each equivalence class \({\mathcal F}{\mathcal S}_{{C}_{10} \subseteq L_{C_{11}} ,\nsubseteq C_{21}} {\buildrel \mathrm{def} \over = {\{}L^{\prime }\in [L] \vert }C_{10} {\subseteq L^{\prime } \subseteq }L_{C_{11}}, L^{\prime } \not \subseteq C_{21}{{\}}}\) into two disjoint parts based on the partition of generators in \({\mathcal G}_{C_{11} } (L)\) as follows, \({\mathcal G}_{C_{11} } (L)={\mathcal G}_{C^+} (L)+{\mathcal G}_{C_{11} ,\nsubseteq C_{21} } (L)\), where \({\mathcal G}_{C^+} (L) \buildrel \mathrm{def} \over = {\{}L_i \in {\mathcal G}(L): L_i \subseteq C^+{\}} = {\{}L_i \in {\mathcal G}_{C_{11} } (L): L_i \subseteq C_{21} {\}}, {\mathcal G}_{C_{11} ,\nsubseteq C_{21}} (L) \buildrel \mathrm{def} \over = {\{}L_i \in {\mathcal G}_{C_{11}}(L): L_i \not \subseteq C_{21} {\}}\).
We first number all \(K_i \in K_\mathrm{min,C_{-}\subseteq L_{C^+} } {\buildrel \mathrm{def} \over = Minimal{\{}}K_i \mathop =\limits ^{\text{ def }} L_i {\backslash C{\_},} \ L_i \,\, {\in } \,\, {\mathcal G}_{C^+} (L), 1\le i\le {M{\}}} \ \mathrm{from} \ 1\, \mathrm{to}\, M\, \mathrm{if} \ {\mathcal G}_{C^+} (L) \,\, {\ne } \,\, \varnothing \) or set M \(=\) 0, if \({\mathcal G}_{C^{+}}(L) = \varnothing \). Then, we continue to number, from \(M+1\), subsets in \(K_\mathrm{min, C_{10} \subseteq L_{C_{11} } ,\mathrm{G} \not \subseteq {C}_{21} } \buildrel \mathrm{def} \over =\) Minimal \({\{}K_i \mathop =\limits ^{\mathrm{def}} L_i \backslash C_{10}\), \(L_i \in {\mathcal G}_{C_{11} ,\nsubseteq C_{21}} (L), \forall i>M{\}}\) and denote \(K_\mathrm{min,C_{10} \subseteq L_{C_{11}}, \not \subseteq {C}_{21} }^+\buildrel \mathrm{def} \over = K_\mathrm{min,C\subseteq L_{C^+}}\cup K_\mathrm{min ,C_{10} \subseteq L_{C_{11} } ,G \not \subseteq C_{21}}\).
Note that if \({\mathcal G}_{C^+} (L)=\varnothing \) (like when \(C^+ = \varnothing \)), then we set
$$\begin{aligned} {\mathcal F}{\mathcal S}_{C_{-} \subseteq L_{C^+} ,+L_{C^*} }^*\buildrel \mathrm{def} \over = \lfloor {L}\rfloor _{C_- \subseteq L_{C^+} ,+L_{C^*}} \buildrel \mathrm{def} \over = \varnothing . \end{aligned}$$
Otherwise, if \({\mathcal G}_{C^+} (L) \ne \varnothing \), we define: \(\lfloor {L}\rfloor _{C_-\subseteq L_{C^{+}} ,+L_{C^*}} {\buildrel \mathrm{def} \over =} \{L^{\prime }\in {\mathcal F}{\mathcal S}_{{C}_{10} L_{C_{11}} ,\nsubseteq C_{21} } {\vert \exists }L_i \,\, {\in } \,\, {\mathcal G}_{C^+} (L): L^{\prime } \supseteq L_i \} \ \mathrm{and}\ 2^{\left[ {C_{10} \backslash C_{21} \subseteq L_{C^*} } \right] *}{\buildrel \mathrm{def} \over =} \{ L^{\sim }{\ne } \,\, \varnothing {\vert }C_{10} \backslash C_{21} {\subseteq L}^{\sim }{\subseteq } L_{C^*} \}, K_{U,C_{-}\subseteq L_{C^+} }^i {\buildrel \mathrm{def} \over = }\mathop \bigcup \nolimits _{K_k \in K_\mathrm{min,C_{-}\subseteq L_{C^+} } ,k\le i} K_k , K_{U,C_{-}\subseteq L_{C^+} ,i} {\buildrel \mathrm{def} \over = } \left\{ {{\begin{array}{ll} {K_{U,C_{-}\subseteq L_{C^+} }^{i-1} \backslash K_i,\quad \mathrm{if} \,\, i \ge 2}\\ {\varnothing ,\quad \mathrm{if} \,\, i=1}\\ \end{array}}}\right. , K_{-,C_{-}\subseteq L_{C^+} ,i} {\buildrel \mathrm{def} \over = }L_{C^+} \backslash (C\_+K_{U,C_{-}\subseteq L_{C^+} }^i) = L_{C^{+}} \backslash (C_{10} \cup K_{U,C_{-}\subseteq L_{C^+} }^i)\) and \({\mathcal F}{\mathcal S}_{C_{-}\subseteq L_{C^+} }^*{ \buildrel \mathrm{def} \over =} \{L^{\prime \prime } \, {\buildrel \mathrm{def} \over = } \, C\_+K_i +K_i^{\prime } +K_i^{\sim } {\vert }K_i \in K_\mathrm{min,C_{-}\subseteq L_{C^+} } , K_i^{\prime } \subseteq K_{U,C_{-}\subseteq L_{C^+} ,i} , K_i^{\sim } \subseteq K_{-,C_{-}\subseteq L_{C^+} ,i}\) and \((K_k {\not \subseteq }K_i +K_i^{\prime },\forall K_k \in K_\mathrm{min,C_{-}\subseteq L_{C^+} } : 1\le k<i)^{(*)},1\leqslant i\leqslant M \}.\)
$$\begin{aligned}&{\mathcal F}{\mathcal S}_{C_{-} \subseteq L_{C^+} ,+ L_{C^*}}^{*} \buildrel \mathrm{def} \over = \bigg \{ L'\buildrel \mathrm{def} \over = L''+L^{\sim } {\vert L''\in } \,\, {\mathcal F}{\mathcal S}_{C_{-} \subseteq L_{C^{+}}}^*,\nonumber \\&\quad L^{\sim } {\in }2^{\left[ {C_{10} \backslash C_{21} \subseteq L_{C^*} } \right] *}\bigg \}. \end{aligned}$$
(3a)
And if \(\,\, {\mathcal G}_{C_{11} ,\nsubseteq C_{21}} (L)=\varnothing \) then we assign \({\mathcal F}{\mathcal S}_{C_{10}\subseteq L_{C_{11}}, C_{20}\not \subseteq , G \nsubseteq C_{21} }^*\equiv \lfloor {L}\rfloor _{C_{10}\subseteq L_{C_{11}}, C_{20} \not \subseteq , G \nsubseteq C_{21}} \buildrel \mathrm{def} \over = \varnothing \). On the contrary, we denote \(\lfloor {L}\rfloor _{C_{10}\subseteq L_{C_{11}}, G \not \subseteq C_{21} } \buildrel \mathrm{def} \over = \{L^{\prime }\in \,\, {\mathcal F}{\mathcal S}_{\text{ C }_{10} \subseteq L_{C_{11}}, \nsubseteq C_{21}} {\vert (\exists }L_i {\in }{\mathcal G}_{C_{11} ,\nsubseteq C_{21}} (L): L^{\prime } \supseteq L_i)\) and \((L_k \not \subseteq L^{\prime }, \forall L_k \in {\mathcal G}_{C^+} (L))\}, K_{U,C_{10}\subseteq L_{C_{11} } }^i \buildrel \mathrm{def} \over = \mathop \bigcup \nolimits _{K_k \in K_\mathrm{min,C_{10} \subseteq L_{C_{11}},\not \subseteq {C}_{21}}^{+},{\mathrm{k}\le \mathrm{i}}} K_k \), \(K_{U,C_{10}\subseteq L_{C_{11}},i} \buildrel \mathrm{def} \over = \left\{ {{\begin{array}{ll} {K_{U,C_{10} \subseteq L_{C_{11} } }^{i-1} \backslash K_i ,\mathrm{if} \ i \ge 2}\\ {\varnothing , \mathrm{if} \ i=1}\\ \end{array} }} \right. \), \(K_{-,C_{10}\subseteq L_{C_{11} } ,i} \buildrel \mathrm{def} \over = \quad L_{C_{11}}\backslash (C_{10} +K_{U,C_{10}\subseteq L_{C_{11}}}^i)\) and:
$$\begin{aligned}&{{\varvec{\mathcal F}}{\varvec{\mathcal S}}_{{\varvec{C_{10}}} \subseteq {\varvec{L_{C_{11}}}} ,{\varvec{G}}{{\varvec{\nsubseteq C_{21}}}}}^{*}} \nonumber \\&\quad \buildrel \mathrm{def} \over = \{L^{\prime } \buildrel \mathrm{def} \over = C_{10} +K_i +K_i^{{\prime }} +K_i^{\sim } {\vert }K_i \in K_\mathrm{min,C_{10}\subseteq L_{C_{11} } ,G \not \subseteq C_{21} } , \nonumber \\&\quad K_i^{{\prime }} \subseteq K_{U,C_{10}\subseteq L_{C_{11} } ,i} , K_i^{\sim } \subseteq K_{-,C_{10}\subseteq L_{C_{11} } ,i} \quad \mathrm{and} \nonumber \\&\quad (K_k {\not \subseteq }K_i +K_i^{\prime } , \forall K_k \in K_{\mathrm{min,C_{10} \subseteq L_{C_{11}} ,\not \subseteq \text{ C }_{21} }}^{+} : \nonumber \\&\quad 1\le k<i)^{(**)}, \forall i>M\}. \end{aligned}$$
(3b)
$$\begin{aligned}&\mathrm{Finally,} \ \mathrm{we} \ \mathrm{denote} \ {\mathcal F}{\mathcal S}_{\text{ C }_{10} \subseteq L_{C_{11} } ,\nsubseteq C_{21} }^*\buildrel \mathrm{def} \over = \nonumber \\&\quad {\mathcal F}{\mathcal S}_{C_{-} \subseteq L_{C^+} ,+L_{C^*} }^*+{\mathcal F}{\mathcal S}_{\text{ C }_{10} \subseteq L_{C_{11} } ,{rm G}\not \subseteq \mathrm{C}_{21} }^*. \end{aligned}$$
(3c)
Obviously, \({\mathcal F}{\mathcal S}_{\text{ C }_{10} \subseteq L_{C_{11}},\nsubseteq C_{21},}=\lfloor {L}\rfloor _{C_{-} \subseteq L_{C^+},+L_{C^*}}+\lfloor {L}\rfloor _{C_{10} \subseteq L_{C_{11}},G \not \subseteq C_{21}}\).
Proposition 4
(Partition, explicit structure and unique representation of \({\varvec{\mathcal F}}{\varvec{\mathcal S}}_{{\varvec{C_{10}}} \subseteq {\varvec{L_{C_{11}}}, {\varvec{\nsubseteq C_{21}}}}})\) Assume that the hypothesis \((H_{1})\) is satisfied and let \(L \in {\mathcal F}{\mathcal S}_{C_{10} \subseteq C_{11} ,\nsubseteq C_{21},}(s_0,s_1)\), then
(i)
All frequent itemsets in \({\mathcal F}{\mathcal S}_{\text{ C }_{-} \subseteq L_{C^+}}^*\), \({\mathcal F}{\mathcal S}_{\text{ C }_{10} \subseteq L_{C_{11}},\text{ G }\not \subseteq \text{ C }_{21} }^*\), and thus, all ones in \({\mathcal F}{\mathcal S}_{C_{-} \subseteq L_{C^+},+L_{C^*}}^*\) and \({\mathcal F}{\mathcal S}_{\text{ C }_{10}\subseteq L_{C_{11} } ,\nsubseteq C_{21} }^*\) have a unique representation and are generated distinctly;
 
(ii)
$$\begin{aligned} {\mathcal F}{\mathcal S}_{{C}_{10} \subseteq {C_{11} } ,\nsubseteq C_{21} } = {\mathcal F}{\mathcal S}_{{C}_{10} \subseteq {C_{11} } ,\nsubseteq C_{21} }^*. \end{aligned}$$
(4)
(i.e. each constrained sub-class \({\mathcal F}{\mathcal S}_{\text{ C }_{10} \subseteq L_{C_{11}},\nsubseteq C_{21}}\) is partitioned into two disjoint subsets \({\mathcal F}{\mathcal S}_{C_{-}\subseteq L_{C^{+}},+L_{C^*}}^*\), \({\mathcal F}{\mathcal S}_{{C}_{10} \subseteq L_{C_{11}} ,\text{ G }\not \subseteq {C}_{21}}^*\)).
 
Example 3
(Illustrating disjoint partition and structure of \({\varvec{\mathcal F}}{\varvec{\mathcal S}}_{\varvec{C_{10}} \subseteq {\varvec{L_{C_{11}}}} ,{\varvec{\nsubseteq C_{21}}}}\)) According to Example 2.1, for \({L = afh} \in {\mathcal F}{\mathcal S}_{C_{10} \subseteq C_{11}, \nsubseteq C_{21},}(s_0,s_1)\), \({\mathcal G}_{C_{11}}(L) {= {\{}f, h{\}},}\) we will illustrate the direct generation of two constrained frequent itemsets \({{\{}fa, fha{\}}}\) in \({\mathcal F}{\mathcal S}_{C_{10} \subseteq L_{C_{11} } ,\nsubseteq L_{C_{21}},}\) using (4) based on the splitting of \({\mathcal G}_{C_{11} } (L)={\mathcal G}_{C^+} (L)+{\mathcal G}_{C_{11} ,\nsubseteq C_{21} } (L)\), where \(C^+ = L_{C^+} = ah, {\mathcal G}_{C^+} (L) \,\, {= \,\, {\{}h{\}},} {\mathcal G}_{C_{11} ,\nsubseteq C_{21} } (L){ \,\, = \,\, {\{}f{\}}. }\) Besides, \(C_{-} = a, L_{C^*} = {\{}f{\}}, K_\mathrm{min,C_{-}\subseteq L_{C^+} } = \{K_1 = h\}, K_{U,C_{-}\subseteq L_{C^+} ,1} =\varnothing \), \(K_{-,C_{-}\subseteq L_{C^+},1} =\varnothing \), \(2^{\left[ {C_{10} \backslash C_{21} \subseteq L_{C^*} } \right] *} = {\{}f{\}}, {\mathcal F}{\mathcal S}_{\text{ C }_{-} \subseteq L_{C^+} ,+L_{C^*} }^*= \{L^{\prime }\buildrel \mathrm{def} \over = L^{\prime \prime }+L^{\sim }= \,\, a+h+f = ahf\}\); \(K_{\mathrm{min},C_{10} \subseteq L_{C_{11}} ,G \not \subseteq C_{21}} = \{K_2 = f\}, K_{U,C_{10} \subseteq L_{C_{11} } ,2} = \{h\}, K_{-,C_{10} \subseteq L_{C_{11} } ,2} = afh \backslash \{a+ hf\} = \varnothing \), \({\mathcal F}{\mathcal S}_{{C}_{10} \subseteq L_{C_{11} } ,{G}\not \subseteq {C}_{21} }^*= \{L^{\prime }=a+f+K_2^{\prime } +\varnothing \vert \varnothing \subseteq K_2^{\prime } \subseteq \{h\}\) and (\(K_1 \not \subseteq K_2 +K_2^{\prime } )\} = \{ af, \mathrm{with} K_2^{\prime } =\varnothing \} = \{ af\},\) since with \(K_2^{\prime } =h,\) then \(h=K_1 \subseteq K_2 +K_2^{\prime } \) = fh. So we reject \({L^{\prime } = afh}\) (as this itemset was belong to \({\mathcal F}{\mathcal S}_{{C}_{-} \subseteq L_{C^+} ,+L_{C^*} }^*)\) already). Thus, \({\mathcal F}{\mathcal S}_{C_{10} \subseteq L_{C_{11} } ,\nsubseteq C_{21}} = {\mathcal F}{\mathcal S}_{{C}_{-} \subseteq L_{C^+} ,+L_{C^*} }^*+ {\mathcal F}{\mathcal S}_{{C}_{10} \subseteq L_{C_{11} } ,{G} \not \subseteq {C}_{21} }^*{= {\{}ahf, af{\}}.}\)
Remark 2
(An effective way to calculate subsets \({\varvec{K_{U,i}}}\), \({\varvec{K_{-,i}}}\) when finding sets in \({\varvec{\mathcal F}}{\varvec{\mathcal S}}^{*}\))
(a) To calculate \({\mathcal F}{\mathcal S}_{C_{-} \subseteq L_{C^+} }^*\) more effectively, we set \(K_U^i\) \(\buildrel \mathrm{def} \over = K_{U,C_{-}\subseteq L_{C^+}}^i , K_{U,i} \buildrel \mathrm{def} \over = K_U^{i-1} \backslash K_i , K_{-,i}{\buildrel \mathrm{def} \over = }L_{C^+} {\backslash (C{\_}+}K_U^i)\) and then find that \(K_{U,i}= [(K_U^{i-2} \backslash K_{i-1})+K_{i-1}]\backslash K_i = (K_{U,i-1} +K_{i-1} )\backslash K_i \), \(K_{-,i} =K_{-,i-1} \backslash K_i \), \(\forall \mathrm{i} \ge 1\) and \(K_0 \buildrel \mathrm{def} \over = K_{U,0} \buildrel \mathrm{def} \over = \varnothing \), \(K_{-,0} \buildrel \mathrm{def} \over = L_{C^+} \backslash \mathrm{C}{\_}\). Thus, \(K_{U,i} =\left\{ {\begin{array}{ll} {( {K_{U,i-1} +K_{i-1} })\backslash K_i ,\mathrm{if} \,\, 1 \leqslant i \leqslant M}\\ {\varnothing , \mathrm{if} \,\, i=0}\\ \end{array}}\right. \), \(K_{-,i} =\left\{ {\begin{array}{ll} {K_{-,i-1} \backslash K_i ,\mathrm{if} \,\, 1 \leqslant i \leqslant M}\\ {L_{C^+} \backslash {C}\_,\mathrm{if} \,\, i=0}\\ \end{array}} \right. \)\(K_0 \buildrel \mathrm{def} \over = \varnothing \).
(b) Similarly, we also have more effective recursive expressions for \({\mathcal F}{\mathcal S}_{{C}_{10} \subseteq L_{C_{11}} ,{G}\not \subseteq {C}_{21} }^*\). If we set \(K_U^{i} \buildrel \mathrm{def} \over = K_{U,C_{10} \subseteq L_{C_{11} }}^i \), \(K_{U,i}\buildrel \mathrm{def} \over = K_U^{i-1} \backslash K_i \), \(K_{-,i}\buildrel \mathrm{def} \over = L_{C_{11} } \backslash (C_{10} +K_U^i )\), then, for \(\forall \,\, \mathrm{i}\,\, \geqslant M+1\,\,\geqslant \)2: \(K_{U,i} =( {K_{U,i-1} +K_{i-1} })\backslash K_i \), \(K_{-,i} = \left\{ {{\begin{array}{ll} {K_{-,i-1} \backslash K_i ,\mathrm{if} \,\, i \geqslant M+2}\\ {[(L_{C^*} \backslash C_{10} )+K_{-,M} ]\backslash K_{M+1} ,\mathrm{if} \,\, i=M+1}\\ \end{array} }} \right. .\) Indeed, for \(L_{C^{*}} =L\cap C_{11} \backslash C_{21},\) since \(K_{-,M+1} = L\cap C_{11} \backslash (C_{10} +K_U^{M+1} )= [L_{C^*} \backslash (C_{10} +K_U^M )\backslash K_{M+1}]+[L\cap C_{11} \cap C_{21} \backslash (C_{10}+K_U^M )\backslash K_{M+1}] = [L_{C^*} \backslash C_{10} \backslash K_{M+1}]+[L \cap C_{11} \cap C_{21} \backslash (C_{10} \cap C_{21} +K_U^M \backslash K_{M+1}] = [L_{C^*} \backslash C_{10} \backslash K_{M+1} ] + [L_{C^+} \backslash (\mathrm{C}{\_}+\mathrm{K}_\mathrm{U}^\mathrm{M} )\backslash K_{M+1}]= [(L_{C^*}\backslash C_{10} )+K_{-,M} ]\backslash K_{M+1}\). Especially, if \(M = 0\), then \(K_{\_,1}\,\,\mathbf{=}\,\, [(L_{C^*} \backslash C_{10} )+K_{-,0} ]\backslash K_1 \mathbf{=}\,\,[(L_{C^*} \backslash C_{10} )+(L_{C^+} \backslash C_{-} )] \backslash K_1 \,\, \mathbf{=}\,\, [(L_{C^*} \backslash C_{10} )+(L_{C^+} \backslash C_{10} \mathrm{)]}\backslash K_1 =L_{C_{11} } \backslash (C_{10} +K_1 ).\)
* In brief, for \(\forall M\geqslant \,\, 0 (If {\mathcal G}_{C^+} (L)\ne \varnothing \), then \(M>\,\,\)0; else \(M\,\,=0\,\,\)), if we set
\(.K_\mathrm{\mathrm min} \buildrel \mathrm{def} \over = \left\{ {\begin{array}{ll} {K_\mathrm{min,1} =K_\mathrm{min,C_{-}\subseteq L_{C^+} } , \quad if \,\, Begin=0}\\ {K_\mathrm{min,2} =K_\mathrm{min,C_{10} \subseteq L_{C_{11} } ,G \not \subseteq C_{21} } , \ if\,\,Begin=M}\\ \end{array}} \right. , K_\mathrm{min}^+ \buildrel \mathrm{def} \over = K_\mathrm{min,1} +K_\mathrm{min,2} , K_{\text{ min }\_}^+ \buildrel \mathrm{def} \over = \left\{ {\begin{array}{ll} {K_\mathrm{min,1} , \quad if\,\, Begin=0}\\ {K_{\text{ min }}^+ , \quad if\,\, Begin=M}\\ \end{array} } \right. ;\)
\(.K_{U,i} =\left\{ {{\begin{array}{ll} {(K_{U,i-1} +K_{i-1} )\backslash K_i , \quad if i \geqslant 2}\\ {,\quad if i=1}\\ \end{array} }} \right. , \forall Begin \in \{0; M\} and K_{U,0} =\varnothing ;\)
\(.K_{-,i} = \left\{ {{\begin{array}{ll} {K_{-,i-1} \backslash K_i ,\quad if \ i \geqslant Begin+2}\\ {First\_\backslash K_i ,\quad if \ i=Begin+1}\\ \end{array} }} \right. . \forall Begin \ge 0\) where \(First{\_}=\left\{ {{\begin{array}{ll} {L_{C^+} \backslash C\_,\quad if \ Begin=0}\\ {( {L_{C^*} \backslash C_{10} })+K_{-,M} ,\quad if \ Begin=M}\\ \end{array} }} \right. \) and \(K_{-,0} \buildrel \mathrm{def} \over = L_{C^+} \backslash \text{ C }\_;\)
. Sub_L= \(\left\{ {{\begin{array}{ll} {C_- ,\quad if \ Begin=0}\\ {C_{10} ,\quad if \ Begin=M}\\ \end{array} }} \right. .\)
Then, we can calculate \({L^{\prime \prime } in }{\mathcal F}{\mathcal S}_{C_{-} L_{C^+},}^*and\) \(L^{\prime } in {\mathcal F}{\mathcal S}_{C_{10} \subseteq L_{C_{11}} ,{G}\not \subseteq {C}_{21} }^*\) as follows
\({L^{\prime }}\buildrel \mathrm{def} \over = Sub{\_}L+K_i +K_i^{\prime } +K_i^{\sim }\), where \(K_i \in \) \(K_\mathrm{min,M} , K_i^{\prime } \subseteq K_{U,i} , K_i^{\sim } \subseteq K_{-,i} \,\,\) and \((K_k \not \subseteq K_i +K_i^{\prime },\forall K_k \in K_\mathrm{min},C_{10} \subseteq L_{C_{11}} ,\not \subseteq {C}_{21}^{+} :1\le k<i),\forall i\geqslant \,\, Begin+1,\) using the general algorithm below.
$$ \begin{aligned}&{\varvec{\mathcal F}}{\varvec{\mathcal S}_{\varvec{Sub\_L}}}^*= {\varvec{ MFS-EDC-SubClass}}\\&(Sub\_L, K_\mathrm{min}, { \& }{} { Begin}, { \& }K_{U,\text{ Begin }}, { \& }K_{\_,Begin}, K_\mathrm{min\_}^+ ). \end{aligned}$$
Then
$$\begin{aligned}&{\mathcal F}{\mathcal S}_{C_{-} \subseteq L_{C^{+}} }^{*} = \mathbf{MFS-EDC-SubClass }\\&\quad (C_{-}, K_\mathrm{min,1} , {Begin}=1, {KU Begin}, {K{\_}Begin}, K_\mathrm{min,1}) \end{aligned}$$
And after the algorithm finishes, we obtain \({Begin}\,\,=\,\,M, KUBegin= K_{U,M} , K{\_}Begin = K_{-,M}\). After that, we calculate new values, \({Begin\,\,=\,\,M+1, KUBegin}\,\,=(K_{U,M} +K_M) \backslash K_{M+1} , K{\_}Begin = K_{-,M} \backslash K_{M+1},\) to find \({\mathcal F}{\mathcal S}_{\mathrm{\mathbf{C}}_{10} \subseteq L_{C_{11} } ,\mathrm{\mathbf{G}}\not \subseteq \mathrm{\mathbf{C}}_{21}}^{*} = \mathbf{MFS-EDC-SubClass }(C_{10} \), \(K_\mathrm{min,2} \), \(Begin = M+1, K_{U,Begin}, K_{-,Begin} , K_\mathrm{min}^{+})\).
(c) Note that, the condition \(^{(**)}\) is not satisfied if \(\exists K_i \in K_\mathrm{min,C_{10} \subseteq L_{C_{11}} ,{G}\not \subseteq {C}_{21} } \cap K_{\mathrm{min},C_{-}\subseteq L_{C^+}}\). Thus, \(K_{\mathrm{min},C_{10} \subseteq L_{C_{11}},{G}\not \subseteq {C}_{21} }\) and \(K_{\mathrm{min},C_{10} \subseteq L_{C_{11}} ,\not \subseteq {C}_{21} }^{+} \) can be replaced by \( K_{\mathrm{min},C_{10} \subseteq L_{C_{11} } ,{G}\not \subseteq {C}_{21} }\) \({\buildrel \mathrm{def} \over = Minimal{\{}}K_i \mathop =\limits ^{\mathrm{def}} L_i \backslash C_{10} \), \(L_i \in {\mathcal G}_{C_{11} ,\nsubseteq C_{21} } (L), \forall i>M\)}\(\backslash K_{\mathrm{min},C_{-}\subseteq L_{C^+}}\) and \(K_{\mathrm{min},C_{10} \subseteq L_{C_{11} } ,\not \subseteq {C}_{21} }^+\buildrel \mathrm{def} \over = K_{\mathrm{min},C_{-}\subseteq L_{C^+}} +K_{\mathrm{min},C_{10} \subseteq L_{C_{11}} ,{G}\not \subseteq {C}_{21}}\), respectively (Fig. 3).
(d) Since \(K_k\) and \(K_i\) are two minimal subsets belonging to different sets \(K_{\mathrm{min},C_{-}\subseteq L_{C^+}}\) and \(K_{\mathrm{min},C_{10} \subseteq L_{C_{11}} ,{G}\not \subseteq {C}_{21}}\), respectively, so they can still get equal values and we are unable to replace the sign \(\not \subset \) in \(^{(**)}\) by \(\not \subseteq \), i.e. if \(K_k = K_i^{\prime } +K_i^{\sim }\), then we still eliminate \(K_i^{\prime }\). Indeed, consider the example in [4, 16], for \(L = acdfh, {\mathcal G}(L)\ = \$ \{cd, acf\}, supp(L) = 1/6, C_{10} = afd, C_{11} =acdf, C_{21} = cd, s_0 = 1/6, s_1 = 1\). Then, we have \(C^+ = cd, C_- = d, L_{C^{*}} = af, {\mathcal G}_{C_{11}} (L) = {\mathcal G}(L), {\mathcal G}_{C^{+}} (L)\,\,= \{L_1 = cd\}, {\mathcal G}_{C_{11} ,\nsubseteq C_{21}} (L)= \{L_2 = caf\},K_{\mathrm{min},C_{-}\subseteq L_{C^{+}}} = \{K_1 = c\}, K_{\mathrm{min},C_{10} \subseteq L_{C_{11}}, {G}\not \subseteq {C}_{21}} = \{K_2 = c\}, \varnothing \subset af \subseteq L^{\sim } \subseteq af, K_{U,C_{-}\subseteq L_{C^+} ,1} = \varnothing , K_{-,C_{-}\subseteq L_{C^+} ,1} = acdf \backslash (afd+c) = \varnothing , {\mathcal F}{\mathcal S}_{C_- L_{C^+} ,+L_{C^*}}^*= \{ L^{\prime \prime }+L^{\sim } = (d+c+\varnothing +\varnothing )+af \} = \{ dcaf \}; K_2^{\prime } \subseteq K_{U,C_{10} \subseteq L_{C_{11}}, 2} = K_1 \backslash K_2 = \varnothing , K_2^{\sim } \subseteq K_{-,C_{10} \subseteq L_{C_{11}} ,2} = acdf \backslash (afd+c)=\varnothing , K_2^{\prime } = \varnothing , K_2^{\sim } = \varnothing , K_1 = K_2 +\varnothing = c \) and then obtain a frequent itemset \(L^{\prime } = afd+c = afdc\) which coincides with \(L^{\prime }= dcaf\) in \({\mathcal F}{\mathcal S}_{C_- L_{C^+} ,+L_{C^*}}^*\). Therefore, we still wipe out \(L^{\prime }\) corresponding with \(K^{\prime }_2 =\varnothing \) and \({\mathcal F}{\mathcal S}_{C_{10} \subseteq L_{C_{11}} ,G \not \subseteq C_{21} }^*= \varnothing .\) Hence, \({\mathcal F}{\mathcal S}_{C_{10} \subseteq L_{C_{11}},G \not \subseteq C_{21} }^*= {\mathcal F}{\mathcal S}_{C_{-} \subseteq L_{C^+} ,+L_{C^*}}^*= {\{}dcaf{\}}\).
Theorem 1
(Structure of the solution set \({\varvec{\mathcal F}}{\varvec{\mathcal S}}_{\varvec{C_{10}} \subseteq {\varvec{C_{11}}}, {\varvec{\nsubseteq C_{21}}}}{} \mathbf{(s_0 ,s_1 )})\)
Assume that the hypothesis \((\mathrm{H}_{1})\) is satisfied. Then
$$\begin{aligned} {\mathcal F}{\mathcal S}_{C_{10} \subseteq C_{11} ,\nsubseteq C_{21} ,} (s_0 ,s_1 )= & {} \sum \nolimits _{L\in {\mathcal F}{\mathcal C}{\mathcal S}_{C_{10} \subseteq {C_{11}} ,\nsubseteq C_{21} } (s_0 ,s_1 )}\nonumber \\&\times {\mathcal F}{\mathcal S}_{C_{10} \subseteq L_{C_{11} } ,\nsubseteq C_{21} }^*. \end{aligned}$$
(5)
Proof
It is a consequence of (2) in Proposition 3 and (4) in Proposition 4. \(\square \)
Remark 3
(Some typically special cases) When the constraints are gotten special values, we obtain better results than those known in out former papers (since the way of calculating sets \(\mathrm{K}_\mathrm{U}^{\mathrm{i}} \), \(\mathrm{K}_\mathrm{{U,i}} \) and \(\mathrm{K}_{-,\mathrm{i}}\) to find solution sub-classes in this new version will take time less and thus more effective than the former one, according to Remark 2).
(i)
To show the structure of frequent itemsets with simple double constraint \({\mathcal F}{\mathcal S}_{C_{10} \subseteq C_{11} }(s_0,s_1)\) in [17], we choose \(C_{21} = \varnothing \), with conditions changed as follows: \(C^*= C_{11} \,\, \ne \,\, \varnothing , C^{+} = C_- = \varnothing , {\mathcal F}{\mathcal C}{\mathcal S}_{C_{10} \subseteq C_{11} } (s_0,s_1)\buildrel \mathrm{def} \over = \{L\in {\mathcal F}{\mathcal C}{\mathcal S}(s_0 ,s_1 ) \vert C_{10} \subseteq L_{C_{11} } ,L_{C_{11} } \,\, \ne \,\, \varnothing , {\mathcal G}_{C_{11} } (L)\ne \varnothing \}.\) Since \({\mathcal G}_{C^+} (L) = \varnothing \), \(K_\mathrm{min,C_{-}\subseteq L_{C^+}} = \varnothing , M=0\), so \({\mathcal G}_{C_{11} } (L)\equiv {\mathcal G}_{C_{11} ,\not \subseteq \varnothing }(L)\), \({\mathcal F}{\mathcal S}_{C_- L_{C^+} ,,+L_{C^*}}^*= \varnothing \), let \(K_{\mathrm{min},C_{10} \subseteq L_{C_{11} } } {}\buildrel \mathrm{def} \over = K_{\mathrm{min},C_{10} \subseteq L_{C_{11} } , \not \subseteq \varnothing }^+= K_{\mathrm{min},C_{10} \subseteq L_{C_{11} }, G\not \subset \varnothing } \,\,{= \,\, Minimal{\{}}K_i \mathop =\limits ^{\mathrm{def}} L_i \backslash C_{10} , L_i \,\, \in \,\,{\mathcal G}_{C_{11} } (L), i\geqslant \,\, \) 1} and \(\,\, {\mathcal F}{\mathcal S}_{{C}_{10} \subseteq L_{C_{11}}}^*\buildrel \mathrm{def} \over = {\mathcal F}{\mathcal S}_{{C}_{10} \subseteq L_{C_{11} } ,,{G}\not \subseteq \varnothing }^*{= {\{}L^{\prime } \buildrel \mathrm{def} \over = }C_{10} +K_i +K_i^{\prime } +K_i^{\sim } \vert K_i \in K_\mathrm{min,C_{10} \subseteq L_{C_{11} } } , K_i^{\prime } \subseteq K_{U,i} , K_i^{\sim } \subseteq K_{-,i}\) and \( (K_k \,\, \,\, \not \subseteq K_i +K_i^{\prime } , \forall K_k \in K_\mathrm{min,C_{10} \subseteq L_{C_{11}}} : 1 \le k<i){\}},\) vói \(\,\, K_{U,i} = \left\{ {{\begin{array}{ll} {( {K_{U,i-1} +K_{i-1} })\backslash K_i ,if \,\, i \geqslant 1}\\ {\varnothing , if \,\, i=0}\\ \end{array} }} \right. , K_{-,i} = \left\{ {{\begin{array}{ll} {K_{-,i-1} \backslash K_i ,if \,\, i \geqslant 1}\\ {L_{C_{11} } \backslash C_{10} ,if \,\, i=0}\\ \end{array} }} \right. \) and \(K_0 \buildrel \mathrm{def} \over = \varnothing \) (see Remark 2).
 
(ii)
To obtain the results presented in [16] or in [2] with single constraints \({\mathcal F}{\mathcal S}_{{C_{10}}\subseteq }(s_0,s_1)\) or \({\mathcal F}{\mathcal S}_{C_{11}} (s_0,s_1 )\), respectively, we only need to not consider ones \(C_{11}\) or \(C_{10}\) in (i) which can be performed by assigning \(C_{11} ={\mathcal A}\) or \(C_{10} =\varnothing \), respectively.
 
(iii)
When \(C_{10} = C_{21} =\varnothing , C_{11} ={\mathcal A}\), \(s_1 = 1\), we have the structure of frequent itemsets without constraints, \({\mathcal F}{\mathcal S}^*(s_0)={\mathcal F}{\mathcal S}_{\varnothing \subseteq {\mathcal A},\not \subseteq \varnothing }^*(s_0,s_1)\). Then, \({\mathcal G}_{C_{11} } (L)={\mathcal G}(L)\), \(\forall L\in {\mathcal F}{\mathcal C}{\mathcal S}_{\varnothing \subseteq {\mathcal A},\not \subseteq \varnothing } (s_0 ,s_1 ) = {\mathcal F}{\mathcal C}{\mathcal S}(s_0) = \{L\in {\mathcal C}{\mathcal S}: \varnothing {\ne L, supp(L)} \geqslant s_0 \}\) and \({\mathcal F}{\mathcal S}_{\subseteq L}^*\buildrel \mathrm{def} \over = \{L^{\prime } \buildrel \mathrm{def} \over = L_i +L_i^{\prime } +L_i^{\sim } \vert L_i \in {\mathcal G}(L), L_i^{\prime } \subseteq L_{U,i} , L_i^{\sim } \subseteq L_{-,i}\) and \((L_k \not \subseteq L_i +L_i^{\prime } , \forall L_k \in {\mathcal G}(L){:}\, 1\le k<i)\}\), where \(K_{U,i} =\left\{ {{\begin{array}{ll} {( {K_{U,i-1} +K_{i-1} })\backslash K_i ,if \,\, i \geqslant 1} \\ {\varnothing , if \,\, i=0} \\ \end{array} }} \right. , K_{-,i} = \left\{ {{\begin{array}{ll} {K_{-,i-1} \backslash K_i ,if \,\, i \geqslant 1}\\ {L,if \,\, i=0}\\ \end{array} }} \right. \) and \(K_0 \buildrel \mathrm{def} \over = \varnothing .\)
 
(iv)
To show the structure of frequent itemsets with the dualistic constraint \({\mathcal F}{\mathcal S}_{\nsubseteq C_{21} } (s_0 ,s_1 )\buildrel \mathrm{def} \over = \{A\in {\mathcal F}{\mathcal S}(s_0, s_1) \vert A \nsubseteq C_{21} \}\) in [3], we choose \(C_{10} = \varnothing , C_{11} = {\mathcal A}.\,\, \) Conditions are changed as follows: \(C^*= {\mathcal A} \backslash C_{21} \ne \varnothing , supp({\mathcal A}) \leqslant s_1 , C^+ = C_{21} , C_- = \varnothing , {\mathcal F}{\mathcal C}{\mathcal S}_{C_{10} \subseteq C_{11} } (s_0 ,s_1 ) \buildrel \mathrm{def} \over = \{L \,\, \in \,\, {\mathcal F}{\mathcal C}{\mathcal S}(s_0, s_1) \vert L \nsubseteq C_{21} \}, {\mathcal G}(L) = {\mathcal G}_{C_{21} } (L) + {\mathcal G}_{\nsubseteq C_{21}} (L)\), \({{\mathcal F}{\mathcal S}_{\subseteq L_{C_{21} } ,+L\backslash C_{21}}^*} \buildrel \mathrm{def} \over = \{L'\buildrel \mathrm{def} \over = L^{\prime \prime }+L^{\sim }\vert L^{\prime \prime }\in {\mathcal F}{\mathcal S}_{\subseteq L_{C_{21} } }^*, \varnothing \subset L^{\sim }\subseteq L\backslash C_{21} \}\), \({\mathcal F}{\mathcal S}_{\subseteq {L}, {G}\not \subseteq {C}_{21} }^*\buildrel \mathrm{def} \over = {\mathcal F}{\mathcal S}_{\varnothing \subseteq L, {G} \not \subseteq {C}_{21} }^*\) and \({\mathcal F}{\mathcal S}_{\nsubseteq L_{C_{21} } }^*= {\mathcal F}{\mathcal S}_{\subseteq L_{C_{21} } ,+L\backslash C_{21}}^*+ {\mathcal F}{\mathcal S}_{\subseteq L,G \not \subseteq C_{21}}^{*}\).
 
According to Propositions 3 and 4, we obtain two procedures MFS-EDC-FirstSubClass and MFS-EDC- SecondSubClass (pseudo code shown in Fig. 5a, b) to produce constrained frequent itemsets in two sub-classes \({\mathcal F}{\mathcal S}_{C_{-} \subseteq L_{C^+} ,+L_{C^*}}^*\) and \({\mathcal F}{\mathcal S}_{C_{10} \subseteq L_{C_{11}},G \not \subseteq C_{21} }^*\), respectively, and then the procedure MFS-EDC-OneClass (see in Fig. 5) to generate all constrained frequent itemsets in an equivalence class. Using Theorem 1 and these procedures, the algorithm MFS-EDC is proposed, shown in Fig. 4, for mining all frequent itemsets with EDC.

5 Experiments

Experiments were performed on a PC with an i5-2400 CPU, 3.10 GHz@ 3.09 GHz PC and 3.16 GB of memory, running on Windows XP. The algorithms were coded in C#. To compare the performance, the source code for Charm-L [35], MinimalGenerators [34] and dEclat [36] was converted to C#. Charm-L and MinimalGenerators were used to mine the lattice of the closed itemsets and their generators. dEclat was used to exploit all frequent itemsets.
To test and evaluate our new proposed algorithm, \({\mathbf {MFS}}\text {-}{\mathbf {EDC}}\), we compare its performance to those of two different new post-processing algorithms. The first one is called MFS-E-EDC that is a new modified version of dEclat for mining frequent itemsets with the extended double constraint. MFS-E-EDC is done by integrating constraints \(s_{0}\) and \(C_{11}\) into dEclat algorithm to discover only frequent itemsets satisfying two these constraints. Then, MFS-E-EDC implements a post-processing step to filter frequent itemsets satisfying the remaining constraints, \(s_{1}\), \(C_{10}\) and \(C_{21}\). The second new post-processing algorithm is named MFS-PP-EDC that is a modification of Gen_Itemsets [2]. MFS-PP-EDC includes two steps. In the first step, it uses Gen_Itemsets to mine all frequent itemsets without constraints. The second one is to directly check all generated frequent itemsets on the constraints to filter frequent itemsets satisfying extended double constraint.
Table 1
Dataset characteristics
Dataset
#Items
#Records
Avg. length
Connect (C)
129
67,557
43
Mushroom (M)
119
8124
23
Pumsb (P)
7117
49,046
74
Chess (Ch)
75
3196
37
T40I10D100K (T40)
1000
100,000
40
We chose benchmark datasets in FIMDR [13] including Pumsb, Connect, Mushroom, Chess, and T40I10D100K to test the algorithms in performance. Pumsb, Connect, Chess, and Mushroom are real and dense, i.e. they produce many long frequent itemsets even for very high support values. The other is synthetic and sparse. Table 1 shows their characteristics.
We keep the support threshold s\(_{1}\) unchanged at 0.9. Assuming that the size of \(C_{10}\) is m, then \(C_{11}\) with the size of \(m+d{*}\vert A^{F}\vert /100 (d. \in [1, 100])\) is chosen. For each pair of datasets (DB) and minimum support (MS), m ranges from 10 to 28 % of \(\vert A^{F}\vert \) (step 2 %) and d = 60. For each pair of \(C_{10}\)’s size and \(C_{11}\)’s size, there are 10 value triples of \(C_{10}\), \(C_{11}\) and \(C_{21}\) randomly selected from \(A^{F}\) (the size of \(C_{21}\) is also chosen randomly).
Let T_EDC, T_PP_EDC, and T_E_EDC be the average execution times of MFS-EDC, MFS-PP-EDC, and MFS-E-EDC for 100 selected extended double constraints.
Table 2
Time reductions of MFS-EDC compared to MFS-PP-EDC and MFS-E-EDC
DS-MS
R_PP (%)
R_E (%)
DS-MS
R_PP (%)
R_E (%)
M-14
1.50
6.32
Ch-76
33.06
17.58
M-12
1.44
6.86
Ch-72
25.73
27.86
M-10
0.60
9.82
C-82
4.48
5.12
M-8
0.53
10.40
C-80
23.62
6.71
M-6
0.34
11.97
C-78
13.01
5.73
P-82
65.87
5.89
C-76
9.11
6.20
P-78
26.06
9.62
C-72
4.03
6.27
P-76
17.70
12.80
T40-1
45.27
5.41
P-74
11.71
16.17
T40-0.8
42.23
5.12
P-72
8.57
23.24
T40-0.6
41.12
4.87
Ch-84
3.51
2.94
T40-0.4
39.23
4.13
Ch-82
7.54
5.82
T40-0.2
38.01
3.98
Ch-80
2.85
2.90
   
Table 2 shows the experimental evaluation of MFS-EDC against MFS-PP-EDC and MFS-E-EDC, where column DS-MS denotes the dataset DS with the minimum support MS (for example, \(M-14\) means the dataset Mushroom with the minimum support of 14 %), column R_PP shows the ratios of T_EDC and T_PP_EDC, and column R_E reveals the rates of T_EDC and T_E_EDC. Compared to MFS-PP-EDC, MFS-EDC is faster for all selected datasets. The time is reduced by 65.87–0.34 %. MFS-EDC is also much faster than MFS-E-EDC for all datasets with the time reduction from 27.86 to 2.90 %.
We found that the reason for the reduction in the mining time of MFS-EDC in comparison with MFS-PP-EDC and MFS-E-EDC is because there are a large number of candidates which fail the last test of both MFS-PP-EDC and MFS-E-EDC, leading to their lower performance. Note that, for sparse dataset T10, the time reduction of MFS-EDC, compared to MFS-PP-EDC, in general, is not high (over 54.73 %) because the number of frequent itemsets is small and their size is small too, leading to a low cost for testing the constraints. However, when compared to MFS-E-EDC for this dataset, the figure is quite high, accounting for over 94.59 %. This can be explained that when constraints are changed, MFS-E-EDC have to re-scan the original dataset, which will take a lot of mining time, while MFS-EDC only needs to travel back to the lattice.
Figure 6a, b show the comparisons of the average execution times for various support values. The performance and scalability of MFS-EDC are superior to those of MFS-PP-EDC and MFS-E-EDC.
Figure 7a, b show the results in the average execution time of different numbers of constraints. We realize that the performance gap between MFS-EDC and MFS-E-EDC increases along with the number of constraints (#Constraints). The main reason is that, when the extended double constraints changes, MFS-EDC executes without creating the lattice of closed itemset and their generators again from the dataset.
In general, MFS-EDC outperforms both MFS-PP-EDC and MFS-E-EDC, especially when the minimum support is lower and the number of constraints is high.

6 Conclusion and future work

In this paper, in theory, checking the general constraints was performed directly on the lattice of closed itemsets and their generators based on partitioning solution set into disjoint equivalence sub-classes. Instead of eliminating an enormous amount of itemsets not satisfying the constraints by so many direct checks, the partition helps to test and eliminate redundant, candidate equivalence classes only based on the necessary conditions. Thereby, the structure and explicit representation of \({\mathcal F}{\mathcal S}_{C_{10}\subseteq C_{11}, \nsubseteq C_{21}}(s_0,s_1)\) were shown and proven to be reliable. On the basis of the theoretical results, on practice, the corresponding algorithm, MFS-EDC, to find solution set without generating any redundant candidate was obtained. Its efficiency was verified and compared to several post-processing algorithms on a lot of benchmark datasets in the domain.
Based on saving a not too large number of the lattice of closed itemsets and their generators, the approach of the paper is sustainable through the regular changes of constraints given by online users. In addition, these generally theoretical results are also the reliable basis for designing parallel algorithms that efficiently mine frequent itemsets with more general constraints in real time.

7 Appendix

7.1 Appendix 1: Proof of Proposition 2

Obviously, \(\forall L^{\prime }\in {\mathcal F}{\mathcal S}_{C_{10} \subseteq C_{11} ,\nsubseteq C_{21},} (s_0,s_1) \subseteq {\mathcal F}{\mathcal S}(s_0,s_1),\) we have \(L \buildrel \mathrm{def} \over = h(L^{\prime })\in {\mathcal F}{\mathcal C}{\mathcal S}(s_0 , s_1), {\mathcal F}{\mathcal S}_{C_{10} \subseteq C_{11} ,\nsubseteq C_{21},} (s_0, s_1) = \sum \nolimits _{L\in {\mathcal F}{\mathcal C}{\mathcal S}(s_0, s_1)} \{[L] \cap {\mathcal F}{\mathcal S}_{C_{10} \subseteq C_{11} ,\not \subseteq C_{21},} (s_0,s_1)\}\) and \({[L]}\cap {\mathcal F}{\mathcal S}_{C_{10} \subseteq C_{11} ,\not \subseteq C_{21},} (s_0, s_1 )= \{L^{\prime } \in [L] \vert C_{10} \subseteq L^{\prime } \subseteq L_{C_{11}} , L^{\prime } \not \subseteq C_{21} \} = \{L^{\prime } \in {\mathcal F}{\mathcal S}_{C_{10} \subseteq L_{C_{11} }} \vert L^{\prime } \not \subseteq C_{21} \} = {\mathcal F}{\mathcal S}_{C_{10} \subseteq L_{C_{11}}, \not \subseteq C_{21}} \). \(\square \)

7.2 Appendix 2: Proof of Lemma 1 (a)

(i)
We need only prove that \({\mathcal G}(L^{\prime }) \ne \varnothing \Rightarrow L^{\prime } \ne \varnothing \), the remain of the assertion could see in [4]. Indeed, if \({\mathcal G}(L^{\prime }) \ne \varnothing \), then \(\exists L_i \in {\mathcal G}(L^{\prime }): \varnothing \subset L_i\subseteq L^{\prime }\), i.e. \(L^{\prime } \ne \,\, \varnothing .\)
 
(ii)
\(\subseteq ^{\prime \prime }: \forall L_k \in G_B (L): L_k \in {\mathcal G}(L), L_k \subseteq B, \) we have \(L_k\subseteq L_B \subseteq L, L = h(L_k ) = h(L_B ),\) thus, \(L_k \in {\mathcal G}(L_B ).\) .“\(\supseteq \)”: Conversely, \(\forall L_k \in {\mathcal G}(L_B ),\) then \(L_k\subseteq L_B \subseteq B, h(L_k ) = h(L_B );\) since \({\mathcal G}_B (L) \ne \varnothing \), so \(\exists L_i \in G(L): L_i \subseteq B, L_i\subseteq L_B \subseteq L\) and \(L= h(L_i)= h(L_B ) = h(L_k ),\) i.e. \(L_k \in G(L): L_k \subseteq B\) or \(L_k \in \ {\mathcal G}_B (L).\)
 
(iii)
Since \({\mathcal G}_B (L) \ne \varnothing , so \,\, \exists L_j \in {\mathcal G}(L): L_j \,\, \subseteq \,\, L_B {\buildrel \mathrm{def} \over = L\cap B}\) \(\subseteq L\), so \(h(L_j) = L = h(L_B )\). Denote \(L{*}\buildrel \mathrm{def} \over =\) A\(\cup L_j,\) then \(A\subseteq L*, \varnothing \,\, \subset L_j \subseteq L*\subseteq L_B \subseteq L, L = h(L_j) = h(L*) = h(L_B )\) and \(L* \in FS_{A\subseteq L_B } \ne \varnothing \). Moreover, \(\forall L^{\prime } \in FS_{A \subseteq L_B } , \forall L_i \in \mathcal G(L),\) since \(L^{\prime } \ne \varnothing ,\) by (i), then for any \(L_i \in \mathcal G(L^{\prime }) \ne \varnothing ,\) we have \(L_i \subseteq L^{\prime }\subseteq L_B \subseteq B, h(L_i) = h(L^{\prime })= L,\) thus \(L_i \in {\mathcal G}_B (L), {\mathcal G}(L^{\prime }) \subseteq {\mathcal G}_B (L)\) and \( h(L_i) = L = h(L_B).\)
 
(iv)
Since \(\varnothing \subset U \subseteq V, U \) and V are finite, so there exist Minimal(U) and Minimal(V). If \(\exists M \in U\cap Minimal(V)\), and assume that \(M\notin \) Minimal(U), so \(\exists K \in Minimal(U) \subseteq V\) such that K \(\subset M\). It contradicts the hypothesis \(M \in Minimal(V)\)!
 
(b) Consider \(\forall L^{\prime }\subseteq A: L^{\prime } \ne \varnothing , L^{\prime }\supseteq A, h(L^{\prime }) = L: \exists L_i \in {\mathcal G}(L^{\prime }), Cond(L_i ). \) Then, \(K_i \mathop =\limits ^{\mathrm{def}} L_i \backslash A{\in U}\) and \(U \ne \varnothing .\) For any \(K_k \mathop =\limits ^{\mathrm{def}} L_k {\backslash A\in U:} L_k \in {\mathcal G}(L^{\prime }) \subseteq {\mathcal G}(L), Cond(L_k)\), then \(L_k \in {\mathcal G}(L)\) and \(K_k {\in V,}\) thus, \( U\subseteq V\). Due to \(\varnothing {\subset U} \ \subseteq V,U\) and V is finite, then \(Minimal(V)\ne \varnothing , Minimal(U) \ne \varnothing \). Consider \(\forall K_j {\in Minimal(U),}\) then \(L^{\prime }\supseteq A\cup L_j = A+K_j \) and \(K_j {\in Minimal(V).}\) Indeed, assume \(K_j \notin Minimal(V),\) then \(\exists K_k \mathop =\limits ^{\mathrm{def}} L_k \backslash A \in Minimal(V): K_k {\subset }K_j , L_k \ {\in } \ {\mathcal G}(L)\) and \(Cond(L_k )\); since \(L_k \subseteq A+K_k \subseteq A+K_j \subseteq L^{\prime }\subseteq L,\) so \(L = h(L_k ) = h(L^{\prime }), L_k \in {\mathcal G}(L^{\prime }), K_k \in U\cap Minimal(V),\) thus \(K_k \in Minimal(U)\): it is a contradiction of the facts \(K_j \in Minimal(U)\) and \(K_k \subset K_j !\) Hence, \(Minimal(U) \subseteq Minimal(V)\). We can always assume that i is the minimum index in all ones of \(K_i\) in Minimal(U).
(c) If \(L^{\prime } = A+K_i +K_i^{\prime } +K_i^{\sim },\) where \(K_i \in K_\mathrm{min} , K_i^{\prime } \subseteq K_{U,i} , K_i^{\sim } \subseteq K_{-,i} ,\) then, since \(K_i \subseteq L_B , K_{-,i} \subseteq L_B , \)so \(K_{U,i} \subseteq K_U^{i-1} \subseteq L_B .\) Hence, \( A \subseteq L^{\prime }\subseteq L_B .\) If \(K_i \buildrel \mathrm{def} \over = L_i \backslash A,\) with \(L_i \in {\mathcal G}(L),\) then \(\varnothing \subset L_i \subseteq A+K_i \subseteq L^{\prime }\subseteq L_B \subseteq L, \) so \( L^{\prime } \ne \varnothing , L = h(L^{\prime }) = h(L_i ) = h(L_B ) \) and \(L^{\prime }\in FS_{A \subseteq L_B }.\)
\(\square \)

7.3 Appendix 3: Proof of Proposition 3

\(+ ``(a). \Rightarrow \) và b. \(\subseteq ^{\prime \prime }: \forall { L^{\prime }\in }{\mathcal F}{\mathcal S}_{C_{10} \subseteq C_{11} ,\nsubseteq C_{21} ,} (s_0 ,s_1 ),{ L^{\prime }\in }{\mathcal F}{\mathcal S}(s_0 ,s_1 ), C_{10} \subseteq L^{\prime }\subseteq C_{11} {, L^{\prime } \not \subseteq }C_{21} , \mathrm{call} \ { L=h(L^{\prime }), }L_i {\in }{\mathcal G}(L^{\prime })\subseteq {\mathcal G}(L)\) (by Lemma 1a(i)) then \({ supp(L) =} supp(L^{\prime })\in [s_0 ,s_1 ], C_{10} \subseteq L^{\prime } \subseteq L_{C_{11} } , \varnothing \subset L_i \subseteq L_i \cup C_{10} \subseteq L^{\prime }\subseteq L_{C_{11} } \subseteq L \) and \(L = h(L_i ) = h(L^{\prime }) = h(L_{C_{11}} )\). Due to \(\varnothing \subset L^{\prime }\backslash C_{21} = L^{\prime }\backslash L_{C_{21} } \subseteq L_{C_{11}} \backslash C_{21} , \) then \({L^{\prime }\in } {\mathcal F}{\mathcal S}_{C_{10} \subseteq L_{C_{11}} ,\nsubseteq C_{21}} ,{ L\in }{\mathcal F}{\mathcal C}{\mathcal S}(s_0, s_1), C_{10} \subseteq L_{C_{11}} \nsubseteq C_{21}\) and \(L_i \in {\mathcal G}_{C_{11} } (L) \ne \varnothing , \) i.e. \({L\in }{\mathcal F}{\mathcal C}{\mathcal S}_{C_{10} \subseteq C_{11} ,\nsubseteq C_{21} ,} (s_0 ,s_1 ) \ne \varnothing .\)
\({(a).\Leftarrow \text {''}: } \ \forall {L\in }{\mathcal F}{\mathcal C}{\mathcal S}_{C_{10} \subseteq C_{11} ,\nsubseteq C_{21} ,} (s_0,s_1), C_{10} \subseteq L_{C_{11} } \nsubseteq C_{21} , {\mathcal G}_{C_{11} } (L) \ne \varnothing , \) let \(L_i \in {\mathcal G}_{C_{11} } (L)\subseteq {\mathcal G}(L): L_i \subseteq C_{11} \) and \( L^{\prime } \buildrel \mathrm{def} \over = (C_{10} {\cup L_i )\cup {\{}a{\}}}\), where \({a\in }L_{C_{11}}{\backslash } C_{21} \ne \varnothing \). Then, \( {a\in L^{\prime }\backslash } C_{21} \ne \varnothing \), so \(C_{10} \subseteq L^{\prime } \subseteq L_{C_{11} } \subseteq C_{11} {, L^{\prime } \not \subseteq }C_{21} , \varnothing \ {\subset } \ L_i \subseteq L^{\prime }\subseteq L_{C_{11} } \subseteq L\) and \( L = h(L_i) = h(L^{\prime }) = h(L_{C_{11} }), supp(L^{\prime }) = supp(L)\in [s_0 ,s_1 ], L^{\prime }\in {\mathcal F}{\mathcal S}(s_0 ,s_1 ). \) Thus, \(L^{\prime }\in {\mathcal F}{\mathcal S}_{C_{10} \subseteq C_{11} ,\nsubseteq C_{21} } (s_0 ,s_1 ) \ne \varnothing \).
\({+ ``b. \supseteq }^{\prime \prime }\): It is a consequence of Proposition 2 and \({\mathcal F}{\mathcal C}{\mathcal S}_{C_{10} \subseteq C_{11} ,\nsubseteq C_{21},}(s_0,s_1) \subseteq {\mathcal F}{\mathcal C}{\mathcal S}(s_0,s_1).\) \(\square \)

7.4 Appendix 4: Proof of Proposition 4

Note that, \({\mathcal F}{\mathcal S}_{C_{10} \subseteq {C_{11} } ,\nsubseteq C_{21} } = \lfloor {L}\rfloor _{C_{-} \subseteq L_{C^+} ,+\lfloor {L}\rfloor _{C^*}} +L_{C_{10} \subseteq {C_{11} } ,G \not \subseteq C_{21} } \). Consider \(\forall {L\in }{\mathcal F}{\mathcal C}{\mathcal S}_{C_{10} \subseteq C_{11} ,\nsubseteq C_{21} ,}(s_0 ,s_1)\).
(i) + The uniqueness in the representation of \(L^{\prime \prime }\in {\mathcal F}{\mathcal S}_{\text{ C }_{-} \subseteq L_{C^+} ,}^*\): Assume that \( L^{\prime }\) has two representations: \( { L^{\prime \prime } = C{\_}+}K_k +K_k^{\prime } +K_k^{\sim } {= C{\_}+}K_i +K_i^{\prime } +K_i^{\sim } , \) where \({ 1 \le k < i, }K_k \) and \( K_i \in K_{\mathrm{min},C_{-} \subseteq L_{C^+} } , K_i^{\prime } \subseteq K_{U,\mathrm min,C_{-}\subseteq L_{C^+} ,i} , K_i^{\sim } \subseteq K_{-,C_{-}\subseteq L_{C^+} i} , K_k^{\prime } \subseteq K_{U,C_{-}\subseteq L_{C^+},k} , K_k^{\sim } \subseteq K_{-,C_{-}\subseteq L_{C^+} ,k} \). Then, \(K_k \subseteq K_i +K_i^{\prime } +K_i^{\sim } , \) but due to \(K_k\subseteq K_{U,C_{-} \subseteq L_{C^+} }^{k} \subseteq K_{U,C_{-}\subseteq L_{C^+} }^{i} , K_i^{\sim } {\cap } K_{U,C_{-}\subseteq L_{C^+} }^i = \varnothing \),so \(K_k \cap K_i^{\sim } =\varnothing \) and \(K_k \subset K_i +K_i^{\prime }\) (the equality does not happen, because \(K_{k}\) and\( K_{i}\) are two different minimal sets in \(K_\mathrm{min,C\subseteq L_{C^+} }):\) it is a contradiction of the selection of \(\mathrm{K}_\mathrm{i}\)!
+ The uniqueness in the representation of \(L^{\prime }\in {\mathcal F}{\mathcal S}_{\text{ C }_{10} \subseteq L_{C_{11} } ,,\text{ G }\not \subseteq \text{ C }_{21} }^*\): Assume that \(L^{\prime }\) have two representations: \( L^{\prime } = C_{10} +K_k +K_k^{\prime } +K_k^{\sim } =C_{10} +K_i +K_i^{\prime } +K_i^{\sim }\), where \({M \le k < i}, K_i {\in }K_\mathrm{min,C_{10} \subseteq L_{C_{11} } ,\text{ G }\not \subseteq \text{ C }_{21}} ,\) \(K_k {\in }K_\mathrm{min,C_{10} \subseteq L_{C_{11} } ,\text{ G }\not \subseteq \text{ C }_{21} } \subseteq K_\mathrm{min,C_{10} \subseteq L_{C_{11} } ,\not \subseteq \text{ C }_{21} }^+ \), \(K_k \ne K_i \), \(K_i^{\prime } \subseteq K_{U,C_{10} \subseteq L_{C_{11} } ,\not \subseteq \text{ C }_{21} ,i} \), \(K_i^{\sim } \subseteq K_{-,C_{10} L_{C_{11} } ,\not \subseteq \text{ C }_{21} ,i} \),\(_{ }K_k^{\prime } \subseteq K_{U,C_{10}L_{C_{11} } ,\not \subseteq \text{ C }_{21} ,k} \), \(K_k^{\sim } \subseteq K_{-,C_{10} \subseteq L_{C_{11} } ,\not \subseteq \text{ C }_{21} ,k} \). Then, \(K_k \subseteq K_i +K_i^{\prime } +K_i^{\sim } \), but since \(K_k\subseteq K_{U,C_{10} \subseteq L_{C_{11} } ,\not \subseteq \text{ C }_{21} }^k \subseteq K_{U,C_{10} \subseteq L_{C_{11}} ,\not \subseteq \text{ C }_{21} }^i \), \(K_i^{\sim } \cap K_{U,C_{10} \subseteq L_{C_{11} } ,\not \subseteq \text{ C }_{21} }^i =\varnothing \), so \(K_k \cap K_i^{\sim } = \varnothing \) and \(K_k \subset K_i +K_i^{\prime }\) ((the equality does not also happen, because \(K_{k}\) and\( K_{i}\) are two different minimal sets in \(K_\mathrm{min,C_{10} \subseteq L_{C_{11} },\text{ G }\not \subseteq \text{ C }_{21}})\): it contradicts the selection of \(K_i\)!
(ii) + “\({\mathcal F}{\mathcal S}_{{C}_{10} \subseteq L_{C_{11}} ,\nsubseteq C_{21} } \subseteq {\mathcal F}{\mathcal S}_{\text{ C }_{10} \subseteq L_{C_{11} } ,\nsubseteq C_{21} }^*\)”: We will prove that \(\lfloor {L}\rfloor _{C_{-} \subseteq L_{C^+},+L_{C^*}} \subseteq {\mathcal F}{\mathcal S}_{\text{ C }_{-} \subseteq L_{C^+} ,+L_{C^*} }^*\) and \(\lfloor {L}\rfloor _{C_{10} \subseteq L_{C_{11} } ,G \not \subseteq C_{21} } \subseteq {\mathcal F}{\mathcal S}_{\text{ C }_{10} \subseteq L_{C_{11} } ,\nsubseteq C_{21} }^*\), thus, \({\mathcal F}{\mathcal S}_{C_{10} \subseteq L_{C_{11} } ,\nsubseteq C_{21} } \subseteq {\mathcal F}{\mathcal S}_{\text{ C }_{10} \subseteq {C_{11} } ,\nsubseteq C_{21} }^*\).
. “\(\lfloor {L}\rfloor _{C_- L_{C^+} ,+L_{C^*} } \subseteq {\mathcal F}{\mathcal S}_{\text{ C }_- L_{C^+} ,+L_{C^*} }^{*}\)”: \(\forall {L^{\prime }} \ \in \ {\mathcal F}{\mathcal S}_{C_{10} \subseteq {C_{11} } ,\nsubseteq C_{21} } , \exists L_i \ \in {\mathcal G}_{C^+} (L) \subseteq {\mathcal G}(L): L_i \subseteq L^{\prime },\) then \(L^{\prime } \ne \varnothing \) and \(L^{\prime } \supseteq C_{10}, L_i \subseteq L^{\prime } \subseteq L_{C_{11} } \subseteq C_{11}, L_i \subseteq C^+ \subseteq C_{21} .\) Since \(L^{\prime }\subseteq L_{C_{11} } \), so based on \(C_{21},\) we can partition \(L'\) into disjoint subsets \({L' \buildrel \mathrm{def} \over = L^{\prime \prime }+L}^{\sim }, \) where \(C_-\subseteq { L^{\prime \prime } \buildrel \mathrm{def} \over = L^{\prime }\cap }C_{21} \subseteq L_{C_{11} } \cap C_{21} = L_{C^+} \subseteq L_{C_{21}}\) and \(C_{10} \backslash C_{21} \subseteq L^{\sim } \buildrel \mathrm{def} \over = L^{\prime }\backslash C_{21} \subseteq L_{C_{11}} \backslash C_{21} = L_{C^*}\). Moreover, due to \(L^{\prime \prime }\backslash C_{21} = \varnothing , L^{\sim } \cap C_{21} = \varnothing {, L^{\prime } \not \subseteq }L_{C_{21} } ,\) so \(\varnothing \subset L^{\prime } \backslash L_{C_{21} } = L^{\sim } \backslash L_{C_{21}} = L^{\sim }, L^{\sim } \in 2^{\left[ {\text{ C }_{10} \backslash C_{21} \subseteq L_{C^*} } \right] *}.\) We need to prove that \(L^{\prime \prime }\in {\mathcal F}{\mathcal S}_{C_{-} \subseteq L_{C^+} }^*\). Let \(K_i \buildrel \mathrm{def} \over = L_i \backslash C\_ = L_i \backslash C_{10} (\mathrm{v}\)ì \(L_i \subseteq C_{21} ), Cond(L_k ) {\buildrel \mathrm{def} \over =} (L_k \subseteq C^+),\) \(U \buildrel \mathrm{def} \over = \{K_k {\buildrel \mathrm{def} \over = }L_k \backslash C\_{\vert }L_k {\in }{\mathcal G}(L^{\prime \prime }), Cond(L_k )\}, V {\buildrel \mathrm{def} \over =} \{K_k {\buildrel \mathrm{def} \over =} L_k {\backslash }C\_{\vert }L_k \in {\mathcal G}(L), Cond(L_k )\} = \{K_k \buildrel \mathrm{def} \over =L_k {\backslash }C\_{\vert }L_k \in {\mathcal G}_{C^+} (L)\},\) then \(L_i \subseteq C\_+K_i = C\_{\cup }L_i \subseteq L^{\prime \prime } \subseteq L, h(L_i) = h(L^{\prime \prime }) =L\) and \(L_i {\in }{\mathcal G}(L^{\prime \prime })\), thus, \(K_i {\in U }\subseteq V, K_\mathrm{min,C_{-}\subseteq L_{C^+}} = Minimal(V).\) From Lemma 1b, we can always assume that i is the minimum index such that \(K_i {\in Minimal(U).}\) Then, \(K_i {\in Minimal(V) }\)\({L^{\prime \prime } \supseteq }C\_+K_i \). We can represent \(L^{\prime \prime }\) as follows: \({L^{\prime \prime } \buildrel \mathrm{def} \over = }C\_+K_i +K_i^{\prime } +K_i^{\sim },\) where \(K_i^{\prime } \ {\buildrel \mathrm{def} \over = [L^{\prime \prime }\backslash (}C\_+K_i {)]\cap }K_{U,C_{-} \subseteq L_{C^+} }^i \subseteq K_{U,C_{-}\subseteq L_{C^+} }^i {\backslash }K_i =K_{U,C_{-}\subseteq L_{C^+} ,i} , K_i^{\sim }{\buildrel \mathrm{def} \over = [L^{\prime \prime }\backslash (}C\_+K_i )\backslash K_{U,C_{-}\subseteq L_{C^+} }^i \subseteq K_{-,C_{-}\subseteq L_{C^+} ,i}.\) Finally, assume that the condition \(^{(*)}\) is false, i.e. \(\exists K_k \equiv L_k {\backslash }C_- {\in }K_\mathrm{min,C_{-} \subseteq L_{C^+} } {: 1 \le k < i}\) and \(K_k {\subset }K_i +K_i^{\prime },\) then \(L_k {\in }{\mathcal G}_{C^+} (L) \subseteq {\mathcal G}(L), L_k \subseteq C_{-} +K_k \subseteq C_{-} +K_i +K_i^{\prime } \subseteq L^{\prime \prime } \subseteq L \) and \(L_k {\in }{\mathcal G}(L^{\prime \prime }).\) Hence, \(K_k \in U \cap Minimal(V), K_k \in Minimal(U)\) and \(k<i\): it contradicts the selection of the index i! Thus, \({ L''\in }{\mathcal F}{\mathcal S}_{C_{-} L_{C^+} }^*.\)
. \(``\lfloor {L}\rfloor _{C_{10} \subseteq L_{C_{11} } ,G\not \subseteq C_{21} } \subseteq {{\mathcal F}{\mathcal S}_{{C}_{10} \subseteq L_{C_{11} } ,G\not \subseteq C_{21}}^{*}} ^{\prime \prime }\): cxv \(\forall \) \({L^{\prime }} \in \lfloor {L}\rfloor _{{C_{10} \subseteq L_{C_{11} } ,G \not \subseteq C_{21} }},\) we have \(L^{\prime } \in {\mathcal F}{\mathcal S}_{C_{10} \subseteq L_{C_{11} } ,\nsubseteq C_{21} } , \exists L_i \in {\mathcal G}_{C_{11} ,\nsubseteq C_{21} } (L) \subseteq G(L): L_i \subseteq L^{\prime }, \) so \( L^{\prime } \ne \varnothing , { L^{\prime } \supseteq }C_{10} , L_i \subseteq L^{\prime } \subseteq L_{C_{11} } \subseteq C_{11} , L_i \not \subseteq C_{21} \) and \((L_k {\not \subseteq L^{\prime }, } \forall L_k \in {\mathcal G}_{C^+} (L))^{(-)}. \) From Lemma 1a, we have \({h(L^{\prime }) = L}\) and \(L_i \in {\mathcal G}(L^{\prime }).\) Let \(Cond^{\prime }(L_k) \buildrel \mathrm{def} \over = (L_k {\not \subseteq }C_{21} , L_k \subseteq C_{11} ),\) \(U \buildrel \mathrm{def} \over = \{K_i \mathop =\limits ^{\mathrm{def}} L_i {\backslash }C_{10} {\vert }L_k \ {\in } \ {\mathcal G}(L^{\prime }), Cond^{\prime }(L_k )\},\) \(V \buildrel \mathrm{def} \over = \{K_i \mathop =\limits ^{\mathrm{def}} L_i {\backslash }C_{10} {\vert }L_k \ {\in } \ {\mathcal G}(L), Cond^{\prime }(L_k )\} =\{K_i \mathop =\limits ^{\mathrm{def}} L_i \backslash C_{10} \vert L_k \in {\mathcal G}_{C_{11} ,\nsubseteq C_{21} } (L)\}\). From Lemma 1b, due to \(K_i \mathop =\limits ^{\mathrm{def}} L_i \backslash C_{10} \in U,\) then \(\varnothing \ne U \subseteq V \) and we can choose the minimum index i such that \(K_i \in Minimal(U)\). Then, \({L^{\prime } \supseteq } C_{10} +K_i \) and \(K_i \in {Minimal(V) \equiv }K_\mathrm{min,C_{10} \subseteq L_{C_{11} } ,G \not \subseteq C_{21} } \ne \varnothing \). Since \(C_{10} +K_i \subseteq L^{\prime } \subseteq L_{C_{11} } ,\) call \(L_i^{\prime \prime } \buildrel \mathrm{def} \over = L'\backslash (C_{10} +K_i ) \subseteq L_{C_{11}} {\backslash (}C_{10} +K_i ), \) so \(L^{\prime } = (C_{10} +K_i )+L_i^{\prime \prime } = C_{10} +K_i +K_i^{\prime } +K_i^{\sim }\), where \(K_i^{\prime } = L_i^{\prime \prime } \cap K_{U,C_{10} \subseteq L_{C_{11} } }^{i-1} \subseteq K_{U,C_{10} \subseteq L_{C_{11} } }^{i-1} \backslash K_i = K_{U,C_{10} \subseteq L_{C_{11} } ,i} , K_i^{\sim } = L_i^{\prime \prime } \backslash K_{U,C_{10} \subseteq L_{C_{11}} }^{i-1} \subseteq L_{C_{11}} \backslash (C_{10} +K_{U,C_{10} \subseteq L_{C_{11} } }^i)= K_{-,C_{10} \subseteq L_{C_{11}},i}\). Finally, assume that the condition \(^{(**)}\) is not true, i.e. \(\exists K_i \mathop =\limits ^{\mathrm{def}} L_i \backslash C_{10}\in K_\mathrm{min},C_{10} \subseteq L_{C_{11}}\), \(\text{ C }_{21}^+ {: 1 \le k < i }\) and \(K_k \subseteq K_i +K_i^{\prime },\) then \(L_k \in {\mathcal G}_{C_{11} } (L) \subseteq {\mathcal G}(L), L_k \subseteq C_{11} , L_k \subseteq C_{10} +K_k \subseteq C_{10} +K_i +K_i^{\prime } \subseteq L^{\prime } \subseteq L \) and \( h(L_k {) = h(L^{\prime }) = L.}\) Hence, \(L_k \in (L^{\prime })\), \(L_k \not \subseteq C_{21}\) (because if \(L_k \subseteq C_{21} ,\) then \(L_k \subseteq C^+\) and \(L_k\in {\mathcal G}_{C^+} (L):\) it contradicts the above condition \(^{(-)}\)!). Moreover, \(L_k \in {\mathcal G}_{C_{11} ,\nsubseteq C_{21} } (L)\), \(K_k\in K_\mathrm{min},C_{10} \subseteq L_{C_{11} } ,G \not \subseteq C_{21}\cap U\), \(K_k \in Minimal(U)\) and \( k<i: \) it is a contradiction of the selection the minimum index i such that \(K_i \in Minimal(U)!\) Thus, the condition \(^{(**)}\) in \({\mathcal F}{\mathcal S}_{C_{10} \subseteq L_{C_{11} } ,G \not \subseteq C_{21} }^*\) is true and \(\mathrm{L}\prime \in {\mathcal F}{\mathcal S}_{C_{10} \subseteq L_{C_{11} } ,G \not \subseteq C_{21} }^*.\)
+ “\({\mathcal F}{\mathcal S}_{\text{ C }_{10} \subseteq L_{C_{11} } ,\nsubseteq C_{21} ,}^*\subseteq {\mathcal F}{\mathcal S}_{C_{10} \subseteq L_{C_{11} } ,\nsubseteq C_{21} }\)” We will prove that \({\mathcal F}{\mathcal S}_{\text{ C }_{-} \subseteq L_{C^+} ,+L_{C^*} }^*\subseteq \lfloor {L}\rfloor _{C_{-} \subseteq L_{C^+} ,+L_{C^*} } \) and \({\mathcal F}{\mathcal S}_{C_{10} \subseteq L_{C_{11} } ,G \not \subseteq C_{21} }^*\subseteq \lfloor {L}\rfloor _{C_{10} \subseteq L_{C_{11} } ,G \not \subseteq C_{21} } \), thus, \({\mathcal F}{\mathcal S}_{\text{ C }_{-} \subseteq L_{C^+},+L_{C^*}}^*\) \(\cap \) \({\mathcal F}{\mathcal S}_{C_{10} \subseteq L_{C_{11} } ,G \not \subseteq C_{21} }^*=\varnothing \) an \({\mathcal F}{\mathcal S}_{{C}_{10} \subseteq L_{C_{11}} ,\nsubseteq C_{21} ,}^*\) \({\subseteq {\mathcal F}{\mathcal S}_{C_{10} \subseteq L_{C_{11} } ,\nsubseteq C_{21}}}\).
. “\({\mathcal F}{\mathcal S}_{\text{ C }_{-} \subseteq L_{C^+} ,+L_{C^*} }^*\subseteq \lfloor {L}\rfloor _{C_{-} \subseteq L_{C^+} ,+L_{C^*} } \)”: \(\forall L^{\prime } \buildrel \mathrm{def} \over \!=\! L^{\prime \prime }\!+L^{\sim } \in \) \({\mathcal F}{\mathcal S}_{C_{-} \subseteq L_{C^+} ,+L_{C^*}}^*{, L^{\prime \prime }\buildrel \mathrm{def} \over = }C\_+K_i +K_i^{\prime } +K_i^{\sim } {\in }{\mathcal F}{\mathcal S}_{C_{-} \subseteq L_{C^+} }^*, L^{\sim } \in 2^{\left[ {\text{ C }_{10} \backslash \nsubseteq C_{21} \subseteq L_{C^*}} \right] *}\), where \(K_i \mathop =\limits ^{\mathrm{def}} L_i \backslash C_{-} = L_i \backslash C_{10} \) (because \(L_i \in {\mathcal G}_{C^+} (L) \subseteq {\mathcal G}(L)\), \(L_i \subseteq C^+ \subseteq \nsubseteq C_{21} )\) and from Lemma 1c, \({L^{\prime \prime }\,\, \in \,\, }{\mathcal F}{\mathcal S}_{C_{-} \subseteq L_{C^+} } . \) On the other hand, since \({C}_{10} = (\ne ) + ({C}_{10} \nsubseteq C_{21} ) \subseteq L^{\prime } = L^{\sim }+L^{\prime \prime } \subseteq L_{C^*} +L_{C^+} = L_{C_{11} } \), \(L_i \subseteq C\_+K_i \subseteq L^{\prime \prime } \subseteq L^{\prime } \subseteq L_{C_{11} } \subseteq L, h(L_{C_{11} } {) = h(L^{\prime }) = h(}L_i ) = L, \varnothing \subset L^{\sim } = L^{\sim }{\backslash }C_{21} \subseteq { L^{\prime }\backslash }C_{21} , \) then \({L^{\prime } \not \subseteq } C_{21} \). Hence, \(L^{\prime }\,\, {\in }\,\, {\mathcal F}{\mathcal S}_{C_{10} \subseteq L_{C_{11} } ,\nsubseteq C_{21} } \) and \({L'\in }L_{C_- L_{C^+} ,+L_{C^*} } .\)
.“\({\mathcal F}{\mathcal S}_{C_{10} \subseteq L_{C_{11} } ,G \not \subseteq C_{21} }^*\subseteq \lfloor {L}\rfloor _{C_{10} \subseteq L_{C_{11} } ,G \not \subseteq C_{21} } \text {''}: \forall {L^{\prime } \buildrel \mathrm{def} \over = }C_{10} +K_i +K_i^{\prime } +K_i^{\sim } {\in }{\mathcal F}{\mathcal S}_{C_{10} \subseteq L_{C_{11} } ,G \not \subseteq C_{21} }^*, K_i \mathop =\limits ^{\mathrm{def}} L_i \backslash C_{10} , L_i \in {\mathcal G}_{C_{11} ,\nsubseteq C_{21} } (L)\) and (\(K_k \not \subseteq K_i +K_i^{\prime } , \forall K_k \in K_{\mathrm{min},C_{10} \subseteq L_{C_{11} } ,\not \subseteq {C}_{21} }^+ {: 1\le k<i)}^{(**)}, \) then from Lemma 1a and c, due to \({\mathcal G}_{C_{11} ,\nsubseteq C_{21} } (L) \subseteq {\mathcal G}(L), \) we obtain \(h(L^{\prime }) = L, L^{\prime }\in {\mathcal F}{\mathcal S}_{C_{10} \subseteq L_{C_{11} } } \) and \({L^{\prime }\backslash }C_{21} \supseteq L_i \backslash C_{21} \supset \varnothing \), \(L_i \subseteq K_i +C_{10} \subseteq L^{\prime }\), thus \({L^{\prime }\in }{\mathcal F}{\mathcal S}_{C_{10} \subseteq L_{C_{11} } ,\nsubseteq C_{21} } \). To prove \(L^{\prime } \in \lfloor {L}\rfloor _{C_{10} \subseteq L_{C_{11} } ,G \not \subseteq C_{21} } , \) in addition, we need to test the condition \(^{(-)}: (L_k \not \subseteq L^{\prime }\), \(\forall L_k \,\, \in \,\, {\mathcal G}_{C^+} (L)) \) by contradiction. Assume that \(\exists L_k \,\, \in \,\,{\mathcal G}_{C^+} (L) \subseteq {\mathcal G}_{C_{11} } (L) \subseteq {\mathcal G}(L): L_k \subseteq L^{\prime } = C_{10} +K_i +K_i^{\prime } +K_i^{\sim } \subseteq L, L_k \subseteq L_{C^+} \subseteq \nsubseteq C_{21} , \) then \(L_k \in {\mathcal G}(L^{\prime })\) and \({ k \leqslant M<i }\) (from the above indexing k of \(K_k)\). Using Lemma 1b, with \({U^{\prime \prime } \buildrel \mathrm{def} \over = {\{}}K_i \mathop =\limits ^{\mathrm{def}} L_i \backslash C_{10} , L_i \,\, \in \,\, {\mathcal G}(L^{\prime }), \mathrm{Cond}(L_k){\}}\) and \(K_i \mathop =\limits ^{\mathrm{def}} L_i \backslash C_{10} = L_k \backslash C_- \in U^{\prime \prime } \ne \varnothing , U^{\prime \prime } \subseteq V, \) we can choose the minimum index k such that \(K_k \in Minimal(U^{\prime \prime }) \) and \(K_k \in Minmal(V) = K_{\mathrm{min},C_{-}\subseteq L_{C^+} } \subseteq K_{\mathrm{min},C_{10} \subseteq L_{C_{11} } ,{C}_{21} }^+.\) Since \(K_k \subseteq { L^{\prime }\backslash }C_{10} = K_i +K_i^{\prime } +K_i^{\sim } , K_k \subseteq K_{U,C_{-} \subseteq L_{C^+} }^k = K_{U,C_{10} \subseteq L_{C_{11} } }^k \subseteq K_{U,C_{10} \subseteq L_{C_{11} } }^i \) and \(K_i^{\sim }\,\, \cap \,\,K_{U,C_{10} \subseteq L_{C_{11} } }^i = \varnothing ,\) so \(K_k \,\, \cap \,\, K_i^{\sim } = \varnothing \) and \(K_k \subseteq K_i +K_i^{\prime }:\) It is a contradiction of the selection of the index i in \(^{(**)}!\) Thus, the condition \(^{(-)}\) is true.
Finally, all itemsets in \({\mathcal F}{\mathcal S}_{C_{10} \subseteq L_{C_{11}} ,\nsubseteq C_{21}}\) have a unique representation and are generated completely and distinctly by itemsets in \({\mathcal F}{\mathcal S}_{{C}_{10} \subseteq L_{C_{11}} ,\nsubseteq C_{21} }^*\). \(\square \)
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://​creativecommons.​org/​licenses/​by/​4.​0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Footnotes
1
iff is denoted as if and only if.
 
Literature
1.
go back to reference Agrawal, R., Imielinski, T., Swami, N: Mining association rules between sets of items in large databases. In: Proceedings of the ACM SIGMOID, pp. 207–216 (1993) Agrawal, R., Imielinski, T., Swami, N: Mining association rules between sets of items in large databases. In: Proceedings of the ACM SIGMOID, pp. 207–216 (1993)
2.
go back to reference Anh, T., Hai, D., Tin, T., Bac, L.: Efficient algorithms for mining frequent Itemsets with constraint. In: Proceedings of the Third International Conference on Knowledge and Systems Engineering, pp. 19–25 (2011) Anh, T., Hai, D., Tin, T., Bac, L.: Efficient algorithms for mining frequent Itemsets with constraint. In: Proceedings of the Third International Conference on Knowledge and Systems Engineering, pp. 19–25 (2011)
3.
go back to reference Anh, T., Hai, D., Tin, T., Bac, L.: Mining frequent itemsets with dualistic constraints. In: Proceedings PRICAI 2012, LNAI, vol. 7458, pp. 807–813 (2012) Anh, T., Hai, D., Tin, T., Bac, L.: Mining frequent itemsets with dualistic constraints. In: Proceedings PRICAI 2012, LNAI, vol. 7458, pp. 807–813 (2012)
4.
go back to reference Anh, T., Tin, T., Bac, L.: Simultaneous mining of frequent closed itemsets and their generators: foundation and algorithm. Int. J. Eng. Appl. Artif. Intell. (EAAI) 36, 64–80 (2014)CrossRef Anh, T., Tin, T., Bac, L.: Simultaneous mining of frequent closed itemsets and their generators: foundation and algorithm. Int. J. Eng. Appl. Artif. Intell. (EAAI) 36, 64–80 (2014)CrossRef
5.
go back to reference Bayardo, R.J., Agrawal, R., Gunopulos, D.: Constraint-based rule mining in large, dense databases. Proc. Data Min. Knowl. Discov. 4, 217–240 (2000)CrossRef Bayardo, R.J., Agrawal, R., Gunopulos, D.: Constraint-based rule mining in large, dense databases. Proc. Data Min. Knowl. Discov. 4, 217–240 (2000)CrossRef
6.
go back to reference Bonchi, F., Giannotti, F., Mazzanti, A., Pedreschi, D.: Examiner: optimized level-wise frequent pattern mining with monotone constraints. In: Proceedings IEEE ICDM’03, pp. 11–18 (2003) Bonchi, F., Giannotti, F., Mazzanti, A., Pedreschi, D.: Examiner: optimized level-wise frequent pattern mining with monotone constraints. In: Proceedings IEEE ICDM’03, pp. 11–18 (2003)
7.
go back to reference Bonchi, F., Lucchese, C.: On closed constrained frequent pattern mining. In: Proceedings IEEE ICDM’04, pp. 35–42 (2004) Bonchi, F., Lucchese, C.: On closed constrained frequent pattern mining. In: Proceedings IEEE ICDM’04, pp. 35–42 (2004)
8.
go back to reference Boulicaut, J.F., Bykowski, A., Rigotti, C.: Free-sets: a condensed representation of boolean data for the approximation of frequency queries. Data Min. Knowl. Dis. 7, 5–22 (2003)MathSciNetCrossRef Boulicaut, J.F., Bykowski, A., Rigotti, C.: Free-sets: a condensed representation of boolean data for the approximation of frequency queries. Data Min. Knowl. Dis. 7, 5–22 (2003)MathSciNetCrossRef
9.
go back to reference Boulicaut, J.F., Jeudy, B.: Using constraints during set mining: should we prune or not. In: Actes des Seizime Journes Bases de Donnes Avances BDA’00, Blois, pp. 221–237 (2000) Boulicaut, J.F., Jeudy, B.: Using constraints during set mining: should we prune or not. In: Actes des Seizime Journes Bases de Donnes Avances BDA’00, Blois, pp. 221–237 (2000)
10.
go back to reference Bucila, C., Gehrke, J.E., Kifer, D., White, W.: Dualminer: a dual-pruning algorithm for itemsets with constraints. Data Min. Knowl. Dis. 7, 241–272 (2003)MathSciNetCrossRef Bucila, C., Gehrke, J.E., Kifer, D., White, W.: Dualminer: a dual-pruning algorithm for itemsets with constraints. Data Min. Knowl. Dis. 7, 241–272 (2003)MathSciNetCrossRef
11.
go back to reference Burdick, D., Calimlim, M, Gehrke, J.: MAFIA: A maximal frequent itemset algorithm for transactional databases. In: Proceedings IEEE ICDE’01, pp. 443–452 (2001) Burdick, D., Calimlim, M, Gehrke, J.: MAFIA: A maximal frequent itemset algorithm for transactional databases. In: Proceedings IEEE ICDE’01, pp. 443–452 (2001)
12.
go back to reference Dong, J., Han, M.: BitTable-FI: an efficient mining frequent itemsets algorithm. Int. J. Knowl. Based Sys. 20, 329–335 (2007)CrossRef Dong, J., Han, M.: BitTable-FI: an efficient mining frequent itemsets algorithm. Int. J. Knowl. Based Sys. 20, 329–335 (2007)CrossRef
14.
go back to reference Grahne, G., Zhu, J.: Fast algorithms for frequent itemset mining using fp-trees. Proc. IEEE Trans. Knowl. Data Eng. 17, 1347–1362 (2005)CrossRef Grahne, G., Zhu, J.: Fast algorithms for frequent itemset mining using fp-trees. Proc. IEEE Trans. Knowl. Data Eng. 17, 1347–1362 (2005)CrossRef
15.
go back to reference Hai, D., Tin, T.: An efficient method for mining association rules based on minimum single constraints. Vietnam J. Comput. Sci. 2, 67–83 (2015)CrossRef Hai, D., Tin, T.: An efficient method for mining association rules based on minimum single constraints. Vietnam J. Comput. Sci. 2, 67–83 (2015)CrossRef
16.
go back to reference Hai, D., Tin, T., Bac, L.: An efficient algorithm for mining frequent itemsets with single constraint. Adv. Comput. Methods Knowl. Eng. Sci. 479, 367–378 (2013)CrossRef Hai, D., Tin, T., Bac, L.: An efficient algorithm for mining frequent itemsets with single constraint. Adv. Comput. Methods Knowl. Eng. Sci. 479, 367–378 (2013)CrossRef
17.
go back to reference Hai, D., Tin, T., Bay, V.: An efficient method for mining frequent itemsets with double constraints. Int. J. Eng. Appl. Artif. Intell. (EAAI) 27, 148–154 (2014) Hai, D., Tin, T., Bay, V.: An efficient method for mining frequent itemsets with double constraints. Int. J. Eng. Appl. Artif. Intell. (EAAI) 27, 148–154 (2014)
18.
go back to reference Han, J., Pei, J., Yin, Y.: Mining frequent patterns without candidate generation. In: SIGMOD’00, pp. 1–12 (2000) Han, J., Pei, J., Yin, Y.: Mining frequent patterns without candidate generation. In: SIGMOD’00, pp. 1–12 (2000)
19.
go back to reference Huy, P., Tin, T.: An efficient lattice-based approach for generator mining. Int. J. Adv. Comput. Res. 4, 741–751 (2014) Huy, P., Tin, T.: An efficient lattice-based approach for generator mining. Int. J. Adv. Comput. Res. 4, 741–751 (2014)
20.
go back to reference Jeudy, B., Boulicaut, J.F.: Optimization of association rule mining queries. Intell. Data Anal. 6, 341–357 (2002)MATH Jeudy, B., Boulicaut, J.F.: Optimization of association rule mining queries. Intell. Data Anal. 6, 341–357 (2002)MATH
21.
go back to reference Lin, D.I., Kedem, Z.M.: Pincer search: an efficient algorithm for discovering the maximum frequent sets. IEEE Trans Knowl. Data Eng. 14, 553–566 (2002) Lin, D.I., Kedem, Z.M.: Pincer search: an efficient algorithm for discovering the maximum frequent sets. IEEE Trans Knowl. Data Eng. 14, 553–566 (2002)
22.
go back to reference Mannila, H., Toivonen, H.: Levelwise search and borders of theories in knowledge discovery. Data Min. Knowl. Dis. 1, 241–258 (1997)CrossRef Mannila, H., Toivonen, H.: Levelwise search and borders of theories in knowledge discovery. Data Min. Knowl. Dis. 1, 241–258 (1997)CrossRef
23.
go back to reference Mashoria, V., Singh, A.: A survey of mining association rules using constraints. Int. J. Comput. Technol. 7, 620–625 (2013) Mashoria, V., Singh, A.: A survey of mining association rules using constraints. Int. J. Comput. Technol. 7, 620–625 (2013)
24.
go back to reference Nguyen, R.T., Lakshmanan, V.S., Han, J., Pang, A.: Exploratory mining and pruning optimizations of constrained association rules. In: Proceedings of the 1998 ACM-SIG-MOD International Conference on the Management of Data, pp. 13–24 (1998) Nguyen, R.T., Lakshmanan, V.S., Han, J., Pang, A.: Exploratory mining and pruning optimizations of constrained association rules. In: Proceedings of the 1998 ACM-SIG-MOD International Conference on the Management of Data, pp. 13–24 (1998)
25.
go back to reference Pasquier, N., Bastide, Y., Taouil, R., Lakhal, L.: Efficient mining of association rules using closed itemset lattices. Inf. Syst. 24(1), 25–46 (1999)CrossRefMATH Pasquier, N., Bastide, Y., Taouil, R., Lakhal, L.: Efficient mining of association rules using closed itemset lattices. Inf. Syst. 24(1), 25–46 (1999)CrossRefMATH
26.
go back to reference Pasquier, N., Taouil, R., Bastide, Y., Stumme, G., Lakhal, L.: Generating a condensed representation for association rules. Intell. Inf. Syst. 24, 29–60 (2005)CrossRefMATH Pasquier, N., Taouil, R., Bastide, Y., Stumme, G., Lakhal, L.: Generating a condensed representation for association rules. Intell. Inf. Syst. 24, 29–60 (2005)CrossRefMATH
27.
go back to reference Pei, J., Han, J.: Constrained frequent pattern mining: a pattern-growth view. Proc. ACM SIGKDD Explor. 4, 31–39 (2002)CrossRef Pei, J., Han, J.: Constrained frequent pattern mining: a pattern-growth view. Proc. ACM SIGKDD Explor. 4, 31–39 (2002)CrossRef
28.
go back to reference Pei, J., Han, J., Lakshmanan, L.V.S.: Mining frequent itemsets with convertible constraints. In: Proceedings IEEE ICDE’01, pp. 433–442 (2001) Pei, J., Han, J., Lakshmanan, L.V.S.: Mining frequent itemsets with convertible constraints. In: Proceedings IEEE ICDE’01, pp. 433–442 (2001)
29.
go back to reference Song, W., Yang, B., Xu, Z.: Index-BitTableFI: an improved algorithm formining frequent itemsets. Int. J. Knowl. Based Syst. 21, 507–513 (2008)CrossRef Song, W., Yang, B., Xu, Z.: Index-BitTableFI: an improved algorithm formining frequent itemsets. Int. J. Knowl. Based Syst. 21, 507–513 (2008)CrossRef
30.
go back to reference Szathmary, L., Valtchev, P., Napoli, A.: Efficient vertical mining of frequent closed itemsets and generators. IDA 2009, 393–404 (2013) Szathmary, L., Valtchev, P., Napoli, A.: Efficient vertical mining of frequent closed itemsets and generators. IDA 2009, 393–404 (2013)
31.
go back to reference Tin, T., Anh, T.: Structure of set of association rules based on concept lattice. In: Advances in Intelligent Information and Database Systems, SCI, vol. 283, pp. 217–227. Springer (2010) Tin, T., Anh, T.: Structure of set of association rules based on concept lattice. In: Advances in Intelligent Information and Database Systems, SCI, vol. 283, pp. 217–227. Springer (2010)
32.
go back to reference Vo, B., Hong, T.P., Le, B.: A lattice-based approach for mining most generalization association rules. Knowl. Based Syst. 45, 20–30 (2013)CrossRef Vo, B., Hong, T.P., Le, B.: A lattice-based approach for mining most generalization association rules. Knowl. Based Syst. 45, 20–30 (2013)CrossRef
33.
go back to reference Vo, B., Hong, T.P., Le, B.: DBV-Miner: a dynamic bit-vector approach for fast mining frequent closed itemsets. Expert Syst. Appl. 39, 7196–7206 (2012)CrossRef Vo, B., Hong, T.P., Le, B.: DBV-Miner: a dynamic bit-vector approach for fast mining frequent closed itemsets. Expert Syst. Appl. 39, 7196–7206 (2012)CrossRef
35.
go back to reference Zaki, M.J., Hsiao, C.J.: Efficient algorithms for mining closed itemsets and their lattice structure. IEEE Trans. Knowl. Data Eng. 17, 462–478 (2005)CrossRef Zaki, M.J., Hsiao, C.J.: Efficient algorithms for mining closed itemsets and their lattice structure. IEEE Trans. Knowl. Data Eng. 17, 462–478 (2005)CrossRef
36.
go back to reference Zaki, M.J., Parthasarathy, S., Ogihara, M., Li, W.: New algorithms for fast discovery of association rules. In: Proceedings of the 3rd International Conference on Knowledge Discovery and Data Mining (KDD’97), pp. 283–296 (1997) Zaki, M.J., Parthasarathy, S., Ogihara, M., Li, W.: New algorithms for fast discovery of association rules. In: Proceedings of the 3rd International Conference on Knowledge Discovery and Data Mining (KDD’97), pp. 283–296 (1997)
Metadata
Title
Structure of frequent itemsets with extended double constraints
Authors
Truong Chi Tin
Duong Van Hai
Hoang Nguyen Thuy Ngan
Publication date
01-05-2016
Publisher
Springer Berlin Heidelberg
Published in
Vietnam Journal of Computer Science / Issue 2/2016
Print ISSN: 2196-8888
Electronic ISSN: 2196-8896
DOI
https://doi.org/10.1007/s40595-015-0056-7

Other articles of this Issue 2/2016

Vietnam Journal of Computer Science 2/2016 Go to the issue

Premium Partner