Vietnam Journal of Computer Science

Open Access | 15 July 2017 | Regular Paper

Towards an enhanced user’s preferences integration into ranking process using dominance approach

Authors: Mohammed Mouhir, Youssef Balouki, Taoufiq Gadi

Published in: Vietnam Journal of Computer Science | Issue 1/2018

Abstract

User preferences play an important role in guiding the data miner, which is why they are integrated into the mining process, where they are coupled with Association Rule Mining (ARM) algorithms to select only the Association Rules (ARs) that satisfy the user’s wishes and expectations. Within this framework, several approaches have been proposed to overcome problems that persist with traditional ARM algorithms, mainly the dimensionality phenomenon engendered by thresholding and the subjective choice of measures. The MDP$_{\mathrm {REF}}$ algorithm is one of these approaches; it prunes and filters the AR-set to select the relevant ARs, while Rank-Sort-MDP$_{\mathrm {REF}}$ sorts, ranks, and stores ARs to complete the MDP$_{\mathrm {REF}}$ mining operation. Experimental results on a real database show the advantages of the MDP$_{\mathrm {REF}}$ and Rank-Sort-MDP$_{\mathrm {REF}}$ algorithms over the other algorithms.

1 Introduction

Data mining (DM) has been of growing importance since the 1960s, and it is in fact the most important step in the mining process, especially for frequent patterns and ARs, which are the subject matter of this paper. The main concern of researchers is the challenge of the dimensionality phenomenon. Several methods have been developed on the basis of threshold fixing, the use of measures other than Support and Confidence, or other criteria [4, 7, 12]; the objective is to mine interesting data that are fewer in quantity but higher in quality than those produced by traditional techniques. With the same objective, other approaches use dominance, or Pareto-dominance, to classify rules into two categories: dominant and dominated rules.

They then discard the dominated rules and keep the dominant ones. However, it seems reasonable to question this classification into two categories. Is it not possible to have more than two categories? Can there not be equivalent rules among those in the discarded category? Moreover, is there any guarantee that all the relevant information is kept, or that the category of dominant rules really satisfies the user’s expectations?

This paper proposes the MDP$_{\mathrm {REF}}$ algorithm to process the AR-set in such a way as to determine the subset of the most dominant rules responding to the user’s request. The remaining set is examined further. During this examination, each rule is given a statistical value. Some rules can reasonably be expected to share the same statistical value; these are called Statistically Equivalent Rules (SER). The SER are kept or discarded according to the user’s wishes. The third subset is discarded, because it contains dominated rules. The association rules selected by the MDP$_{\mathrm {REF}}$ algorithm are called MDP$_{\mathrm {REF}}$ rules (Most Dominant and Preferential rules); the algorithm thus combines the notions of dominance and preference to mine rules, and it helps reduce the dimensionality of the results.

This paper is organized into six sections, including this introduction. The second section reviews related work in the literature and defines the concepts used; the third introduces the MDP$_{\mathrm {REF}}$ algorithm and an evaluation experiment. In the fourth and fifth sections, we explain our motivation for proposing the Rank-Sort-MDP$_{\mathrm {REF}}$ algorithm and evaluate its performance in terms of accuracy and execution time. The last section concludes the paper and sheds light on the future prospects of our research.

2 Literature review and background

2.1 Literature review

Many computer applications recognize user preferences as essential. Xiaoye Miao [14] considers them in a multidimensional space including language and preference operators, where a set of preference builders is assigned to categorical and numerical domains. Elsewhere, statistical models of user preferences are presented, in which the frequency of an item depends on the user preference and the item’s accessibility. The user preference can be modeled as an algebraic function that approximates the statistical value of the item’s features and the user profile. In [10], preference samples provided by the user are used to establish the order of tuples in the database. These samples are divided into two classes, superior and inferior samples, which contain information about relevant and irrelevant samples, respectively. In [7], the authors propose the ProfMiner algorithm to discover the user profile on the basis of user-provided preferences and wishes. ProfMiner operates on a database containing contextual preference rules. It determines a threshold ‘k’ to select the contextual preference rules describing the user profile, and the number of these rules depends on ‘k’. However, ProfMiner relies on only two measures, support and confidence, which may not be sufficient to preserve all the relevant information. It is worth noting that the contextual preference rules are determined and extracted by the CPrefMiner algorithm. The latter is a qualitative approach based on Bayesian network preference rules. The main strength of this approach is that it produces a compact model of ordered preferences as well as accurate results. In [24], the authors propose processing the contextual logs of mobile device users to discover context-aware preferences.

In the same framework, the PrefMiner algorithm [13] proposes a new solution for mining users’ preferences for intelligent mobile device notification management. PrefMiner can automatically determine rules that reflect the user’s preferences by studying notifications collected in advance in databases. In [22], the authors present an algorithm based on clustering and filtering user preferences; it adapts to the different habits of users and partitions them into three groups according to their habits and preferences: optimistic, pessimistic, and neutral. This clustering is based on new similarity measures that address the shortcomings of classical methods. In addition, some approaches resort to query rewriting, or query enhancement [2], which consists of integrating elements from the user profile into the user query. This technique is widely used in the Information Retrieval domain [8], but it is very recent in the database domain.

The relationship between business activity and data mining is one of reflection: the complexity of data mining mirrors that of business activity. A huge amount of business-related information is stored in big databases holding thousands, if not millions, of records. Data mining is the field in which these databases are exploited to obtain interesting and valuable information for the benefit of business management, and different techniques have been devised to analyze databases to this end. Ranwar [12] is one of these techniques; it uses interestingness measures to sort and rank ARs. Acdr [11] is an algorithm that relies on a rule-dissimilarity criterion to eliminate redundant rules and sort dissimilar rules, which are ranked from top to bottom according to their priority and frequency degrees. In [17], the algorithm uses interestingness measures and clustering techniques to remove redundancy and to keep rules that satisfy predefined criteria. The Skyrules algorithm [4] is a statistical dominance-based algorithm that distinguishes dominant from non-dominant rules; it keeps the former and discards the latter, relying entirely on the skyline operator. This technique extends the one proposed in [19], which is based on the notion of dominance to generate dominant patterns and reject dominated ones with regard to skyline operators [20]. In [1], the authors are interested in modeling and automating the mining process of relevant ARs; they use Electre Tri as a Multi-Criteria Analysis approach. More recently, authors have focused on combining Multi-Criteria Decision Analysis with multiobjective evolutionary algorithms to select the most preferred solution from the generated set [5, 6]. In [3], the authors introduce a hash algorithm to improve the speed and efficiency of the ARM process.

Table 1

AR-set and measures

(a) Rules set
Rules	Confidence	Support	Pearl
ar $_{1}$	0.66	0.20	0.02
ar $_{2}$	0.66	0.20	0.05
ar $_{3}$	0.66	0.20	0.02
ar $_{4}$	0.40	0.20	0.05
ar $_{5}$	0.40	0.20	0.10
ar $_{6}$	0.33	0.20	0.02
ar $_{7}$	0.33	0.20	0.01
ar $_{8}$	0.33	0.20	0.10
ar $_{9}$	0.33	0.10	0.03
ar $_{10}$	0.66	0.20	0.05
ar $_{11}$	0.16	0.10	0.02
ar $_{12}$	0.50	0.10	0.02
ar $_{13}$	0.50	0.10	0.00
ar $_{14}$	0.50	0.10	0.04
(b) Measures
Measures	Formula
Confidence ($B\rightarrow H$)	$P(H/B)=\frac{P(BH)}{P(B)}$
Support ($B\rightarrow H$)	$P(BH)$
Pearl ($B\rightarrow H$)	$P(B) \times \left| P(H/B)-P(H) \right|$
Recall ($B\rightarrow H$)	$P(B/H)=\frac{P(BH)}{P(H)}$
Zhang ($B\rightarrow H$)	$\frac{P(BH)-P(B)P(H)}{\max \left\{ P(BH)P(\overline{H}),\; P(H)P(B\overline{H}) \right\}}$
Loevinger ($B\rightarrow H$)	$\frac{P(H/B)-P(H)}{1-P(B)}$

The common objective of the techniques described above is to minimize the number of generated rules. We note a causal relationship between the number of generated rules and the number of criteria or interestingness measures imposed on the databases: the higher the latter, the lower the former.

Unlike the approaches described above, our contribution presents a method which allows for the user preference as a further restriction of the mining operation so as to optimize the ARs cardinality.

2.2 Background and formalization

2.2.1 Association rules

“Association rules”, as a field of research, is a vital concern within the framework of business intelligence. These rules have been studied extensively using different tools and techniques, with the ultimate aim of discovering regularities, harmonies, and correlations between items in a database. An Association Rule usually takes the form B $\rightarrow $ H, where B and H are different and separate item sets; B is called the premise and H the conclusion [18]. The strength of an association rule is often determined by its support and confidence [9].

Table 1 presents an illustrative example of an input association rules set (noted as: “AR-Set” or the “14-Rule Set”), and the mathematical formulas of some interestingness measures.
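As a minimal sketch of how the first four measures of Table 1 can be computed, the snippet below evaluates them from the probabilities P(B), P(H), and P(BH). The function names and the sample probabilities are illustrative assumptions, and the bars in the Pearl formula are read here as an absolute value.

```python
def confidence(p_b: float, p_bh: float) -> float:
    """Confidence(B -> H) = P(H/B) = P(BH) / P(B)."""
    return p_bh / p_b


def support(p_bh: float) -> float:
    """Support(B -> H) = P(BH)."""
    return p_bh


def pearl(p_b: float, p_h: float, p_bh: float) -> float:
    """Pearl(B -> H) = P(B) * |P(H/B) - P(H)| (bars read as absolute value)."""
    return p_b * abs(confidence(p_b, p_bh) - p_h)


def recall(p_h: float, p_bh: float) -> float:
    """Recall(B -> H) = P(B/H) = P(BH) / P(H)."""
    return p_bh / p_h


# Hypothetical probabilities, chosen only for illustration.
p_b, p_h, p_bh = 0.5, 0.4, 0.3
measures = {
    "confidence": confidence(p_b, p_bh),
    "support": support(p_bh),
    "pearl": pearl(p_b, p_h, p_bh),
    "recall": recall(p_h, p_bh),
}
print(measures)
```

With these sample values, the confidence is about 0.6, the Pearl value about 0.1, and the recall 0.75.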

2.2.2 Dominance relationship

Definition

(Domination) A point $x$ in a d-dimensional set (X$_{1}$, X$_{2}$, ..., X$_{d})$ dominates a point $x^\prime $ in the same set, denoted by $x\succsim x^\prime $, if for every dimension $k = 1, 2, \ldots , d$ we have $x_k \ge x^\prime _k$ [23].

Dominant rules Let ar and ar $^{\prime }$ be two rules belonging to $\textsc {r}$, the set of extracted rules. With respect to the set of measures $\textsc {m}$, dominance between rules is defined as follows:

ar dominates ar $^{\prime }$, noted ar $\succ $ ar $^{\prime }$, if ar[m] $\ge $ ar $^{\prime }$[m] $\forall m\in $ $\textsc {m}$.

Statistically equivalent rules (SER) Let ar and ar $^{\prime }$ be two rules belonging to R, the set of extracted rules. With respect to the set of measures M, Statistically Equivalent Rules are defined as follows:

If ar $\succ $ ar $^{\prime }$ and ar $^{\prime }$ $\succ $ ar, that is, ar[m] = ar $^{\prime }$[m] $\forall m\in $ M, then ar and ar $^{\prime }$ are statistically equivalent, noted ar $\approx $ ar $^{\prime }$ [15].
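The two definitions above translate directly into code. The following is a small sketch, with hypothetical function names, that checks dominance and statistical equivalence on a few rules copied from Table 1.

```python
from typing import Dict, Sequence

Rule = Dict[str, float]  # measure name -> measure value

MEASURES = ("confidence", "support", "pearl")


def dominates(ar: Rule, other: Rule, measures: Sequence[str] = MEASURES) -> bool:
    """ar dominates other if ar[m] >= other[m] for every measure m."""
    return all(ar[m] >= other[m] for m in measures)


def statistically_equivalent(ar: Rule, other: Rule,
                             measures: Sequence[str] = MEASURES) -> bool:
    """Two rules are SER when each dominates the other (equal on all measures)."""
    return dominates(ar, other, measures) and dominates(other, ar, measures)


# Measure values copied from Table 1.
ar1 = {"confidence": 0.66, "support": 0.20, "pearl": 0.02}
ar3 = {"confidence": 0.66, "support": 0.20, "pearl": 0.02}
ar5 = {"confidence": 0.40, "support": 0.20, "pearl": 0.10}
ar10 = {"confidence": 0.66, "support": 0.20, "pearl": 0.05}

print(dominates(ar10, ar1))                      # ar10 is at least as good everywhere
print(statistically_equivalent(ar1, ar3))        # identical measure values
print(dominates(ar10, ar5), dominates(ar5, ar10))  # neither dominates the other
```

Note that ar $_{5}$ and ar $_{10}$ are incomparable: ar $_{10}$ has the higher confidence, while ar $_{5}$ has the higher Pearl value.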

Degree of similarity Let the two rules ar, ar $^{\prime }$ belong to “R” which is the set of rules extracted. The degree of similarity between both rules ar and ar $^{\prime }$ with respect to M is defined as follows:

$$\begin{aligned} \mathrm {Deg}_{\mathrm {Sim}} ({\mathrm {AR,AR}}^\prime )=\frac{\sum _{i=1}^k {\left| {\mathrm {AR}\left[ {m_i } \right] -\mathrm {AR}^\prime \left[ {m_i } \right] } \right| } }{k}. \end{aligned}$$

(1)
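Equation (1) is the mean absolute difference between two rules over the k measures, so a lower value means the two rules are closer. A minimal sketch, using rules from Table 1 and a hypothetical function name:

```python
from typing import Sequence


def deg_sim(ar: Sequence[float], other: Sequence[float]) -> float:
    """Degree of similarity of Eq. (1): mean absolute difference over k measures."""
    if len(ar) != len(other):
        raise ValueError("both rules must be described by the same measures")
    return sum(abs(x - y) for x, y in zip(ar, other)) / len(ar)


# (confidence, support, pearl) values copied from Table 1.
ar1 = (0.66, 0.20, 0.02)
ar2 = (0.66, 0.20, 0.05)
ar3 = (0.66, 0.20, 0.02)
ar5 = (0.40, 0.20, 0.10)

print(deg_sim(ar1, ar3))  # identical rules give 0.0
print(deg_sim(ar2, ar5))  # (0.26 + 0.00 + 0.05) / 3
```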

We understand from the information supplied in Table 1 that rules “ar $_{6}$”,“ar $_{7}$”,“ar $_{8}$” are statistically equivalent with respect to M = {Support, Confidence, Pearl}. Of the “14-Rule Set”, these SER make up more than 50%. In a case like this, the user may need help to decide which rules to keep and which to discard without losing relevant information, hence, the necessity of the integration of preferences within AR-Mining approaches.

2.2.3 Preference relationship

Preferring a particular thing means picking it out of a group of things as the one you like. For example, a customer is interested in buying a mobile phone that allows him to watch and/or download data (movies, interviews...). The shop attendant offers three different mobile phones, noted “MP$_{i}$ with i $\in $ {1, 2, 3}”:

MP$_{1}$: possibility to watch films, interviews...
MP$_{2}$: possibility to watch films, record interviews...
MP$_{3}$: possibility to watch and download films, interviews...

So MP$_{3}$ is necessarily the customer’s preferred choice.

User preference A preference p on a base relation R$_\mathrm{b}$ is a triple ($\sigma $, S, C), where $\sigma $ is a selection condition involving a set D of items from R$_\mathrm{b}$, S is a function defined on the Cartesian product of a set D of items from R$_\mathrm{b}$, such that $S: \prod _{t_{i}\in D} \mathrm{dom}(t_{i})\rightarrow [0, 1]$, and $C \in [0, 1]$.

The meaning of preference p is that each tuple t$_{i}$ that belongs to the relation (R $_\mathrm{b})$ is associated with a score through a function S with confidence C. A tuple t $_{i}$ is preferred over a tuple t $_{j}$ if t $_{i}$ has a higher score than t $_{j}$.

Some qualitative approaches use the score functions to express preferences by associating a score to a tuple of products. Other algorithms such as CP-net and Rank-Voting are automatic learning techniques that mine user preferences in a shorter time compared to the manual handling of preference model.

Let I be a set of objects in a multidimensional space D = D$_{1} \otimes $ D$_{2} \otimes \cdots \otimes $ D$_\mathrm{d}$. I is either finite or infinite. A preference relationship is a strict partial order on the multidimensional space D, noted $\diamondsuit $.

Let i $_{1}\diamondsuit $ i $_{2}$ express that the user prefers i $_{1}$ to i $_{2}$.

To illustrate such preference, we have a set of three mobile phones {MP$_{1}$, MP$_{2}$, MP$_{3}$} above mentioned,

The user prefers MP$_{3}$ to MP$_{1} \quad \Rightarrow $ MP$_{3} \quad \diamondsuit \quad $ MP$_{1}$.
The user prefers MP$_{3}$ to MP$_{2} \quad \Rightarrow $ MP$_{3} \quad \diamondsuit \quad $ MP$_{2}$.

Given the problem of dimensionality, whereby the user may face a large number of rules, we suggest reducing the search space by defining the relevant frequent transactions (or items) among which the user may want to express his preferences.

To make the process fast, we arrange these frequent transactions (or items) in a matrix M$_{(n \times n)}$. This matrix is in fact a visual representation of the AR’s components. The user assigns scores a$_{ij} \in $ [0 1], where a$_{ij}$ represents a comparison of the two transactions (items) i and j: the user favors transaction i over transaction j (t$_{i} \diamondsuit $ t$_{j})$, and a$_{ij}$ is the coefficient or score of this comparison. When j is the user’s preference, the score is a$_{ji}$ = 1 − a$_{ij}$. We also note that a$_{ii}=\emptyset $:

$$\begin{aligned} M_n =\left[ {{\begin{array}{lllll} &{} {a_{1i} }&{} {a_{1j} }&{} \cdots &{} {a_{1n} } \\ {a_{i1} }&{} &{} {a_{ij} }&{} \ldots &{} \vdots \\ \vdots &{} {a_{ji} =1-a_{ij} }&{} &{} \vdots &{} \vdots \\ \vdots &{} \vdots &{} \vdots &{} \vdots &{} \\ {a_{n1} }&{} \cdots &{} \cdots &{} \cdots &{} \\ \end{array} }} \right] . \end{aligned}$$

We suggest labeling the user preferences from P$_{1}$ to P$_{5}$, in such a way that the interval ]0 1[ is subdivided into five equal sub-intervals.

Table 2

Mapping of user’s preferences

Preferences	Bituples
P$_{1}$	]0 0.2[
P$_{2}$	[0.2 0.4[
P$_{3}$	[0.4 0.6[
P$_{4}$	[0.6 0.8[
P$_{5}$	[0.8 1[

Table 2 presents the mapping between the preference labels and the scores provided by the user over pairs of transactions (t$_{i}$, t$_{j})$.
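The preference matrix and the mapping of Table 2 can be sketched as follows. This is only an illustration under stated assumptions: the helper names, the use of `None` for the empty diagonal a$_{ii}$, and the sample scores are hypothetical and not part of the original method description.

```python
from bisect import bisect_right
from typing import Dict, List, Optional, Tuple

# Upper bounds of the right-open sub-intervals of ]0, 1[ listed in Table 2.
UPPER_BOUNDS = [0.2, 0.4, 0.6, 0.8]


def preference_label(score: float) -> str:
    """Map a score in ]0, 1[ to its label P1..P5 following Table 2."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must lie strictly between 0 and 1")
    return f"P{bisect_right(UPPER_BOUNDS, score) + 1}"


def build_matrix(n: int,
                 scores: Dict[Tuple[int, int], float]) -> List[List[Optional[float]]]:
    """Fill an n x n matrix so that a_ji = 1 - a_ij; the diagonal stays empty."""
    matrix: List[List[Optional[float]]] = [[None] * n for _ in range(n)]
    for (i, j), a_ij in scores.items():
        matrix[i][j] = a_ij
        matrix[j][i] = 1 - a_ij
    return matrix


# Hypothetical user scores for three transactions (0-based indices).
scores = {(0, 1): 0.7, (0, 2): 0.9, (1, 2): 0.25}
matrix = build_matrix(3, scores)
labels = {pair: preference_label(a) for pair, a in scores.items()}
print(matrix)
print(labels)
```

Comparing against the interval bounds, rather than dividing the score by 0.2, keeps boundary scores such as 0.6 in the interval that Table 2 assigns to them despite floating-point rounding.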

This mapping avoids the possible complexities of statistical scoring, while capturing the user’s preferences regarding the items in an Association Rule without computing an average score. To satisfy the major objective, which is to mine not only the dominant or most dominant rules but also the most preferable ones responding to the user’s request, we insert a user preference column into the AR-set (Table 1). The integration of user preferences here means that each rule is assigned the preferences it covers. With this integration, Table 1 becomes Table 3 below, where each rule ar $_{i}$ is described by four criteria: three statistical interestingness measures (Confidence, Support, and Pearl) and the preference criterion (the preferences covered by rule ar $_{i})$.

Table 3

Rules set with user’s preference

Rules	Confidence	Support	Pearl	Preferences
ar $_{1}$	0.66	0.20	0.02	(P$_{1}$, P$_{2})$
ar $_{ 2}$	0.66	0.20	0.05	(P$_{2})$
ar $_{ 3}$	0.66	0.20	0.02	(P$_{2})$
ar $_{ 4}$	0.4	0.20	0.05	(P$_{1}$, P$_{3})$
ar $_{ 5}$	0.4	0.20	0.10	(P$_{1}$, P$_{3})$
ar $_{ 6}$	0.33	0.20	0.02	(P$_{1}$, P$_{3})$
ar $_{ 7}$	0.33	0.20	0.01	(P$_{1}$, P$_{3})$
ar $_{ 8}$	0.33	0.20	0.10	(P$_{1}$, P$_{2})$
ar $_{ 9}$	0.33	0.10	0.03	(P$_{1}$, P$_{3})$
ar $_{ 10}$	0.66	0.20	0.05	(P$_{2}$, P$_{3})$
ar $_{ 11}$	0.16	0.10	0.02	(P$_{1}$, P$_{3})$
ar $_{ 12}$	0.50	0.10	0.02	(P$_{1}$, P$_{3})$
ar $_{ 13}$	0.50	0.10	0.00	(P$_{1}$, P$_{3})$
ar $_{ 14}$	0.50	0.10	0.04	(P$_{1}$, P$_{3})$

3 MDP$_{\mathrm {REF}}$ mechanism illustration

3.1 MDP$_{\mathrm {REF}}$ algorithm

Figure 1 shows a visual representation of the mining process of MDP$_\mathrm{REF}$ rules. Notice that it consists of three main operations the last of which is the concern of MDP$_\mathrm{REF}$ rules algorithm.

MDP$_{\mathrm {REF}}$ is short for Most Dominant and Preferential rules. The algorithm is threshold-free and does not discard any measure; it is therefore more objective and contributes to solving the dimensionality problem more than other approaches, without losing information [15].

3.2 MDP$_{\mathrm {REF}}$ algorithm tasks and its pseudocode

(1) Create an imaginary referential rule (ar $^{T})$ which has the maximum measurements to dominate all the rules.

(2) Calculate the degree of similarity of all the rules one by one with the referential rule (ar $^{T})$ ($Deg_{Sim} (AR,AR^{T}))$.

(3) Determine the dominant real rule ar* having the lowest degree of similarity with ar $^{T}$.

(4) Remove all the rules dominated by ar*.

(5) Resort to the user’s preferences to determine which one to keep if two rules are statistically equivalent.

(6) Keep both, if the decision maker is indifferent. Otherwise, keep the one satisfying the most preferences.

(7) Drop all rules where the user’s preferences are already covered by those previously handled.

(8) Keep rules covering user preferences other than those already covered by the rules previously selected.
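The tasks above can be sketched in a few lines. The code below is a simplified, single-threaded reading of them and not the authors’ implementation: the function names are hypothetical, ties between statistically equivalent rules are broken by the number of covered preferences (one interpretation of tasks 5 and 6), and a rule is kept only if it covers a preference not yet covered (one interpretation of tasks 7 and 8). The data are copied from Table 3.

```python
from typing import Dict, List, Set, Tuple

Values = Tuple[float, float, float]  # (confidence, support, pearl)

# Table 3: rule name -> (measure values, covered preferences).
RULES: Dict[str, Tuple[Values, Set[str]]] = {
    "ar1": ((0.66, 0.20, 0.02), {"P1", "P2"}),
    "ar2": ((0.66, 0.20, 0.05), {"P2"}),
    "ar3": ((0.66, 0.20, 0.02), {"P2"}),
    "ar4": ((0.40, 0.20, 0.05), {"P1", "P3"}),
    "ar5": ((0.40, 0.20, 0.10), {"P1", "P3"}),
    "ar6": ((0.33, 0.20, 0.02), {"P1", "P3"}),
    "ar7": ((0.33, 0.20, 0.01), {"P1", "P3"}),
    "ar8": ((0.33, 0.20, 0.10), {"P1", "P2"}),
    "ar9": ((0.33, 0.10, 0.03), {"P1", "P3"}),
    "ar10": ((0.66, 0.20, 0.05), {"P2", "P3"}),
    "ar11": ((0.16, 0.10, 0.02), {"P1", "P3"}),
    "ar12": ((0.50, 0.10, 0.02), {"P1", "P3"}),
    "ar13": ((0.50, 0.10, 0.00), {"P1", "P3"}),
    "ar14": ((0.50, 0.10, 0.04), {"P1", "P3"}),
}


def dominates(a: Values, b: Values) -> bool:
    """a dominates b if a is at least as good as b on every measure."""
    return all(x >= y for x, y in zip(a, b))


def deg_sim(a: Values, b: Values) -> float:
    """Degree of similarity of Eq. (1): mean absolute difference."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def mdp_ref(rules: Dict[str, Tuple[Values, Set[str]]]) -> List[str]:
    # Task 1: imaginary referential rule holding the maximum of every measure.
    reference = tuple(max(col) for col in zip(*(v for v, _ in rules.values())))
    remaining = dict(rules)
    selected: List[str] = []
    covered: Set[str] = set()
    while remaining:
        # Tasks 2-3 and 5-6: rule closest to the reference; among equivalent
        # rules, prefer the one covering more preferences (assumption).
        best = min(remaining, key=lambda name: (
            deg_sim(remaining[name][0], reference), -len(remaining[name][1])))
        values, prefs = remaining.pop(best)
        # Tasks 7-8: keep the rule only if it adds an uncovered preference.
        if prefs - covered:
            selected.append(best)
            covered |= prefs
        # Task 4: remove every rule dominated by the rule just handled.
        remaining = {n: v for n, v in remaining.items()
                     if not dominates(values, v[0])}
    return selected


print(mdp_ref(RULES))
```

On the 14-Rule Set of Table 3, this sketch first picks ar $_{10}$ (tied with ar $_{2}$ on the measures but covering more preferences), removes the rules it dominates, and then picks ar $_{5}$ from the two remaining rules.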

For an algorithm to be effective, it has to be iterative without consuming much time. Iteration is necessary for accurate and reliable results. The MDP$_{\mathrm {REF}}$ algorithm processes rules iteratively and integrates multithreading for concurrent processing, which makes it faster. The more tasks it performs concurrently, the less time it needs to finish the processing; being iterative therefore does not necessarily mean being time consuming. In our case, the fourth task is particularly important, since it results in three groups of rules:

Dominant rules, which are stored.
Non-dominant rules, which are discarded.
Statistically Equivalent Rules (SER).

MDP$_{\mathrm {REF}}$ Algorithm focuses on SER and processes all SER-Rules, to mine those which cover the user’s preferences provided in advance by the user: tasks 6, 7, and 8.

The seventh task discards preferentially redundant and/or overlapping rules. Performing task 7 implies performing task 8. The MDP$_{\mathrm {REF}}$ algorithm’s tasks do not include learning user preferences; these are provided prior to processing, so they have no influence on the processing time of the MDP$_{\mathrm {REF}}$ algorithm.

Table 3 shows a set of ARs to which the MDP$_{\mathrm {REF}}$ algorithm is applied. The result is two rules, ar $_{10}$ and ar $_{05}$, which are the most dominant and preferential rules (MDP$_{\mathrm {REF}}$ rules).

To evaluate the efficiency of the MDP$_{\mathrm {REF}}$ algorithm experimentally, it is further applied to a data set of mobile phones offered to customers, which includes a wide range of mobile brands launched in the Moroccan national market.

Table 4

Characteristics of AR-set (mobile phone)

Data set	#Items	#AR	#Transaction	Avg. MDP$_{\mathrm {REF}}$
Mobile phone	128	25000	326	14268

Table 5

Sample of mobile phone brands

ID	Brand	Design	Connectivity	Screen	Battery autonomy (h)	Camera (Mp)	Price (Euro)
I$_{1}$	Nokia	Bar	w-u-b$^{3}$	Touch	6–8	2–5	>300
I$_{2}$	Samsung	Bar	u-b	Touch	3–5	2–5	100–200
I$_{3}$	Samsung	Bar	w-u-b	Touch	9–11	2–5	200–300
I$_{4}$	Sony Ericsson	Bar	w-u-b	Touch	9–11	10–14	>300
I$_{5}$	Sony Ericsson	Bar	u-b	Touch	3–5	6–9	>300
I$_{6}$	Samsung	Slider	u-b	Non-touch	3–5	2–5	<100
I$_{7}$	Samsung	Slider	b	Non-touch	3–5	2–5	100–200
I$_{8}$	LG	Bar	u-b	Non-touch	3–5	2–5	<100
I$_{9}$	LG	Slider	u-b	Non-touch	3–5	2–5	200–300
I$_{10}$	Nokia	Slider	u-b	Non-touch	3–5	2–5	100–200
I$_{11}$	Sony Ericsson	Bar	w-u-b	Non-touch	9–11	2–5	100–200

$^{3}$ w-u-b: Wi-Fi, USB, Bluetooth

Table 6

MDP$_{\mathrm {REF}}$ vs all rules and other ARM algorithm

Database/algorithm		Measures
		C, P, R	C, L, Zh	C, P, Zh, L	C, P, R, Zh, L$^2$
Mobile phone (10.00)	CprefMiner	20,000	18,500	16,000	20,750
	ProfMiner	18,250	16,250	13,500	19,000
	TB-R	22,500	20,750	18,750	21,750
	A-R	25,000	25,000	25,000	25,000
	SkyRule	11,250	13,750	12,500	10,500
	MDP$_{\mathrm {REF}}$	12,500	15,400	16,775	12,375

$^{2}$ C: Confidence, P: Pearl, R: Recall, Zh: Zhang, L: Loevinger

The characteristics of these mobile phones and their attributes are specified in Tables 4 and 5. The AR-set involved contains 25,000 rules corresponding to a set of distinct mobile phones, described by 326 transactions over 128 distinct items. These 25,000 rules (which may not qualify as big data) are processed by the MDP$_{\mathrm {REF}}$ algorithm, which generates 14,268 rules, representing only $\approx $57% of the original number.

As the other algorithms are based on thresholding, we are obliged to accept their optimal threshold only for reasons of comparison.

Table 6 describes the behavior of MDP$_{\mathrm {REF}}$ algorithm in comparison with others concerning the number of generated association rules. We notice the following:

In comparison with the All Rules, TB-R, CprefMiner, and ProfMiner algorithms, the MDP$_{\mathrm {REF}}$ algorithm steadily generates fewer rules; it reduces the number of selected association rules by an average of $\approx $27%, with a reduction rate varying between a lower bound of 12% and an upper bound of 43%, regardless of the nature and cardinality of the measures. That is, the number of rules selected by MDP$_{\mathrm {REF}}$ is significantly reduced: from 25,000 rules to 12,500 for the measure set {C, P, R}, from 25,000 to 15,400 for {C, L, Zh}, from 25,000 to 16,775 for {C, P, Zh, L}, and from 25,000 to 12,375 for {C, P, R, Zh, L}. Note that the first two sets have the same size (three measures), yet the number of MDP$_{\mathrm {REF}}$ rules generated differs.

Compared with the SkyRule algorithm, the MDP$_{\mathrm {REF}}$ algorithm behaves differently: it generates more rules for all sets of interestingness measures. This behavior originates from the fact that the MDP$_{\mathrm {REF}}$ algorithm recovers an average of 19% of the association rules groundlessly rejected by SkyRule. It thus keeps some SER that may cover particular user preferences and carry valuable information. Consequently, the MDP$_{\mathrm {REF}}$ algorithm avoids the information-loss problem from which the SkyRule algorithm suffers, and it selects the ARs responding to the requests and preferences expressed by the users. For these reasons, namely groundlessly discarded rules and loss of information, MDP$_{\mathrm {REF}}$ is considered better than the SkyRule algorithm.

The choice of the measure sets ($\textsc {m}$ sets), and not necessarily their size, affects the number of rules generated by MDP$_{\mathrm {REF}}$.

Table 6 allows us to predict that with a confidence level of 95%, MDP$_{\mathrm {REF}}$ will select an average of 14,268 ± (4275) rules.

4 Rank-sort-MDP$_{\mathrm {REF}}$ algorithm

4.1 Purpose

The user’s preferences are provided prior to processing, together with the number of rules the user wishes to get back, noted “u”. On the basis of MDP$_{\mathrm {REF}}$, our algorithm, Rank-Sort-MDP$_{\mathrm {REF}}$, processes the set of ARs (AR-set) and partitions it into subsets $(E_i)_{i \in \{1,\ldots ,n\}}$, sorts them, and returns their ranks. It then checks the ARs, taking into consideration the priority of MDP$_{\mathrm {REF}}$ rules, and stores them in $E_i$; the association rules in each $E_i$ are intra-ranked from left to right. The original AR-set is the sum or union of the subsets $E_i$, which can be expressed mathematically as

$$\begin{aligned} \mathrm{AR}_{-}\mathrm{Set}=\mathop \oplus \limits _{i=1}^n E_i \,\quad \mathrm {or}\quad \mathrm{AR}_{-}\mathrm{Set}=\mathop \cup \limits _{i=1}^n E_i \end{aligned}$$

(2)

The number “u” of rules that the user wishes to get back can be expressed with the following formula:

$$\begin{aligned} u=\left| {\mathop \oplus \limits _{i=1}^j E_i } \right| _{j\le n} =\sum _{i=1}^j {\left| {E_i } \right| } _{j\le n}. \end{aligned}$$

(3)

The “u-rules” set is the union of the E$_{i}$ subsets with $i\le n$, where E$_{i}$ takes priority over E$_{j}$ when $i\le j$. The idea is that each time Rank-Sort-MDP$_{\mathrm {REF}}$ iterates, MDP$_{\mathrm {REF}}$ also iterates, and the outcome is a subset E$_{i}$.

4.2 Pseudocode of “Rank-Sort-MDP$_{\mathrm {REF}}$ algorithm”

The Rank-Sort-MDP$_{\mathrm {REF}}$ algorithm was coded in an object-oriented programming language, and all tests were performed on a computer with the following specification: a 1.73 GHz Intel processor, the Windows 7 operating system, and 2 GB of memory.

The Rank-Sort-MDP$_{\mathrm {REF}}$ algorithm proceeds in stages, for instance:

At stage 1 (k = 0 + 1) (Line 6), the Rank-Sort-MDP$_{\mathrm {REF}}$ algorithm calls the MDP$_{\mathrm {REF}}$ algorithm to select the first subset of association rules (E$_{1}$) (Line 7) from all the association rules belonging to R $\ne $ Ø (Line 5); in our case, R is the “14-Rule Set” (see Table 3). AR$_{10}$ and AR$_{05}$ are the first two association rules selected at this stage, and they are placed in E$_{1}$, which is considered the first subset: E$_{1}$ = {AR$_{10}$, AR$_{05}$}.

At stage 2 (k = 1 + 1), the Rank-Sort-MDP$_{\mathrm {REF}}$ algorithm calls the MDP$_{\mathrm {REF}}$ algorithm to select the second subset of association rules (E$_{2}$ = {AR$_{02}$, AR$_{08}$}). E$_{2}$ succeeds E$_{1}$: its members are less good, and it is ranked after E$_{1}$.

Recursively, at each stage k + 1, the proposed algorithm calls the MDP$_{\mathrm {REF}}$ algorithm to select the new association rules succeeding those selected and ranked at stage k. The set of association rules generated at stage k is thus ranked before the one generated at stage k + 1. Consequently, all predecessor association rules are ranked higher than any association rule belonging to a successor set. Furthermore, the MDP$_{\mathrm {REF}}$ rules ranked at the same stage are ordered according to their degree of similarity and the user preferences they cover. The Rank-Sort-MDP$_{\mathrm {REF}}$ algorithm can therefore be considered sound.

The Rank-Sort-MDP$_{\mathrm {REF}}$ algorithm terminates when the association rules set R becomes empty, at which point all association rules have been ranked and classified. This means that the Rank-Sort-MDP$_{\mathrm {REF}}$ algorithm is complete.

We finally come to the conclusion that the Rank-Sort-MDP$_{\mathrm {REF}}$ algorithm is sound and complete.
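The stage-by-stage behavior described above amounts to repeatedly selecting a subset from the remaining rules until none are left. The sketch below shows only this outer loop; the selector is a plain Pareto front used as a stand-in for MDP$_{\mathrm {REF}}$ (an assumption made to keep the example short), so its subsets are not expected to match Table 7. All names and the toy data are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Values = Tuple[float, ...]
Selector = Callable[[Dict[str, Values]], List[str]]


def pareto_front(rules: Dict[str, Values]) -> List[str]:
    """Stand-in selector: rules that no other rule strictly dominates."""
    def strictly_dominates(a: Values, b: Values) -> bool:
        return a != b and all(x >= y for x, y in zip(a, b))

    return [name for name, values in rules.items()
            if not any(strictly_dominates(other, values)
                       for other in rules.values())]


def rank_sort(rules: Dict[str, Values],
              select: Selector = pareto_front) -> List[List[str]]:
    """Partition the rules into ranked subsets E_1, E_2, ..., E_n."""
    remaining = dict(rules)
    subsets: List[List[str]] = []
    while remaining:  # stops once R is empty: each stage removes >= 1 rule
        chosen = select(remaining)
        if not chosen:
            raise ValueError("selector returned no rule for a non-empty set")
        subsets.append(chosen)
        for name in chosen:
            del remaining[name]
    return subsets


# Hypothetical rules described by two measures.
toy_rules = {"a": (0.9, 0.1), "b": (0.5, 0.5), "c": (0.4, 0.4), "d": (0.1, 0.05)}
ranked = rank_sort(toy_rules)
print(ranked)
```

Because every stage removes at least one rule from a finite set, the loop terminates, and the subsets are disjoint and cover the whole input, which mirrors Eq. (2).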

To show the performance of the Rank-Sort-MDP$_{\mathrm {REF}}$ algorithm, we applied it to the AR-set (in our case, the “14-Rule Set”) shown in Table 3. It processed this set and divided it into 7 subsets {E$_{1}$... E$_{7}$}, as summarized in Table 7.

The subset E$_{1}$, which contains the two rules ar $_{10}$ and ar $_{05}$, is generated in the first iteration of the Rank-Sort-MDP$_{\mathrm {REF}}$ algorithm. It is worth noticing that ar $_{10}$ and ar $_{05}$ are the very rules generated by the MDP$_{\mathrm {REF}}$ algorithm. Therefore, we reasonably conclude that the first subset E$_{1}$ generated by Rank-Sort-MDP$_{\mathrm {REF}}$ is also the result generated by MDP$_{\mathrm {REF}}$ applied to the entire AR-set (14-Rule Set).

E$_{2}$ is the subset extracted by Rank-Sort-MDP$_{\mathrm {REF}}$ in the second iteration, which operates on “AR-set$\backslash $E$_{1}$”. The member rules {ar $_{02}$, ar $_{08}$} of E$_{2}$ are the most dominant and preferential rules in “AR-set$\backslash $E$_{1}$”.

At the end of the seventh and final iteration of Rank-Sort-MDP$_{\mathrm {REF}}$, we get E$_{7}$.

The result we get after the seven iterations is seven subsets in which rules are ranked from top to bottom. Therefore, all the 14 rules are ordered.

We are now ready to respond to the user’s order, whatever “u” may be (see Table 8).

Table 7

Output of Rank-Sort-MDP$_{\mathrm {REF}}$ algorithm

Set of rules	Rules	Preferences	Level
E$_{1}$	ar $_{ 10}$, ar $_{05}$	(P$_{1}$, P$_{2}$, P$_{3})$	1
E$_{2}$	ar $_{02}$, ar $_{08}$	(P$_{1}$, P$_{2})$	2
E$_{3}$	ar $_{01}$, ar $_{04}$	(P$_{1}$, P$_{2}$, P$_{3})$	3
E$_{4}$	ar $_{03}$, ar $_{09}$	(P$_{1}$, P$_{2}$, P$_{3})$	4
E$_{5}$	ar $_{13}$, ar $_{06}$	(P$_{1}$, P$_{3})$	5
E$_{6}$	ar $_{14}$, ar $_{07}$, ar $_{12}$	(P$_{1}$, P$_{3})$	6
E$_{7}$	ar $_{11}$	(P$_{1}$, P$_{3})$	7

Table 8

Order response mechanism

User’s order “u”	Response (subset/rules)
2	$E_{1}$
3	$E_{1} \oplus E_{2} \backslash \{\mathrm {ar}_{08}\}$
4	$E_{1} \oplus E_{2}$
5	$E_{1} \oplus E_{2} \oplus E_{3} \backslash \{\mathrm {ar}_{04}\}$
7	$E_{1} \oplus E_{2} \oplus E_{3} \oplus E_{4} \backslash \{\mathrm {ar}_{09}\}$
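Read together, Tables 7 and 8 suggest that answering an order “u” amounts to taking the first u rules of the ranked subsets, in order. The sketch below illustrates this reading; the function name `respond` is hypothetical, and the subsets are copied from Table 7.

```python
from itertools import chain, islice
from typing import List

# Ranked subsets E_1 .. E_7 copied from Table 7 (intra-ranked left to right).
SUBSETS: List[List[str]] = [
    ["ar10", "ar05"],
    ["ar02", "ar08"],
    ["ar01", "ar04"],
    ["ar03", "ar09"],
    ["ar13", "ar06"],
    ["ar14", "ar07", "ar12"],
    ["ar11"],
]


def respond(subsets: List[List[str]], u: int) -> List[str]:
    """Return the first u rules, walking the subsets in rank order."""
    if u < 0:
        raise ValueError("u must be non-negative")
    return list(islice(chain.from_iterable(subsets), u))


for u in (2, 3, 4, 5, 7):
    print(u, respond(SUBSETS, u))
```

For u = 3, the answer is E$_{1}$ plus the first rule of E$_{2}$, which corresponds to the entry E$_{1} \oplus $ E$_{2}\backslash $ {ar $_{08}$} of Table 8.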

5 Performance of Rank-Sort-MDP$_{\mathrm {REF}}$

This section compares the proposed algorithm with related algorithms that pursue the same goal: ranking and sorting association rules.

5.1 The previous related algorithms

The first related algorithm is Rank Rules, suggested by the authors of [4], which ranks association rules using the Skyline operator; it relies on the SkyRules algorithm, which is called at each iteration to determine the undominated association rules. The second is RuleRank-CBA [21], which is evolved by Genetic Network Programming: directed graphs serve as the gene population used to compute the fitness function that ranks and sorts the members of the data set. The third is Hybrid-RuleRank [16], which couples Genetic Algorithms with Simulated Annealing (SA), a probabilistic meta-heuristic that searches for an approximation of the global optimum. Recall that RuleRank-CBA arithmetically combines the classical interestingness measures, support and confidence, to create a set of functions that optimize its fitness function and achieve the target objectives. Like RuleRank-CBA, the Hybrid-RuleRank algorithm sorts and ranks association rules according to the support and confidence measures.

In addition, execution time and accuracy are used as indicators to measure the performance of Rank-Sort-MDP$_{\mathrm {REF}}$ and to carry out this comparison.

5.2 Execution time of Rank-Sort-MDP$_{\mathrm {REF}}$

To analyze and interpret how the execution time of the proposed algorithm behaves as the input size increases, we arbitrarily drew samples of different sizes from the AR-set (the mobile-phone database) and applied Rank-Sort-MDP$_{\mathrm {REF}}$ to them. Figs. 2 and 3 illustrate the evolution of the runtime (the execution time indicator) as the sample size and the measure cardinality vary.

From Fig. 2, we notice that the execution time increases linearly with the sample size, whatever the measure cardinality. The slope of each curve taken individually remains low, because Rank-Sort-MDP$_{\mathrm {REF}}$ calls MDP$_\mathrm{REF}$, which is implemented with threads. This observation may extend to a big database (more than 25,000 association rules), since Rank-Sort-MDP$_\mathrm{REF}$ sorts and ranks all given association rules with a time complexity that follows directly from the MDP$_{\mathrm {REF}}$ approach. Rank-Sort-MDP$_{\mathrm {REF}}$ relies on the output of the MDP$_{\mathrm {REF}}$ algorithm, which has been successfully tested, evaluated, and applied on databases larger than the present one. The MDP$_{\mathrm {REF}}$ algorithm obtained the best results in terms of accuracy, precision, and execution time, and this performance carries over to Rank-Sort-MDP$_{\mathrm {REF}}$ (for further information, see our previous work [15]). Fig. 3, in turn, shows that when the measures vary (in cardinality or nature), the average execution time may decrease or increase. This behavior depends on the size of the rule set selected by MDP$_{\mathrm {REF}}$ in the first iteration, since the selected rules are correlated with the measures employed.
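A runtime-versus-sample-size curve of the kind discussed for Fig. 2 can be obtained with a small timing harness. The sketch below is an assumption about how such a measurement might be set up, not the authors' benchmark code: `time_over_samples` is a hypothetical helper, and Python's built-in `sorted` is only a placeholder workload standing in for the ranking algorithm.

```python
import random
import time
from typing import Callable, List, Sequence, Tuple


def time_over_samples(
    algorithm: Callable[[Sequence], object],
    data: Sequence,
    sizes: Sequence[int],
    repeats: int = 3,
    seed: int = 0,
) -> List[Tuple[int, float]]:
    """For each sample size, draw random samples and record the mean wall-clock time."""
    rng = random.Random(seed)
    pool = list(data)
    results = []
    for n in sizes:
        elapsed = 0.0
        for _ in range(repeats):
            sample = rng.sample(pool, n)   # sample drawn without replacement
            start = time.perf_counter()
            algorithm(sample)
            elapsed += time.perf_counter() - start
        results.append((n, elapsed / repeats))
    return results


# Placeholder workload: sorting integers instead of ranking association rules.
timings = time_over_samples(sorted, range(10_000), sizes=[1_000, 2_000, 4_000])
```

Only the call to `algorithm` is timed, so the cost of drawing the sample does not distort the curve.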

Table 9

Simulation results compared to the previous algorithms

Database	Statistical indicators	Rank-sort-MDP$_{\mathrm {REF}}$	Rank rules	RuleRank-CBA [21]	Hybrid-RuleRank [16]
Mobile phone	Accuracy (%)	87.99 ± 0.33	87.98 ± 0.33	88.02 ± 0.29	89.11 ± 0.39
	Time (s)	1.97 ± 0.19	1.67 ± 0.97	50.59 ± 7.10	50.60 ± 7.10
Iris	Accuracy (%)	94.03 ± 1.97	94.00	94.13 ± 0.87	95.22 ± 4.50
	Time (s)	0.84 ± 0.024	1.02 ± 0.03	0.41 ± 0.01	0.41 ± 0.47
Flare	Accuracy (%)	82.26 ± 0.38	81.09 ± 0.32	84.21 ± 0.20	84.30 ± 0.62
	Time (s)	24.75 ± 1.5	3.12 ± 0.63	75.22 ± 3.55	75.30 ± 4.02
Average	Accuracy (%)	88.09 ± 0.28	87.69 ± 0.21	88.78 ± 0.45	89.54 ± 1.83
	Time (s)	9.18 ± 1.44	3.93 ± 0.54	42.07 ± 3.55	42.10 ± 3.86

Table 10

Characteristics of data sets

Database	#Items	#AR	#Transaction	Avg. MDP$_{\mathrm {REF}}$
Mobile phone	128	25,000	326	14,268
Flare	39	57,476	1389	2550
Iris	119	440	8124	259

We remark that the average execution time decreases up to a given measure cardinality (possibly an optimal one) and then increases. Hence, we intend to study the properties of the interestingness measures belonging to the measure sets.

5.3 Indicators tools: accuracy and execution time

Table 9 summarizes two statistical indicators, accuracy and execution time, for the three databases (Mobile phone, Iris, Flare) on which the four approaches are applied. In this subsection, we compare the proposed Rank-Sort-MDP$_{\mathrm {REF}}$ algorithm with Rank Rules [4], RuleRank-CBA [21], and Hybrid-RuleRank [16] in terms of execution time and accuracy. To evaluate the performance and efficiency of the proposed approach, we execute these algorithms on databases of different sizes and attributes, whose characteristics are described in Table 10. To validate the results and conduct a reliable comparison, we use k-fold cross-validation, which processes each data set k times (in our case, k = 10). To obtain the accuracy, each algorithm is tested multiple times by running k-fold cross-validation on each data set; the data set elements are rearranged and re-stratified before each round, and we keep the average accuracy over the multiple tests for each data set.
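The validation protocol (repeated k-fold cross-validation with reshuffling and re-stratification before each round) can be sketched in plain Python. The helper names, the number of rounds, and the majority-class baseline below are assumptions made for illustration; the source only fixes k = 10.

```python
import random
from collections import defaultdict
from statistics import mean
from typing import Callable, List, Sequence


def stratified_folds(labels: Sequence, k: int, rng: random.Random) -> List[List[int]]:
    """Shuffle indices within each class, then deal them round-robin into k folds."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds: List[List[int]] = [[] for _ in range(k)]
    offset = 0
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for j, i in enumerate(idxs):
            folds[(offset + j) % k].append(i)
        offset += len(idxs)
    return folds


def repeated_cv_accuracy(
    fit_predict: Callable[[List[int], List[int]], List],
    labels: Sequence,
    k: int = 10,
    rounds: int = 5,
    seed: int = 0,
) -> float:
    """Average accuracy over `rounds` reshuffled, re-stratified k-fold runs."""
    rng = random.Random(seed)
    scores = []
    for _ in range(rounds):
        for test_idx in stratified_folds(labels, k, rng):
            if not test_idx:
                continue
            held_out = set(test_idx)
            train_idx = [i for i in range(len(labels)) if i not in held_out]
            predicted = fit_predict(train_idx, test_idx)
            hits = sum(p == labels[i] for p, i in zip(predicted, test_idx))
            scores.append(hits / len(test_idx))
    return mean(scores)


# Hypothetical data and a majority-class baseline standing in for a rule-based classifier.
labels = ["a"] * 60 + ["b"] * 40


def majority(train_idx: List[int], test_idx: List[int]) -> List[str]:
    train_labels = [labels[i] for i in train_idx]
    top = max(set(train_labels), key=train_labels.count)
    return [top] * len(test_idx)


acc = repeated_cv_accuracy(majority, labels, k=10, rounds=3)
```

Because every fold keeps the 60/40 class ratio, the baseline scores the same on each fold, which makes the effect of stratification easy to verify.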

On the one hand, Rank-Sort-MDP$_{\mathrm {REF}}$ outperforms Rank Rules in terms of accuracy (88.09 vs 87.69%). In terms of execution time, however, the proposed algorithm is considerably slower than Rank Rules (9.18 vs 3.93 s), because Rank Rules does not treat statistically equivalent rules (SER) properly: it may rank two SER at different levels and therefore probably ranks an SER-set incorrectly.

On the other hand, Rank-Sort-MDP$_{\mathrm {REF}}$ is faster than the RuleRank-CBA algorithm (9.18 vs 42.07 s) and also faster than Hybrid-RuleRank (9.18 vs 42.10 s), since many redundant and repeated functions are created and evaluated in RuleRank-CBA. In terms of accuracy, Rank-Sort-MDP$_{\mathrm {REF}}$ and RuleRank-CBA have approximately the same performance (88.09 vs 88.78%). Finally, Hybrid-RuleRank surpasses the proposed Rank-Sort-MDP$_{\mathrm {REF}}$ in terms of accuracy (89.54 vs 88.09%).
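The averaged figures quoted above can be checked against the per-database values of Table 9. The short sketch below assumes that the “Average” row is an unweighted mean over the three databases; for the Rank-Sort-MDP$_{\mathrm {REF}}$ column this assumption agrees with the reported values up to rounding. The variable names are illustrative.

```python
from statistics import mean

# Per-database means for Rank-Sort-MDP_REF, copied from Table 9.
accuracy = {"Mobile phone": 87.99, "Iris": 94.03, "Flare": 82.26}
time_s = {"Mobile phone": 1.97, "Iris": 0.84, "Flare": 24.75}

avg_accuracy = mean(accuracy.values())  # to be compared with the reported 88.09 %
avg_time = mean(time_s.values())        # to be compared with the reported 9.18 s

# Ratio of the reported average times: RuleRank-CBA versus the proposed algorithm.
speedup_vs_cba = 42.07 / 9.18
```

On the reported averages, the proposed algorithm is between four and five times faster than RuleRank-CBA.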

Table 10 summarizes the characteristics of the data sets: Database is the name of the database, #Items is the number of items in the data set, #AR is the number of association rules, #Transaction is the number of transactions, and Avg. MDP$_{\mathrm {REF}}$ is the average number of association rules selected by the MDP$_{\mathrm {REF}}$ algorithm from each data set.

6 Conclusion and perspective

The Rank-Sort-MDP$_{\mathrm {REF}}$ algorithm is introduced to supply the user with the requested rules by ranking and sorting all association rules of the original AR-set, which is divided into subsets.

The proposed approach aims to rank and sort association rules and to respond to a user’s request. It builds on the MDP$_{\mathrm {REF}}$ algorithm, which aims to minimize dimensionality without losing relevant information or ignoring the user’s preferences. The experimental evaluation of our approach shows satisfactory results with respect to the target objectives. Further directions include: (1) deepening the semantic analysis of association rules and their components, and (2) studying the properties of the interestingness measures belonging to the measure sets.

Perfection never comes at once, and we will continue working to improve our techniques to achieve a higher-quality analysis of data. We are also motivated to make the Rank-Sort-MDP$_{\mathrm {REF}}$ algorithm faster, so that it can handle big databases, whose processing requires less time-consuming techniques.

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

1. Ait-Mlouk, A., Gharnati, F., Agouti, T.: Multi-agent-based modeling for extracting relevant association rules using a multi-criteria analysis approach. Vietnam J. Comput. Sci. 3(4), 235–245 (2016). doi:10.1007/s40595-016-0070-4

2. Arvanitis, A., Koutrika, G.: PrefDB: supporting preferences as first-class citizens in relational databases. IEEE Trans. Knowl. Data Eng. 26(6), 1430–1446 (2014). doi:10.1109/TKDE.2013.28

3. Asha, P., Srinivasan, S.: Analysing the associations between infected genes using data mining techniques. Int. J. Data Mining Bioinf. 15(3), 250–271 (2016). doi:10.1504/IJDMB.2016.0770

4. Bouker, S., Saidi, R., Ben Yahia, S., Mephu Nguifo, E.: Mining undominated association rules through interestingness measures. Int. J. Artif. Intell. Tools 23(4), 1460011 (2014). doi:10.1142/S0218213014600112

5. Branke, J., Corrente, S., Greco, S., Słowiński, R., Zielniewicz, P.: Using Choquet integral as preference model in interactive evolutionary multiobjective optimization. Eur. J. Oper. Res. 250(3), 884–901 (2016). doi:10.1016/j.ejor.2015.10.027

6. Branke, J.: MCDA and multiobjective evolutionary algorithms. Multiple Criteria Decision Analysis, pp. 977–1008 (2016). doi:10.1007/978-1-4939-3094-4_23

7. De Amo, S., Saliou Diallo, M., Talibouya Diop, C., Giacometti, A., Li, D., Soulet, A.: Contextual preference mining for user profile construction. Inf. Syst. 49, 182–199 (2015). doi:10.1016/j.is.2014.11.009

8. Gheorghiu, R., Labrinidis, A., Chrysanthis, P.: Unifying Qualitative and Quantitative Database Preferences to Enhance Query Personalization. Proceedings of the Second International Workshop on Databases and the Web - ExploreDB’15, pp. 6–8 (2015). doi:10.1145/2795218.2795223

9. Gupta, G.: Introduction to data mining with case studies. PHI Learning Pvt, Ltd (2014)

10. Jiang, B., Pei, J., L, X., Cheung, D., Han, J.: Mining preferences from superior and inferior examples. Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, pp. 390–398 (2008)

11. Kongchai, P., Kerdprasop, N., Kerdprasop, K.: Dissimilar Rule Mining and Ranking Technique for Associative Classification. Proceedings of the International MultiConference of Engineers and Computer Scientists 2013, IMECS 2013. 1 (2013)

12. Mallik, S., Mukhopadhyay, A., Maulik, U.: RANWAR: Rank-based weighted association rule mining from gene expression and methylation data. IEEE Trans. NanoBiosci. 14(1), 59–66 (2015)

13. Mehrotra, A., Hendley, R., Musolesi, M.: PrefMiner. Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing - UbiComp ’16, pp. 1223–1234 (2016). doi:10.1145/2971648.2971747

14. Miao, X., Gao, Y., Chen, G., Cui, H., Guo, C., Pan, W.: Si2p: a restaurant recommendation system using preference queries over incomplete information. Proc. VLDB Endow. 9(13), 1509–1512 (2016). doi:10.14778/3007263.3007296

15. Mouhir, M., Gadi, T., Balouki, Y., El Far, M.: A new way to select the valuable association rules. 2015 7th International Conference on Knowledge and Smart Technology (KST), pp. 81–86 (2015). doi:10.1109/KST.2015.7051464

16. Najeeb, M. M., El Sheikh, A., Nababteh, M.: A new rule ranking model for associative classification using a hybrid artificial intelligence technique. In: Communication Software and Networks (ICCSN), 2011 IEEE 3rd International Conference on IEEE, pp. 231–235 (2011)

17. Rolfsnes, T., Moonen, L., Di Alesio, S., Behjati, R., Binkley, D.: Improving change recommendation using aggregated association rules. Proceedings of the 13th International Workshop on Mining Software Repositories - MSR ’16, pp. 73–84 (2016). doi:10.1145/2901739.2901756

18. Shmueli, G., Peter Bruce, C., Nitin, Patel R.: Data mining for business analytics: concepts, techniques, and applications with XLMiner. Wiley, Hoboken (2016)

19. Soulet, A., Raïssi, C., Plantevit, M., Cremilleux, B.: Mining Dominant Patterns in the Sky. 2011 IEEE 11th International Conference on Data Mining, pp. 655–664 (2011). doi:10.1109/ICDM.2011.100

20. Ugarte, W., Boizumault, P., Loudni, S., Crémilleux, B., Lepailleur, A.: Mining (Soft-) skypatterns using constraint programming. Advances in Knowledge Discovery and Management, pp. 105–136 (2015). doi:10.1007/978-3-319-23751-0_6

21. Yang, G., Mabu, S. M., Shimada, K., Gong, Y., Hirasawa, K.: Ranking association rules for classification based on genetic network programming. In: Proceedings of the 11th Annual conference on Genetic and evolutionary computation. ACM, pp. 1917–1918 (2009)

22. Zhang, J., Lin, Y., Lin, M., Liu, J.: An effective collaborative filtering algorithm based on user preference clustering. Appl. Intell. 45(2), 230–240 (2016). doi:10.1007/s10489-015-0756-9

23. Zhang, J., Jiang, X., Ku, W.S., Qin, X.: Efficient parallel skyline evaluation using mapreduce. IEEE Trans. Parallel Distrib. Syst. 27(7), 1996–2009 (2016)

24. Zhu, H., Chen, E., Xiong, H., Yu, K., Cao, H., Tian, J.: Mining mobile user preferences for personalized context-aware recommendation. ACM Trans. Intell. Syst. Technol. 5(4), 1–27 (2014). doi:10.1145/253251

Title: Towards an enhanced user’s preferences integration into ranking process using dominance approach
Authors: Mohammed Mouhir, Youssef Balouki, Taoufiq Gadi
Publication date: 15 July 2017
Publisher: Springer Berlin Heidelberg
Published in: Vietnam Journal of Computer Science, Issue 1/2018
Print ISSN: 2196-8888
Electronic ISSN: 2196-8896
DOI: https://doi.org/10.1007/s40595-017-0098-0


(a) Rules set

Rules	Confidence	Support	Pearl
ar \(_{1}\)	0.66	0.20	0.02
ar \(_{2}\)	0.66	0.20	0.05
ar \(_{3}\)	0.66	0.20	0.02
ar \(_{4}\)	0.4	0.20	0.05
ar \(_{5}\)	0.4	0.20	0.10
ar \(_{6}\)	0.33	0.20	0.02
ar \(_{7}\)	0.33	0.20	0.01
ar \(_{8}\)	0.33	0.20	0.10
ar \(_{9}\)	0.33	0.10	0.03
ar \(_{10}\)	0.66	0.20	0.05
ar \(_{11}\)	0.16	0.10	0.02
ar \(_{12}\)	0.50	0.10	0.02
ar \(_{13}\)	0.50	0.10	0.00
ar \(_{14}\)	0.50	0.10	0.04

(b) Measures

Measures	Formula
Confidence (\(B\rightarrow H)\)	\(P(H/B)=\frac{P\left( {BH} \right) }{P\left( B \right) }\)
Support (\(B\rightarrow H)\)	\(P\left( {BH} \right) \)
Pearl (\(B\rightarrow H)\)	\(P\left( B \right) \times \left\| {P(H/B)-P\left( H \right) } \right\| \)
Recall (\(B\rightarrow H)\)	\(P(B/H)=\frac{P\left( {BH} \right) }{P\left( H \right) }\)
Zhang (\(B\rightarrow H)\)	\(\frac{P\left( {BH} \right) -P\left( B \right) P\left( H \right) }{\max \left\{ {P\left( {BH} \right) P\left( {\overline{H} } \right) ,\left. {P\left( H \right) P\left( {B\overline{H} } \right) } \right\} } \right. }\)
Loevinger (\(B\rightarrow H)\)	\(\frac{P(H/B)-P\left( H \right) }{1-P\left( B \right) }\)
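As an illustration of the formulas in the measures table, the sketch below computes support, confidence, recall, and Pearl for a rule \(B\rightarrow H\) from a list of transactions. It assumes that the double bars in the Pearl formula denote an absolute value; the function names and the toy transactions are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, List


def prob(itemset: FrozenSet[str], transactions: List[FrozenSet[str]]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def measures(body: Iterable[str], head: Iterable[str],
             transactions: List[FrozenSet[str]]) -> Dict[str, float]:
    """Support, confidence, recall and Pearl of the rule body -> head."""
    b, h = frozenset(body), frozenset(head)
    p_b = prob(b, transactions)
    p_h = prob(h, transactions)
    p_bh = prob(b | h, transactions)
    confidence = p_bh / p_b            # P(H/B); assumes the body occurs at least once
    return {
        "support": p_bh,               # P(BH)
        "confidence": confidence,
        "recall": p_bh / p_h,          # P(B/H); assumes the head occurs at least once
        "pearl": p_b * abs(confidence - p_h),
    }


# Hypothetical transactions, not taken from the mobile-phone database.
db = [frozenset(t) for t in (["x", "y"], ["x", "y"], ["x"], ["y"], ["z"])]
m = measures(["x"], ["y"], db)
```

Here P(B) = P(H) = 0.6 and P(BH) = 0.4, so the confidence is 0.4/0.6 and the Pearl value is 0.6 × |0.4/0.6 − 0.6|.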

Preferences	Bituples
P\(_{1}\)	]0 0.2[
P\(_{2}\)	[0.2 0.4[
P\(_{3}\)	[0.4 0.6[
P\(_{4}\)	[0.6 0.8[
P\(_{5}\)	[0.8 1[
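The preference intervals above partition ]0, 1[ into five half-open ranges, so mapping a measure value to its preference label is a sorted-boundary lookup. The sketch below only illustrates that lookup; how MDP\(_{\mathrm {REF}}\) consumes the labels is not shown here, and `BOUNDS` and `preference_of` are hypothetical names.

```python
from bisect import bisect_right
from typing import Optional

# Lower bounds of P1..P5 as listed in the table; P1 is open at 0 and P5 is open at 1.
BOUNDS = [0.0, 0.2, 0.4, 0.6, 0.8]


def preference_of(value: float) -> Optional[str]:
    """Return the label of the interval containing `value`, or None outside ]0, 1[."""
    if not 0.0 < value < 1.0:
        return None
    # bisect_right counts the lower bounds <= value, which is the interval index.
    return f"P{bisect_right(BOUNDS, value)}"


label = preference_of(0.4)  # 0.4 is the closed lower bound of P3
```

Using `bisect_right` makes each lower bound belong to the interval it opens, matching the half-open form [a b[ used in the table.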

ID	Brand	Design	Connectivity	Screen	Battery autonomy (h)	Camera (Mp)	Price (Euro)
I\(_{1}\)	Nokia	Monobloc	w-u-b\(^{3}\)	Tactile	6–8	2–5	>300
I\(_{2}\)	Samsung	Monobloc	u-b	Tactile	3–5	2–5	100–200
I\(_{3}\)	Samsung	Monobloc	w-u-b	Tactile	9–11	2–5	200–300
I\(_{4}\)	Sony Ericson	Monobloc	w-u-b	Tactile	9–11	10–14	>300
I\(_{5}\)	Sony Ericson	Monobloc	u-b	Tactile	3–5	6–9	>300
I\(_{6}\)	Samsung	Coulissant	u-b	Non tactile	3–5	2–5	<100
I\(_{7}\)	Samsung	Coulissant	b	Non tactile	3–5	2–5	100–200
I\(_{8}\)	LG	Monobloc	u-b	Non tactile	3–5	2–5	<100
I\(_{9}\)	LG	Coulissant	u-b	Non tactile	3–5	2–5	200–300
I\(_{10}\)	Nokia	Coulissant	u-b	Non tactile	3–5	2–5	100–200
I\(_{11}\)	Sony Ericson	Monobloc	w-u-b	Non tactile	9–11	2–5	100–200
