Abstract

In mobile computing, machine learning models for natural language processing (NLP) have become one of the most attractive focus areas in research. Association rules among attributes are common knowledge patterns, which can often provide potential and useful information such as mobile users' interests. Actually, almost each attribute is associated with a hierarchy of the domain. Given an relation and any cut on the hierarchy for every attribute a, there is another rough relation , where . This paper will establish the connection between the functional dependencies in R and , propose the method for extracting reducts in , and demonstrate the implementation of proposed method on an application in data mining of association rules. The method for acquiring association rules consists of the following three steps: (1) translating natural texts into relations, by NLP; (2) translating relations into rough ones, by attributes analysis or fuzzy k-means (FKM) clustering; and (3) extracting association rules from concept lattices, by formal concept analysis (FCA). Our experimental results show that the proposed methods, which can be applied directly to regular mobile data such as healthcare data, improved quality, and relevance of rules.

1. Introduction

With the rapid growth in use of mobile devices, more and more mobile generated data is in great need of processing. A large amount of valuable content exists in natural text such as web pages, news feeds, and Twitter/WeChat messages. Natural language processing (NLP) techniques have proven to be useful in dealing with the information overload problem in the mobile environment, for example, news summarization, question answering, and information extraction and retrieval. In these areas, machine learning models for NLP are one of the important research contents [1], in which association rules are common knowledge patterns. Association rules can often provide potential and useful information for mobile clients, contributing to automatically providing personalized recommended services.

In the process of knowledge acquisition from natural texts, we frequently encounter multivalued attributes such as spatial locations and security policies in mobile environments. Actually, for each multivalued attribute, there are different levels of partitions or fuzzy partitions in its domain, which are forbidden in the domain. If we take the different partitions as new values and then obtain attribute values of different granularity. It will be meaningful to discover the attribute dependence of different granularity.

Formal concept analysis (FCA) is an effective tool for knowledge representation, acquisition, and knowledge discovery. FCA focuses on the concept lattice induced by a binary relation between a pair of sets (called objects and attributes, respectively). A node of concept lattices is an objects/attributes pair, called a (formal) concept, consisting of two parts: the extent (objects the concept covers) and intent (attributes describing the concept). The line diagram corresponding to a concept lattice vividly unfolds generalization/specialization relationship among concepts [2]. Recently, concept lattices have already been successfully applied to a wide range of scientific disciplines including knowledge representation [35], knowledge discovery [68], knowledge reduction [911], hybrid relation analysis [12], wireless sensor network [13], and information retrieval [14].

In order to process natural texts, we firstly extract some formal objects and attributes using NLP, secondly translate the texts into relations, and thirdly process the relations using FCA. However, with the growth of the size of the relation, the number of concepts and association rules grows in an exponential manner. Hence, it is necessary to reduce contexts before applying FCA. Several researchers have used matrix approximation techniques such as singular value decomposition (SVD) [15] and nonnegative matrix factorization (NMF) [11] for reducing the size of the context. However, matrix DR methods are based on the expensive eigenvalue computations and hence are known for their high computational complexity. Cluster analysis can well be used as method for data reduction under the notion of concept decomposition (CD). With its lesser computational complexity, FKM [15] is proved to be an alternative method for reducing the dimensionality of context, thereby controlling the size of concept lattices. Actually, there are different levels of partitions or fuzzy partitions [16, 17] in each multivalued attribute domain. Therefore, each attribute is associated with a hierarchy of the domain of the attribute. Based on attribute analysis and FKM, this paper proposes a method of acquiring association rules, which can validly reduce the number of association rules. The method firstly translates attribute values into different abstract hierarchies, which are new attribute values, and then generates association rules with FCA. Experimental results on heart disease data show that the proposed method can improve quality and relevance of association rules, which can be applied directly to regular mobile data such as healthcare data and spatial locations, contributing to providing precise personalized recommended services.

The paper is organized as follows: the next section gives the basic notations of FCA and FKM; the third section gives the hierarchies of domains, discusses the possible connection between the functional dependencies of R and , and gives a sufficient and necessary condition for a functional dependence holding in both and if is below ; the fourth section provides some experiments, in order to verify the feasibility of the method. Finally, Section 5 concludes the paper.

2. Some Basic Notations

2.1. Translation from Natural Texts into Relations

Before using FCA to extract association rules, we need to translate natural texts into a knowledge frame and then merge the related frames into a relation. Our method can be described as follows: firstly, we create some attribute thesaurus. For example, we create an attribute thesaurus that describes people such as age, sex, height, weight, date of birth, hobbies, occupation, etc. Secondly, using a semiautomatic approach, we translate text knowledge into a frame and then merge some related frames into a relation.

2.2. Formal Concept Analysis

This section will introduce some basic notations in FCA [2]. A (formal) context is defined as a triple , where G and M are sets and is a binary relation. For any and , the pair is called a (formal) concept if (1) Y is the set of attributes common to the objects in X and (2) X is the set of objects having all attributes in Y. X and Y are called the (concept) extent and the (concept) intent of the concept, respectively. There are two kinds of special concepts: object concepts and property concepts. Given an object , the object concept of is the smallest concept having in its extent. Correspondingly, given an attribute , the attribute concept of m is the greatest concept having m in its intent.

The line diagram corresponding to a concept lattice can vividly unfold generalization-specialization relationship among concepts. The labeling can be simplified considerably by putting down each object and each attribute only once. Thus, the concept lattices can be described by the line diagrams with reduced labeling. In a line diagram, the name of an object is always attached to the circle representing the smallest concept with in its extent; dually the name of an attribute m is always attached to the circle representing the largest concept with m in its intent. This allows us to read the map I from the diagram: an object has an attribute m if and only if there is an ascending path from the circle labeled by to the circle labeled by m. The extent of a concept consists of all tuples whose labels are below in the diagram and the intent consists of all properties attached to concepts above in the hierarchy. Thus, we can easily extract association rules with 100% confidence from the line diagrams, and the stem base of the attribute implications is nonredundant and complete [2].

As many practical applications involve nonbinary data, multivalued contexts have been introduced in FCA. A multivalued context consists of sets G, M, W and a ternary relation I between G, M, and W for which it holds that and always imply . The elements of G, M, and W are called objects, attributes, and attribute values, respectively. A tuple is interpreted as object that has value for attribute m. Actually, a multivalued context can be regarded as a relation with the column containing the objects being a primary key. In the RDM, a relation is described by a relation schema , where () represent attributes. Each attribute m is associated with a domain , which is the set of possible values for the attribute m. A relation (R, A) is denoted by a set of tuples , where and g is a tuple such that for every , . An equivalent way to view such a tuple is as a map from A to such that [11]. Thus we can further represent a relation R by a triple (U, A, I), where I is a map from to such that, for any , . A relation (U, A, I) can be thought of as representing a table with rows corresponding to U, columns corresponding to A, and table entries at the intersection of rows and columns containing values in domains.

2.3. Fuzzy K-Means Clustering

Fuzzy k-means (FKM) [18] partitions a set of t-dimensional vectors into k-clusters, where represents the jth sample. For and the ith cluster center , there is a membership degree indicating to what degree sample belongs to , . Thus, there is a fuzzy partition matrix . The FKM algorithm is based on minimizing the objective function defined aswhere is the Euclidean distance between to the cluster center . The exponent m in (1) is called fuzzifier parameter and it defines the fuzziness of the clustering. The formulae of and arewhere and .

Based on the above discussion, the FKM algorithm can be summarized as follows.

Step 1. Choose the number of clusters k, degree of fuzziness m, and a threshold value e. Initialize the fuzzy partition matrix U.

Step 2. Compute the cluster centers (), according to (2).

Step 3. Compute the Euclidean distance from the sample to the cluster center according to the Euclidean distance. Then calculate all uji using (2) and update fuzzy partition matrix U.

Step 4. Compute the objective function using (1). Verify whether the function converges or the difference between the two adjacent values of objective function is less than the given threshold value e, then stop. Otherwise repeat from the Step 2.

3. Attribute Analysis and the Reducts in Rough Relations

3.1. Attribute Analysis

For attributes, there are often different criteria for division, for example, Chinese blood pressure categories and Chinese age categories (≥18 years old), which can be found in “Guidelines for the Prevention and Control of Hypertension in China" (2005 Revision).

Chinese blood pressure categories are described as shown in Table 1.

Chinese age categories are described as shown in Table 2.

Generally, attributes can be divided into the three types. Type 1: there is a category; type 2: there is no category, but there are reference criteria for the classification of attributes; type 3: attribute values are never category or classification criteria for reference. In Algorithm 1, the subscript of R in step 3 is not obvious. Therefore, we followed the methods of Lei et al. 2016 [19] and propose a method for extracting association rules, which has the following steps: (1) analyzing attribute types and the structures of domains, (2) generating different hierarchies of attribute values with FKM clustering, (3) translating original relations into rough ones, (4) generating the concept lattice, and (5) extracting the association rules from concept lattices. The method can be described as in Algorithm 1.

Algorithm for extracting associate rules from a relation
Input: a relation
Output: association rules satisfying given the minimum support and the minimum
confidence
Process
Step 1: For any attribute , generating a hierarchy of its domain by traditional
standards or FKM clustering method.
Step 2: Given a cut in , then there is a set
of cuts,
translating the relation into a rough relation , where in if in
.
Step 3: Translating into a binary relation ,
Step 4: Generating the concept lattice from ,
Step 5: Extracting association rules satisfying given the minimum support and the
minimum confidence.
3.2. The Reducts in Rough Relations

In a rough relation in the rough relation databases, each attribute is associated with a equivalence relation on domain . We denote the corresponding partition of on by : for some natural number k.

Definition 1 (a rough relation). A rough relation R is a subset of such that, for every , every , and every , , where is the power set of .

Hence, in a rough relation R, a tuple x takes multivalues at attributes, satisfying certain conditions, where the conditions are given in terms of equivalence relations on domains of attributes. A rough relation R is reduced to be a normal relation if every attribute has as the domain, instead of .

Given an attribute a, there is a hierarchy , where is a set of subsets of and is a binary relation on , such that (1) ; (2) for any , ; (3) () is a tree.

Definition 2. A cut is a subset of such that for any path from the root to a leaf, ; given two cuts and , we say that is above , denoted by , for any path from the root to a leaf, , where and are the unique ones in and , respectively.

Given a cut on , there is an equivalence relation on such that, for any u, , iff there is a unique such that u, . We use to denote the equivalence class of containing u.

Let be a cut vector (), where is a cut on . Given two cut vectors and , we say that is above , denoted by , if for every , . Given a relation R and a cut vector , there is a relation such that for any tuple and attribute , if in R then in . Given a relation R and a cut vector , define a relation such that, for any x, , iff and in and , where and in R.

Proposition 3. Given a relation R and a cut vector , is an equivalence relation on U.

Proposition 4. Given a relation R and two cut vectors and , if then is a refinement of .

Proof. For any x, , assume that . Then, for any , . Because is a refinement of , . That is, for any , , i.e., .

Definition 5. Given a relation R and subsets , if, for any , , if for every then for every , we say that C depends on B in R, denoted by .

Assume that . By the sense of functional dependencies, we define a function such that, for any , if there is a such that for every then , where u is defined as follows: for any , .

By the assumption that , f is a function. We say that f witnesses that . There are two special cut vectors and defined as follows: for any , , . It is clear that for any cut vector , .

Proposition 6. (i) . Therefore, for any , is a reduct of ; (ii) . Therefore, is a reduct of iff B is a reduct of R.

Given a relation R and two cut vectors and , if then, for any subsets , and are not related; i.e., it is possible that and ; or and . By Propositions 3 and 4, we have that there are R, , and such that , , and . Let , such that and . Because, for any B, , . We give the following example to show that there are R, , and such that , , and .

Example 7. Let , , and . Let , which is described in Table hierarchies(a), Let , and . Then, and are represented in Tables 1(b) and 3(c).
We have that and .

Definition 8. Given a relation R and two cut vectors and , assume that . Let such that , and f: witnesses that , where .
We say that is compatible with in R if for any , there is a such that . Let such that , and witness that . We say that is compatible with in R if, for any , there is a vector (: ) such that

Proposition 9. Given a relation R and two cut vectors and , if then for any subsets , , implies iff is compatible with in R.

Proof. () Assume that is compatible with in R, and . Then, for any x, , if, for every , , then, for every , . Assume that for every , , There are two cases.
Case 1. If for every , , then by the assumption that , for every , , so .
Case 2. If there is a such that then by , for every , .
() Assume that is not compatible with in R. Then, there is an such that for any (: ), does not hold. Then, as in Example 7, we can construct R, , and such that , implies .

4. Experimental Analysis

4.1. Data Availability

We use the experimental data for heart disease dataset in UCI database, which can be downloaded from the website http://archive.ics.uci.edu/ml/datasets/Heart+Disease. Concept lattice tools ConExp1.3 and lattice miner can be downloaded from the websites: https://sourceforge.net/projects/conexp/ and https://sourceforge.net/projects/lattice-miner/.

4.2. Data Preparation

The heart disease dataset in UCI database contains 303 objects and 14 available attributes. The main purpose of this paper is to analyze the connections between association rules in an original relation and ones in rough relations. To both verify the validity of the method and reduce the computational complexity, we select 24 objects and 5 attributes, as shown in Table 4. The 5 attributes are age: age in years; trestbps: resting blood pressure (in mm Hg on admission to the hospital); restecg: resting electrocardiographic results: Value 0: normal; Value 1: having ST-T wave abnormality (T wave inversions and/or ST elevation or depression of > 0.05 mV); Value 2: showing probable or definite left ventricular hypertrophy by Estes' criteria; thalach: maximum heart rate achieved; cp: chest pain type: value 1: typical angina; value 2: atypical angina; value 3: nonanginal pain; value 4: asymptomatic.

Table 4 can be converted to a binary relation , where D is the set of 24 objects, T= age, trestbps, restecg, thalach, chest pain type. The elements of T are abbreviated to a, d, , h and n, respectively. “a” and “d” belong to type 2 in Section 2.3. “” and “n” belong to type 1 in Section 2.3. “h” belongs to type 3 in Section 2.3, which is necessary to use the FKM clustering.

4.3. Experimental Procedure and Rule Acquisition

We made two groups experiments. The first experiment firstly analyzed hierarchies of attribute domains and then generalized to different levels of attribute values, in order to control the size of the concept lattice. The second experiment used fuzzy attribute values to control the size of the concept lattice.

The First Experiment. Using the concept lattice tool ConExp1.3, which can be downloaded from the website https://sourceforge.net/projects/conexp/, we extract association rules from context. The general form is “<N>P=[C]=><N′>C′”, where N is the number of objects satisfying the premise, P is a precondition, C is the confidence of association rules, N′ is the number of objects meeting the premise, and, in conclusion, and C′ is the conclusion. For example, the second association rule in Table 5, i.e., <4> xueya3 =75%=><3> age6, meaning that there are 4 objects satisfying the premise for the second level of blood pressure and three objects among them also meet the elderly early old age, and its confidence is 75%.

(1) The age values and blood pressure values are divided into 8 and 4 categories, respectively. Thus, we obtain a relation, called A8BP4. The concept lattice of A8BP4 is shown in Figure 1, from which we obtain some association rules, as shown in Table 5.

(2) The age values and blood pressure values are divided into 8 and 2 categories, respectively. Thus, we obtain a relation, called A8BP2. The concept lattice of A8BP2 is shown in Figure 2. From Figure 2, we obtain some association rules, as shown in Table 6.

(3) The age values and blood pressure values are divided into 3 and 4 categories, respectively. Thus, we obtain a relation, called A3BP4. The concept lattice of A3BP4 is shown in Figure 3. From Figure 3, we obtain some association rules, as shown in Table 7.

(4) The age values and blood pressure values are divided into 3 and 2 categories, respectively. Thus, we obtain a relation, called A3BP2. The concept lattice of A3BP2 is shown in Figure 4. From Figure 4, we obtain some association rules, as shown in Table 8.

For analyzing the concept lattices, we followed the methods of Lei et al. 2016 [16]. Table 9 describes the number of concepts, edges, height, width, and rules in Figures 14.

From the description above, we have the following results: (1) the higher the value levels are, the smaller the complexity of constructing concept lattice is, and the less the association rules are generated; (2) association rules often vary according to the level of abstraction of attribute values. The finer the granularity of value abstraction is, the more the general rules are, and the more the detailed rules are; and (3) the method can reduce relation, control the size of the concept lattice, and satisfy the purpose of different users. In addition, there is a certain relationship among the association rules, as shown in Table 10.

In Table 10, there are some rules which can be obtained at some value level, but not at the other level. There are some rules of implication when the fuzzy processing to different levels. For example, rule 3 contains rule 1; rule 4 contains rule 2.

The Second Experiment. This experiment is used to reduce the size of the context and control the number of concepts and improve the quality of the concept. By using the FKM algorithm, we can obtain some fuzzy values of attributes and hence obtain some rough relations from an original relation. Generally, the most common is to classify the age values into three categories and the blood pressure values into two categories. Therefore, Table 4 is converted to a binary relation, as shown in Table 11. The attributes are as follows: a1: young people, a2: middle-aged, a3: elderly, d1: hypertensive, d2: normotensive, g0: normal, g1: having ST-T wave abnormality, g2: probable or definite left ventricular hypertrophy by Estes' criteria, h1: maximum heart rate of low, h2: maximum heart rate of medium, and h3: maximum heart rate of high value, and n0-n4 are heart disease severity, 0 indicates normal, and 1 to 4 indicates serious degree, respectively.

By using lattice miner 1.4, which can be downloaded from the following website: https://sourceforge.net/projects/lattice-miner/, we obtain the concept lattice of Table 11, as shown in Figure 5. It contains 113 concepts and 283 edges, and its height is 6. We define the minimum support and the minimum confidence as 50% and 75%, respectively. Thus, we extract more than 900 rules, as shown in Table 12 (partly).

We select rules with larger values of support and confidence, as shown in Table 13.

To visualize these rules, we use column chart to show the support and confidence of the rules. The support and confidence of the rules in Table 13 are illustrated in Figure 6.

We can get a lot of useful information from Table 13. For example, rule 6 illustrates that the elderly have the larger probability of high blood pressure, and rule 3 illustrates that the elderly maximum heart rate is small. If we put a few rules together, we will obtain more valuable knowledge. For example, we can get a3 □ d1 from rules 6 and 12. From rules (2, 4, and 5) and rules (9, 10, and 11), we have the following result: in the current data, the young people and elderly have little difference in heart disease.

5. Conclusions

In order to process natural texts for storing mobile generated data, we firstly extract some formal objects and attributes using NLP, secondly translate the texts into relations, and thirdly process the relations using FCA. In this paper, we mainly discuss the third step. In order to reduce the number of association rules, we propose a method based concept lattice and attribute analysis. This paper establishes the connection between the functional dependencies in an original relation R and corresponding rough relations, proposes the method for extracting reducts in R, and demonstrates the implementation of proposed method on an application in data mining of associative rules. By using rough-values of attributes, we can control the number of concepts and hence improve the quality of the concept. Our experiments show that the method is feasible and effective, which can be applied directly to regular mobile data such as spatial locations and healthcare data. In the mobile computing, associative rules can provide potential and useful information for mobile clients.

In the mobile environment, there are the following interesting problems: (1) how to automatically translate natural texts into relations; (2) how to analyze those relations with columns having null values or more complex information; and (3) how to precisely capture mobile users' interests, in order to automatically provide corresponding recommended services.

Data Availability

We use the experimental data for heart disease dataset in UCI database, which can be downloaded from the website http://archive.ics.uci.edu/ml/datasets/Heart+Disease. Concept lattice tools can be downloaded from the following websites: https://sourceforge.net/projects/lattice-miner/ or https://sourceforge.net/projects/conexp/. If readers are interested in our results, they can contact us by email.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work is partly supported by the Natural Science Foundation of China under Grants nos. 61572284 and 61502272 and the Promotive Research Fund for Excellent Young and Middle-Aged Scientists of Shandong Province under Grants nos. BS2014DX004 and BS2014DX005.