Skip to main content
main-content
Top

Hint

Swipe to navigate through the articles of this issue

Published in: International Journal of Data Science and Analytics 4/2022

20-07-2022 | Regular Paper

PARAS\(^{\mathrm{c}}\): a parameter space-driven approach for complete association rule mining

Authors: Xika Lin, Abhishek Mukherji, Elke A. Rundensteiner, Matthew O. Ward

Published in: International Journal of Data Science and Analytics | Issue 4/2022

Login to get access
share
SHARE

Abstract

To enable efficient association rule mining, existing techniques prestore intermediate results as itemsets. However, the actual rule generation is still performed at query-time. The response time thus tends to remain unacceptably long for interactive mining, especially when rule redundancy resolution is required. Further, the widespread restriction to only support positive rules can miss important insights and lead to misleading results. For this reason, the discovery of both negative and positive rules, which can be extremely revealing, is important. Unfortunately, the generation of negative rules slows down the mining process even further. To tackle these shortcomings, we introduce the parameter space model, called \({\textbf {PARAS}}^{{\textbf {c}}}\). \({\textbf {PARAS}}^{{\textbf {c}}}\) enables efficient mining of complete rules, i.e., both positive and negative rules, by precomputing and compactly maintaining the final rulesets. The \({\textbf {PARAS}}^{{\textbf {c}}}\) model is based on the stable region abstractions that form the coarse granularity ruleset space for managing complete rules. Based on new insights into the redundancy relationships among complete rules, \({\textbf {PARAS}}^{{\textbf {c}}}\) establishes a surprisingly compact representation of complex redundancy relationships while enabling efficient redundancy resolution for complete rules at query-time. \({\textbf {PARAS}}^{{\textbf {c}}}\) supports novel classes of exploratory queries that can be answered near real time. Our experimental evaluation demonstrates that \({\textbf {PARAS}}^{{\textbf {c}}}\) achieves 2–5 orders of magnitude improvement over existing techniques in rule mining.
Appendix
Available only for authorised users
Footnotes
1
Hindi name for a legendary philosopher’s stone said to be capable of turning base metals (lead, for example) into gold.
 
2
Multiple rules may map to the simple/strict dominating location that collectively represents them.
 
3
The detail of rules are listed in Fig. 9(a) and thus omitted in Fig. 8 for simplicity. The strict redundancy relationships across rules in these sets are also omitted for simplicity.
 
4
Rule \({\mathcal {R}}_{31}\) and its simple dominating rules \(\{{\mathcal {R}}_{31}^{\gg _{sim}}\}\) are also \({\mathcal {R}}_{16}\)’s strict dominating rules. That is, \(\{{\mathcal {R}}_{16}^{\gg _{str}}\} = \{{\mathcal {R}}_{31}\} \cup \{{\mathcal {R}}_{31}^{\gg _{sim}}\}\).
 
5
The number of strict dominating location(s) is up to 2 by our experiment.
 
6
Some dominating rules may not be valid because their lift values do not satisfy minimum lift.
 
7
An i-antecedent rule only has i-antecedent items out of n.
 
8
Some simple dominating rules may not be valid because their lift values do not satisfy minimum lift.
 
Literature
1.
go back to reference Agrawal, R., Srikant, R.: Fast algorithms for mining association rules in large databases. In: Proceedings of the 20th International Conference on Very Large Databases, pp. 487–499 (1994) Agrawal, R., Srikant, R.: Fast algorithms for mining association rules in large databases. In: Proceedings of the 20th International Conference on Very Large Databases, pp. 487–499 (1994)
2.
go back to reference Han, J., Fu, Y.: Discovery of multiple-level association rules from large databases. In: Proceedings of the 21th International Conference on Very Large Databases, pp. 420–431 (1995) Han, J., Fu, Y.: Discovery of multiple-level association rules from large databases. In: Proceedings of the 21th International Conference on Very Large Databases, pp. 420–431 (1995)
3.
go back to reference Lucchese, C., Orlando, S., Perego, R., Silvestri, F.: Webdocs: a real-life huge transactional dataset. In: Proceedings of the IEEE ICDM Workshop on Frequent Itemset Mining Implementations (2004) Lucchese, C., Orlando, S., Perego, R., Silvestri, F.: Webdocs: a real-life huge transactional dataset. In: Proceedings of the IEEE ICDM Workshop on Frequent Itemset Mining Implementations (2004)
4.
go back to reference Li, J., Le, T.D., Liu, L., Liu, J., Jin, Z., Sun, B., Ma, S.: From observational studies to causal rule mining. ACM Trans. Intell. Syst. Technol. 7(2), 14–11427 (2016) CrossRef Li, J., Le, T.D., Liu, L., Liu, J., Jin, Z., Sun, B., Ma, S.: From observational studies to causal rule mining. ACM Trans. Intell. Syst. Technol. 7(2), 14–11427 (2016) CrossRef
5.
go back to reference Simon, G.J., Caraballo, P.J., Therneau, T.M., Cha, S.S., Castro, M.R., Li, P.W.: Extending association rule summarization techniques to assess risk of diabetes mellitus. IEEE Trans. Knowl. Data Eng. 27(1), 130–141 (2015) CrossRef Simon, G.J., Caraballo, P.J., Therneau, T.M., Cha, S.S., Castro, M.R., Li, P.W.: Extending association rule summarization techniques to assess risk of diabetes mellitus. IEEE Trans. Knowl. Data Eng. 27(1), 130–141 (2015) CrossRef
6.
go back to reference Peng, M., Sundararajan, V., Williamson, T., Minty, E.P., Smith, T.C., Doktorchik, C.T.A., Quan, H.: Exploration of association rule mining for coding consistency and completeness assessment in inpatient administrative health data. J. Biomed. Inform. 79, 41–47 (2018) CrossRef Peng, M., Sundararajan, V., Williamson, T., Minty, E.P., Smith, T.C., Doktorchik, C.T.A., Quan, H.: Exploration of association rule mining for coding consistency and completeness assessment in inpatient administrative health data. J. Biomed. Inform. 79, 41–47 (2018) CrossRef
7.
go back to reference Abar, O., Charnigo, R.J., Rayapati, A., Kavuluru, R.: On interestingness measures for mining statistically significant and novel clinical associations from emrs. In: Proceedings of the 7th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics. BCB ’16, pp. 587–594 (2016) Abar, O., Charnigo, R.J., Rayapati, A., Kavuluru, R.: On interestingness measures for mining statistically significant and novel clinical associations from emrs. In: Proceedings of the 7th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics. BCB ’16, pp. 587–594 (2016)
8.
go back to reference Wong, P.-Y., Chan, T.-M., Wong, M.-H., Leung, K.-S.: Predicting approximate protein-dna binding cores using association rule mining. In: IEEE 28th International Conference on Data Engineering, pp. 965–976 (2012) Wong, P.-Y., Chan, T.-M., Wong, M.-H., Leung, K.-S.: Predicting approximate protein-dna binding cores using association rule mining. In: IEEE 28th International Conference on Data Engineering, pp. 965–976 (2012)
9.
go back to reference Aggarwal, C.C., Yu, P.S.: A new approach to online generation of association rules. IEEE Trans. Knowl. Data Eng. 13(4), 527–540 (2001) CrossRef Aggarwal, C.C., Yu, P.S.: A new approach to online generation of association rules. IEEE Trans. Knowl. Data Eng. 13(4), 527–540 (2001) CrossRef
10.
go back to reference Han, J., Pei, J., Yin, Y.: Mining frequent patterns without candidate generation. In: Proceedings of the ACM SIGMOD International Conference on Management of Data, pp.1–12 (2000) Han, J., Pei, J., Yin, Y.: Mining frequent patterns without candidate generation. In: Proceedings of the ACM SIGMOD International Conference on Management of Data, pp.1–12 (2000)
11.
go back to reference Zaki, M.J., Parthasarathy, S., Ogihara, M., Li, W.: New algorithms for fast discovery of association rules. In: Proceedings of the 3rd International Conference on Knowledge Discovery and Data Mining, pp. 283–286 (1997) Zaki, M.J., Parthasarathy, S., Ogihara, M., Li, W.: New algorithms for fast discovery of association rules. In: Proceedings of the 3rd International Conference on Knowledge Discovery and Data Mining, pp. 283–286 (1997)
12.
go back to reference Brin, S., Motwani, R., Silverstein, C.: Beyond market baskets: Generalizing association rules to correlations. In: Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 265–276 (1997) Brin, S., Motwani, R., Silverstein, C.: Beyond market baskets: Generalizing association rules to correlations. In: Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 265–276 (1997)
13.
go back to reference Aggarwal, C.C., Yu, P.S.: A new framework for itemset generation. In: Proceedings of the Seventeenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pp. 18–24 (1998) Aggarwal, C.C., Yu, P.S.: A new framework for itemset generation. In: Proceedings of the Seventeenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pp. 18–24 (1998)
14.
go back to reference Wu, X., Zhang, C., Zhang, S.: Efficient mining of both positive and negative association rules. ACM Trans. Inf. Syst. 22(3), 381–405 (2004) CrossRef Wu, X., Zhang, C., Zhang, S.: Efficient mining of both positive and negative association rules. ACM Trans. Inf. Syst. 22(3), 381–405 (2004) CrossRef
15.
go back to reference Dong, X., Hao, F., Zhao, L., Xu, T.: An efficient method for pruning redundant negative and positive association rules. Neurocomputing 393, 245–258 (2020) CrossRef Dong, X., Hao, F., Zhao, L., Xu, T.: An efficient method for pruning redundant negative and positive association rules. Neurocomputing 393, 245–258 (2020) CrossRef
16.
go back to reference Hämäläinen, W.: Kingfisher: an efficient algorithm for searching for both positive and negative dependency rules with statistical significance measures. Knowl. Inf. Syst. 32(2), 383–414 (2012) Hämäläinen, W.: Kingfisher: an efficient algorithm for searching for both positive and negative dependency rules with statistical significance measures. Knowl. Inf. Syst. 32(2), 383–414 (2012)
17.
go back to reference Hämäläinen, W., Webb, G.I.: Specious rules: an efficient and effective unifying method for removing misleading and uninformative patterns in association rule mining, In: Proceedings of the SIAM International Conference on Data Mining, pp. 309–317 (2017) Hämäläinen, W., Webb, G.I.: Specious rules: an efficient and effective unifying method for removing misleading and uninformative patterns in association rule mining, In: Proceedings of the SIAM International Conference on Data Mining, pp. 309–317 (2017)
18.
go back to reference Kubat, M., Hafez, A., Raghavan, V.V., Lekkala, J.R., Chen, W.K.: Itemset trees for targeted association querying. IEEE Trans. Knowl. Data Eng. 15(6), 1522–1534 (2003) CrossRef Kubat, M., Hafez, A., Raghavan, V.V., Lekkala, J.R., Chen, W.K.: Itemset trees for targeted association querying. IEEE Trans. Knowl. Data Eng. 15(6), 1522–1534 (2003) CrossRef
19.
go back to reference Kaya, M., Alhajj, R.: Online mining of fuzzy multidimensional weighted association rules. Appl. Intell. 29, 13–34 (2008) CrossRef Kaya, M., Alhajj, R.: Online mining of fuzzy multidimensional weighted association rules. Appl. Intell. 29, 13–34 (2008) CrossRef
20.
go back to reference Dong, X., Niu, Z., Shi, X., Zhang, X., Zhu, D.: Mining both positive and negative association rules from frequent and infrequent itemsets. Adv. Data Min. Appl. 4632, 122–133 (2007) Dong, X., Niu, Z., Shi, X., Zhang, X., Zhu, D.: Mining both positive and negative association rules from frequent and infrequent itemsets. Adv. Data Min. Appl. 4632, 122–133 (2007)
21.
go back to reference Graham, R.L., Knuth, D.E., Patashnik, O.: Concrete Mathematics. Addison-Wesley, Boston (1994) MATH Graham, R.L., Knuth, D.E., Patashnik, O.: Concrete Mathematics. Addison-Wesley, Boston (1994) MATH
22.
go back to reference Lin, X., Mukherji, A., Rundensteiner, E.A., Ruiz, C., Ward, M.O.: Paras: Parameter space framework for online association mining. In: Proceedings of the VLDB Endowment, pp. 193–204 (2013) Lin, X., Mukherji, A., Rundensteiner, E.A., Ruiz, C., Ward, M.O.: Paras: Parameter space framework for online association mining. In: Proceedings of the VLDB Endowment, pp. 193–204 (2013)
23.
go back to reference Lin, X., Mukherji, A., Rundensteiner, E.A., Ward, M.O.: SPIRE: supporting parameter-driven interactive rule mining and exploration. PVLDB 7(13), 1653–1656 (2014) Lin, X., Mukherji, A., Rundensteiner, E.A., Ward, M.O.: SPIRE: supporting parameter-driven interactive rule mining and exploration. PVLDB 7(13), 1653–1656 (2014)
24.
go back to reference Mukherji, A., Lin, X., Whitehouse, J., Botaish, C.R., Rundensteiner, E.A., Ward, M.O.: Fire: Interactive visual support for parameter space-driven rule mining. In: Proceedings of the 22nd ACM International Conference on Information & Knowledge Management, pp. 2447–2452 (2013) Mukherji, A., Lin, X., Whitehouse, J., Botaish, C.R., Rundensteiner, E.A., Ward, M.O.: Fire: Interactive visual support for parameter space-driven rule mining. In: Proceedings of the 22nd ACM International Conference on Information & Knowledge Management, pp. 2447–2452 (2013)
25.
go back to reference Mukherji, A., Lin, X., Toto, E., Botaish, C.R., Whitehouse, J., Rundensteiner, E.A., Ward, M.O.: Fire: a two-level interactive visualization for deep exploration of association rules. Int. J. Data Sci. Anal. 7, 201–226 (2019) Mukherji, A., Lin, X., Toto, E., Botaish, C.R., Whitehouse, J., Rundensteiner, E.A., Ward, M.O.: Fire: a two-level interactive visualization for deep exploration of association rules. Int. J. Data Sci. Anal. 7, 201–226 (2019)
26.
go back to reference Cornelis, C., Yan, P., Zhang, X., Chen, G.: Mining positive and negative association rules from large databases. In: 2006 IEEE Conference on Cybernetics and Intelligent Systems, pp. 1–6 (2006) Cornelis, C., Yan, P., Zhang, X., Chen, G.: Mining positive and negative association rules from large databases. In: 2006 IEEE Conference on Cybernetics and Intelligent Systems, pp. 1–6 (2006)
29.
go back to reference Savasere, A., Omiecinski, E., Navathe, S.: Mining for strong negative associations in a large database of customer transactions. In: ICDE, pp. 494–502 (1998) Savasere, A., Omiecinski, E., Navathe, S.: Mining for strong negative associations in a large database of customer transactions. In: ICDE, pp. 494–502 (1998)
30.
go back to reference Wang, S., Cao, L.: Inferring implicit rules by learning explicit and hidden item dependency. IEEE Trans. Syst. Man Cybern. Syst. 50(3), 935–946 (2020) CrossRef Wang, S., Cao, L.: Inferring implicit rules by learning explicit and hidden item dependency. IEEE Trans. Syst. Man Cybern. Syst. 50(3), 935–946 (2020) CrossRef
31.
go back to reference Cao, L.: Combined mining: analyzing object and pattern relations for discovering and constructing complex yet actionable patterns. WIREs Data Min. Knowl. Discov. 3(2), 140–155 (2013) CrossRef Cao, L.: Combined mining: analyzing object and pattern relations for discovering and constructing complex yet actionable patterns. WIREs Data Min. Knowl. Discov. 3(2), 140–155 (2013) CrossRef
32.
go back to reference Wu, T., Chen, Y., Han, J.: Re-examination of interestingness measures in pattern mining: a unified framework. Data Min. Knowl. Discov. 21(3), 371–397 (2010) MathSciNetCrossRef Wu, T., Chen, Y., Han, J.: Re-examination of interestingness measures in pattern mining: a unified framework. Data Min. Knowl. Discov. 21(3), 371–397 (2010) MathSciNetCrossRef
33.
go back to reference Sahar, S.: Interestingness preprocessing. In: Proceedings of IEEE International Conference on Data Mining, pp. 489–496 (2001) Sahar, S.: Interestingness preprocessing. In: Proceedings of IEEE International Conference on Data Mining, pp. 489–496 (2001)
34.
go back to reference Li, J., Wang, C., Cao, L., Yu, P.S.: Efficient selection of globally optimal rules on large imbalanced data based on rule coverage relationship analysis. In: Proceedings of the 2013 SIAM International Conference on Data Mining, pp. 216–224 (2013) Li, J., Wang, C., Cao, L., Yu, P.S.: Efficient selection of globally optimal rules on large imbalanced data based on rule coverage relationship analysis. In: Proceedings of the 2013 SIAM International Conference on Data Mining, pp. 216–224 (2013)
35.
go back to reference Duan, S., Thummala, V., Babu, S.: Tuning database configuration parameters with ituned. PVLDB 2(1), 1246–1257 (2009) Duan, S., Thummala, V., Babu, S.: Tuning database configuration parameters with ituned. PVLDB 2(1), 1246–1257 (2009)
36.
go back to reference Chaudhuri, S., Lee, H., Narasayya, V.R.: Variance aware optimization of parameterized queries. In: Proceedings of the ACM SIGMOD International Conference on Management of data, pp. 531–542 (2010) Chaudhuri, S., Lee, H., Narasayya, V.R.: Variance aware optimization of parameterized queries. In: Proceedings of the ACM SIGMOD International Conference on Management of data, pp. 531–542 (2010)
Metadata
Title
PARAS: a parameter space-driven approach for complete association rule mining
Authors
Xika Lin
Abhishek Mukherji
Elke A. Rundensteiner
Matthew O. Ward
Publication date
20-07-2022
Publisher
Springer International Publishing
Published in
International Journal of Data Science and Analytics / Issue 4/2022
Print ISSN: 2364-415X
Electronic ISSN: 2364-4168
DOI
https://doi.org/10.1007/s41060-022-00330-3

Other articles of this Issue 4/2022

International Journal of Data Science and Analytics 4/2022 Go to the issue

Premium Partner