Skip to main content
Top
Published in: Journal of Intelligent Information Systems 2/2019

06-04-2019

Rank correlated subgroup discovery

Authors: Mohamed Ali Hammal, Hélène Mathian, Luc Merchez, Marc Plantevit, Céline Robardet

Published in: Journal of Intelligent Information Systems | Issue 2/2019

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

Subgroup discovery (SD) and exceptional model mining (EMM), its generalization to handle more complex targets, are two mature fields at the frontier of data mining and machine learning. More precisely, EMM aims to find coherent subgroups of a dataset where multiple targets interact in an unusual way. Correlation model classes have already been defined to discover interesting subgroups when dealing with two numerical targets. However, in this supervised setting, the two numerical targets are fixed before the subgroup search. To make unsupervised exploration possible, we propose to search for arbitrary subsets of numerical targets whose correlation is exceptional for an automatically found subgroup. This involves solving two challenges: the definition of a model that evaluates the interest of a subgroup for a subset of numerical targets and the definition of a pattern language that enumerates both subgroups and targets and lends itself to effective research strategies. We propose an integrated solution to both challenges. We introduce the problem of rank-correlated subgroup discovery with an arbitrary subset of numerical targets. A rank-correlated subgroup is identified by both conditions on descriptive attributes, whether numeric or nominal, and a pattern on numeric attributes that captures (positive or negative) rank correlations based on a generalization of the Kendall’s τ. We define a new branch-and-bound algorithm that exploits some pruning properties based on two upper-bounds and a closure property. An empirical study on several datasets demonstrates the efficiency and the effectiveness of the algorithm.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literature
go back to reference Atzmüller, M., & Puppe, F. (2006). Sd-map - a fast algorithm for exhaustive subgroup discovery. In ECMLPKDD (pp. 6–17). Atzmüller, M., & Puppe, F. (2006). Sd-map - a fast algorithm for exhaustive subgroup discovery. In ECMLPKDD (pp. 6–17).
go back to reference Aumann, Y., & Lindell, Y. (1999). A statistical theory for quantitative association rules. In KDD. Citeseer, (Vol. 99 pp. 261–270). Aumann, Y., & Lindell, Y. (1999). A statistical theory for quantitative association rules. In KDD. Citeseer, (Vol. 99 pp. 261–270).
go back to reference Bay, S.D., & Pazzani, M.J. (2001). Detecting group differences: mining contrast sets. Data Mining and Knowledge Discovery, 5(3), 213–246.MATHCrossRef Bay, S.D., & Pazzani, M.J. (2001). Detecting group differences: mining contrast sets. Data Mining and Knowledge Discovery, 5(3), 213–246.MATHCrossRef
go back to reference Belfodil, A., Cazalens, S., Lamarre, P., Plantevit, M. (2017). Flash points: discovering exceptional pairwise behaviors in vote or rating data. In ECML PKDD (pp. 442–458).CrossRef Belfodil, A., Cazalens, S., Lamarre, P., Plantevit, M. (2017). Flash points: discovering exceptional pairwise behaviors in vote or rating data. In ECML PKDD (pp. 442–458).CrossRef
go back to reference Belfodil, A., Kuznetsov, S.O., Robardet, C., Kaytoue, M. (2017). Mining convex polygon patterns with formal concept analysis. In Proceedings of the 26th international joint conference on artificial intelligence, IJCAI. Melbourne, Australia August 19-25, 2017 (pp. 1425–1432). Belfodil, A., Kuznetsov, S.O., Robardet, C., Kaytoue, M. (2017). Mining convex polygon patterns with formal concept analysis. In Proceedings of the 26th international joint conference on artificial intelligence, IJCAI. Melbourne, Australia August 19-25, 2017 (pp. 1425–1432).
go back to reference Bendimerad, A.A., Plantevit, M., Robardet, C. (2016). Unsupervised exceptional attributed sub-graph mining in urban data. In ICDM (pp. 21–30). Bendimerad, A.A., Plantevit, M., Robardet, C. (2016). Unsupervised exceptional attributed sub-graph mining in urban data. In ICDM (pp. 21–30).
go back to reference Bendimerad, A.A., Cazabet, R., Plantevit, M., Robardet, C. (2017). Contextual subgraph discovery with mobility models. In Complex networks (pp. 477–489). Bendimerad, A.A., Cazabet, R., Plantevit, M., Robardet, C. (2017). Contextual subgraph discovery with mobility models. In Complex networks (pp. 477–489).
go back to reference Bie, T.D. (2011). Maximum entropy models and subjective interestingness: an application to tiles in binary databases. Data Mining and Knowledge Discovery, 23(3), 407–446.MathSciNetMATHCrossRef Bie, T.D. (2011). Maximum entropy models and subjective interestingness: an application to tiles in binary databases. Data Mining and Knowledge Discovery, 23(3), 407–446.MathSciNetMATHCrossRef
go back to reference Błaszczyński, J., Słowiński, R., Szelkag, M. (2011). Sequential covering rule induction algorithm for variable consistency rough set approaches. Information Sciences, 181(5), 987–1002.MathSciNetCrossRef Błaszczyński, J., Słowiński, R., Szelkag, M. (2011). Sequential covering rule induction algorithm for variable consistency rough set approaches. Information Sciences, 181(5), 987–1002.MathSciNetCrossRef
go back to reference Boley, M., Lucchese, C., Paurat, D., Gärtner, T. (2011). Direct local pattern sampling by efficient two-step random procedures. In Proceedings of the 17th ACM SIGKDD international conference on knowledge discovery and data mining, San Diego, CA, USA, August 21-24, 2011 (pp. 582–590). Boley, M., Lucchese, C., Paurat, D., Gärtner, T. (2011). Direct local pattern sampling by efficient two-step random procedures. In Proceedings of the 17th ACM SIGKDD international conference on knowledge discovery and data mining, San Diego, CA, USA, August 21-24, 2011 (pp. 582–590).
go back to reference Boley, M., Moens, S., Gärtner, T. (2012). Linear space direct pattern sampling using coupling from the past. In The 18th ACM SIGKDD international conference on knowledge discovery and data mining, KDD ’12, Beijing, China, August 12-16, 2012 (pp. 69–77). Boley, M., Moens, S., Gärtner, T. (2012). Linear space direct pattern sampling using coupling from the past. In The 18th ACM SIGKDD international conference on knowledge discovery and data mining, KDD ’12, Beijing, China, August 12-16, 2012 (pp. 69–77).
go back to reference Bosc, G., Golebiowski, J., Bensafi, M., Robardet, C., Plantevit, M., Boulicaut, J., Kaytoue, M. (2016). Local subgroup discovery for eliciting and understanding new structure-odor relationships. In DS (pp. 19–34).CrossRef Bosc, G., Golebiowski, J., Bensafi, M., Robardet, C., Plantevit, M., Boulicaut, J., Kaytoue, M. (2016). Local subgroup discovery for eliciting and understanding new structure-odor relationships. In DS (pp. 19–34).CrossRef
go back to reference Calders, T., Goethals, B., Jaroszewicz, S. (2006). Mining rank-correlated sets of numerical attributes. In Proceedings of the 12th ACM SIGKDD international conference on knowledge discovery and data mining (pp. 96–105): ACM. Calders, T., Goethals, B., Jaroszewicz, S. (2006). Mining rank-correlated sets of numerical attributes. In Proceedings of the 12th ACM SIGKDD international conference on knowledge discovery and data mining (pp. 96–105): ACM.
go back to reference Cerf, L., Besson, J., Robardet, C., Boulicaut, J. (2009). Closed patterns meet n-ary relations. TKDD, 3(1), 3:1–3:36.CrossRef Cerf, L., Besson, J., Robardet, C., Boulicaut, J. (2009). Closed patterns meet n-ary relations. TKDD, 3(1), 3:1–3:36.CrossRef
go back to reference Chaoji, V., Hasan, M.A., Salem, S., Besson, J., Zaki, M.J. (2008). ORIGAMI: a novel and effective approach for mining representative orthogonal graph patterns. Statistical Analysis and Data Mining, 1(2), 67–84.MathSciNetCrossRef Chaoji, V., Hasan, M.A., Salem, S., Besson, J., Zaki, M.J. (2008). ORIGAMI: a novel and effective approach for mining representative orthogonal graph patterns. Statistical Analysis and Data Mining, 1(2), 67–84.MathSciNetCrossRef
go back to reference de Sá, C.R., Duivesteijn, W., Soares, C., Knobbe, A.J. (2016). Exceptional preferences mining (pp. 3–18). de Sá, C.R., Duivesteijn, W., Soares, C., Knobbe, A.J. (2016). Exceptional preferences mining (pp. 3–18).
go back to reference Do, T.D.T., Laurent, A., Termier, A. (2010). PGLCM: efficient parallel mining of closed frequent gradual itemsets (pp. 138–147). Do, T.D.T., Laurent, A., Termier, A. (2010). PGLCM: efficient parallel mining of closed frequent gradual itemsets (pp. 138–147).
go back to reference Do, T.D.T., Termier, A., Laurent, A., Negrevergne, B., Omidvar-Tehrani, B., Amer-Yahia, S. (2015). Pglcm: efficient parallel mining of closed frequent gradual itemsets. Knowledge and Information Systems, 43(3), 497–527.CrossRef Do, T.D.T., Termier, A., Laurent, A., Negrevergne, B., Omidvar-Tehrani, B., Amer-Yahia, S. (2015). Pglcm: efficient parallel mining of closed frequent gradual itemsets. Knowledge and Information Systems, 43(3), 497–527.CrossRef
go back to reference Dong, G., & Li, J. (1999). Efficient mining of emerging patterns Discovering trends and differences. In Proceedings of the 5th ACM SIGKDD international conference on knowledge discovery and data mining (pp. 43–52): ACM. Dong, G., & Li, J. (1999). Efficient mining of emerging patterns Discovering trends and differences. In Proceedings of the 5th ACM SIGKDD international conference on knowledge discovery and data mining (pp. 43–52): ACM.
go back to reference Downar, L., & Duivesteijn, W. (2017). Exceptionally monotone models—the rank correlation model class for exceptional model mining. Knowledge and Information Systems, 51(2), 369–394.CrossRef Downar, L., & Duivesteijn, W. (2017). Exceptionally monotone models—the rank correlation model class for exceptional model mining. Knowledge and Information Systems, 51(2), 369–394.CrossRef
go back to reference Dua, D., & Graff, C. (2017). UCI machine learning repository. Dua, D., & Graff, C. (2017). UCI machine learning repository.
go back to reference Duivesteijn, W., Knobbe, A.J., Feelders, A., van Leeuwen, M. (2010). Subgroup discovery meets bayesian networks – an exceptional model mining approach. In ICDM (pp. 158–167). Duivesteijn, W., Knobbe, A.J., Feelders, A., van Leeuwen, M. (2010). Subgroup discovery meets bayesian networks – an exceptional model mining approach. In ICDM (pp. 158–167).
go back to reference Fan, Y.-N., Tseng, T.-L.B., Chern, C.-C., Huang, C.-C. (2009). Rule induction based on an incremental rough set. Expert Systems with Applications, 36(9), 11439–11450.CrossRef Fan, Y.-N., Tseng, T.-L.B., Chern, C.-C., Huang, C.-C. (2009). Rule induction based on an incremental rough set. Expert Systems with Applications, 36(9), 11439–11450.CrossRef
go back to reference Grosskreutz, H., & Rüping, S. (2009). On subgroup discovery in numerical domains. Data Mining and Knowledge Discovery, 19(2), 210–226.MathSciNetCrossRef Grosskreutz, H., & Rüping, S. (2009). On subgroup discovery in numerical domains. Data Mining and Knowledge Discovery, 19(2), 210–226.MathSciNetCrossRef
go back to reference Grosskreutz, H., Lang, B., Trabold, D. (2013). A relevance criterion for sequential patterns. In ECMLPKDD (pp. 369–384).CrossRef Grosskreutz, H., Lang, B., Trabold, D. (2013). A relevance criterion for sequential patterns. In ECMLPKDD (pp. 369–384).CrossRef
go back to reference Hüllermeier, E. (2002). Association rules for expressing gradual dependencies (pp. 200–211).CrossRef Hüllermeier, E. (2002). Association rules for expressing gradual dependencies (pp. 200–211).CrossRef
go back to reference Kaytoue, M., Kuznetsov, S.O., Napoli, A. (2011). Revisiting numerical pattern mining with formal concept analysis. In Walsh, T. (Ed.) IJCAI proceedings of the 22nd international joint conference on artificial intelligence, Barcelona, Catalonia, Spain, July 16-22, 2011 (p. 2011): IJCAI/AAAI. Kaytoue, M., Kuznetsov, S.O., Napoli, A. (2011). Revisiting numerical pattern mining with formal concept analysis. In Walsh, T. (Ed.) IJCAI proceedings of the 22nd international joint conference on artificial intelligence, Barcelona, Catalonia, Spain, July 16-22, 2011 (p. 2011): IJCAI/AAAI.
go back to reference Kaytoue, M., Plantevit, M., Zimmermann, A., Bendimerad, A., Robardet, C. (2017). Exceptional contextual subgraph mining. Machine Learning, 106(8), 1171–1211.MathSciNetMATHCrossRef Kaytoue, M., Plantevit, M., Zimmermann, A., Bendimerad, A., Robardet, C. (2017). Exceptional contextual subgraph mining. Machine Learning, 106(8), 1171–1211.MathSciNetMATHCrossRef
go back to reference Klosgen, W. (1996). Explora: a multipattern and multistrategy discovery assistant. Advances in knowledge discovery and data mining (pp. 249–271). Klosgen, W. (1996). Explora: a multipattern and multistrategy discovery assistant. Advances in knowledge discovery and data mining (pp. 249–271).
go back to reference Lavrač, N., Flach, P., Zupan, B. (1999). Rule evaluation measures: a unifying view. In Džeroski, S., & Flach, P. (Eds.) Inductive logic programming (pp. 174–185). Berlin: Springer. Lavrač, N., Flach, P., Zupan, B. (1999). Rule evaluation measures: a unifying view. In Džeroski, S., & Flach, P. (Eds.) Inductive logic programming (pp. 174–185). Berlin: Springer.
go back to reference Lavrač, N., Kavšek, B., Flach, P., Todorovski, L. (2004). Subgroup discovery with cn2-sd. Journal of Machine Learning Research, 5, 153–188.MathSciNet Lavrač, N., Kavšek, B., Flach, P., Todorovski, L. (2004). Subgroup discovery with cn2-sd. Journal of Machine Learning Research, 5, 153–188.MathSciNet
go back to reference Leman, D., Feelders, A., Knobbe, A. (2008). Exceptional model mining. In Joint European conference on machine learning and knowledge discovery in databases (pp. 1–16): Springer. Leman, D., Feelders, A., Knobbe, A. (2008). Exceptional model mining. In Joint European conference on machine learning and knowledge discovery in databases (pp. 1–16): Springer.
go back to reference Lemmerich, F., Atzmueller, M., Puppe, F. (2016). Fast exhaustive subgroup discovery with numerical target concepts. Data Mining and Knowledge Discovery, 30 (3), 711–762.MathSciNetMATHCrossRef Lemmerich, F., Atzmueller, M., Puppe, F. (2016). Fast exhaustive subgroup discovery with numerical target concepts. Data Mining and Knowledge Discovery, 30 (3), 711–762.MathSciNetMATHCrossRef
go back to reference Liu, B., Hsu, W., Ma, Y. (1998). Integrating classification and association rule mining. In KDD (pp. 80–86). Liu, B., Hsu, W., Ma, Y. (1998). Integrating classification and association rule mining. In KDD (pp. 80–86).
go back to reference Martínez-Ballesteros, M., Troncoso, A., Martínez-Álvarez, F., Riquelme, J. (2016). Obtaining optimal quality measures for quantitative association rules. Neurocomputing, 176, 36–47. Recent advancements in hybrid artificial intelligence systems and its application to real-world problems.CrossRef Martínez-Ballesteros, M., Troncoso, A., Martínez-Álvarez, F., Riquelme, J. (2016). Obtaining optimal quality measures for quantitative association rules. Neurocomputing, 176, 36–47. Recent advancements in hybrid artificial intelligence systems and its application to real-world problems.CrossRef
go back to reference Morishita, S., & Sese, J. (2000). Transversing itemset lattices with statistical metric pruning. In Proceedings of the 19th ACM SIGMOD-SIGACT-SIGART symposium on principles of database systems, PODS’00 (pp. 226–236). New York: ACM. Morishita, S., & Sese, J. (2000). Transversing itemset lattices with statistical metric pruning. In Proceedings of the 19th ACM SIGMOD-SIGACT-SIGART symposium on principles of database systems, PODS’00 (pp. 226–236). New York: ACM.
go back to reference Prado, A., Plantevit, M., Robardet, C., Boulicaut, J.-F., patterns. (2013). Mining graph topological Finding covariations among vertex descriptors. IEEE Transactions on Knowledge and Data Engineering, 25(9), 2090–2104.CrossRef Prado, A., Plantevit, M., Robardet, C., Boulicaut, J.-F., patterns. (2013). Mining graph topological Finding covariations among vertex descriptors. IEEE Transactions on Knowledge and Data Engineering, 25(9), 2090–2104.CrossRef
go back to reference Rückert, U., Richter, L., Kramer, S. (2004). Quantitative association rules based on half-spaces: an optimization approach. In Proceedings of the 4th IEEE international conference on data mining (ICDM 2004), 1-4 November 2004, Brighton, UK (pp. 507–510). Rückert, U., Richter, L., Kramer, S. (2004). Quantitative association rules based on half-spaces: an optimization approach. In Proceedings of the 4th IEEE international conference on data mining (ICDM 2004), 1-4 November 2004, Brighton, UK (pp. 507–510).
go back to reference Salleb-Aouissi, A., Vrain, C., Nortet, C., Kong, X., Rathod, V., Cassard, D. (2013). Quantminer for mining quantitative association rules. Journal of Machine Learning Research, 14(1), 3153–3157.MATH Salleb-Aouissi, A., Vrain, C., Nortet, C., Kong, X., Rathod, V., Cassard, D. (2013). Quantminer for mining quantitative association rules. Journal of Machine Learning Research, 14(1), 3153–3157.MATH
go back to reference Sikora, M., & Wróbel, Ł. (2010). Application of rule induction algorithms for analysis of data collected by seismic hazard monitoring systems in coal mines. Archives of Mining Sciences, 55(1), 91–114. Sikora, M., & Wróbel, Ł. (2010). Application of rule induction algorithms for analysis of data collected by seismic hazard monitoring systems in coal mines. Archives of Mining Sciences, 55(1), 91–114.
go back to reference Terada, A., Okada-Hatakeyama, M., Tsuda, K., Sese, J. (2013). Statistical significance of combinatorial regulations. Proceedings of the National Academy of Sciences, 110(32), 12996–13001.MathSciNetMATHCrossRef Terada, A., Okada-Hatakeyama, M., Tsuda, K., Sese, J. (2013). Statistical significance of combinatorial regulations. Proceedings of the National Academy of Sciences, 110(32), 12996–13001.MathSciNetMATHCrossRef
go back to reference Tukey, J.W. (1977). Exploratory data analysis. Addison-Wesley series in behavioral science : quantitative methods, Addison-Wesley. Tukey, J.W. (1977). Exploratory data analysis. Addison-Wesley series in behavioral science : quantitative methods, Addison-Wesley.
go back to reference van Leeuwen, M., & Knobbem, A.J. (2012). Diverse subgroup set discovery. Data Mining and Knowledge Discovery, 25(2), 208–242.MathSciNetCrossRef van Leeuwen, M., & Knobbem, A.J. (2012). Diverse subgroup set discovery. Data Mining and Knowledge Discovery, 25(2), 208–242.MathSciNetCrossRef
go back to reference Wrobel, S. (1997). An algorithm for multi-relational discovery of subgroups. In PKDD (pp. 78–87).CrossRef Wrobel, S. (1997). An algorithm for multi-relational discovery of subgroups. In PKDD (pp. 78–87).CrossRef
go back to reference Xin, D., Cheng, H., Yan, X., Han, J. (2006). Extracting redundancy-aware top-k patterns. In Proceedings of the 12th ACM SIGKDD international conference on knowledge discovery and data mining (pp. 444–453): ACM. Xin, D., Cheng, H., Yan, X., Han, J. (2006). Extracting redundancy-aware top-k patterns. In Proceedings of the 12th ACM SIGKDD international conference on knowledge discovery and data mining (pp. 444–453): ACM.
Metadata
Title
Rank correlated subgroup discovery
Authors
Mohamed Ali Hammal
Hélène Mathian
Luc Merchez
Marc Plantevit
Céline Robardet
Publication date
06-04-2019
Publisher
Springer US
Published in
Journal of Intelligent Information Systems / Issue 2/2019
Print ISSN: 0925-9902
Electronic ISSN: 1573-7675
DOI
https://doi.org/10.1007/s10844-019-00555-y

Other articles of this Issue 2/2019

Journal of Intelligent Information Systems 2/2019 Go to the issue

Premium Partner