Top

Journal of Intelligent Information Systems

Published in:

06-04-2019

Rank correlated subgroup discovery

Authors: Mohamed Ali Hammal, Hélène Mathian, Luc Merchez, Marc Plantevit, Céline Robardet

Published in: Journal of Intelligent Information Systems | Issue 2/2019

Activate our intelligent search to find suitable subject content or patents.

search-config

AI-assisted search

Off

Abstract

Subgroup discovery (SD) and exceptional model mining (EMM), its generalization to handle more complex targets, are two mature fields at the frontier of data mining and machine learning. More precisely, EMM aims to find coherent subgroups of a dataset where multiple targets interact in an unusual way. Correlation model classes have already been defined to discover interesting subgroups when dealing with two numerical targets. However, in this supervised setting, the two numerical targets are fixed before the subgroup search. To make unsupervised exploration possible, we propose to search for arbitrary subsets of numerical targets whose correlation is exceptional for an automatically found subgroup. This involves solving two challenges: the definition of a model that evaluates the interest of a subgroup for a subset of numerical targets and the definition of a pattern language that enumerates both subgroups and targets and lends itself to effective research strategies. We propose an integrated solution to both challenges. We introduce the problem of rank-correlated subgroup discovery with an arbitrary subset of numerical targets. A rank-correlated subgroup is identified by both conditions on descriptive attributes, whether numeric or nominal, and a pattern on numeric attributes that captures (positive or negative) rank correlations based on a generalization of the Kendall’s τ. We define a new branch-and-bound algorithm that exploits some pruning properties based on two upper-bounds and a closure property. An empirical study on several datasets demonstrates the efficiency and the effectiveness of the algorithm.

previous article CBSSD: community-based semantic subgroup discovery

next article StreamPref: a query language for temporal conditional preferences on data streams

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

über 102.000 Bücher
über 537 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Finance + Banking
Management + Führung
Marketing + Vertrieb
Maschinenbau + Werkstoffe
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

inform now

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 390 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Maschinenbau + Werkstoffe

Jetzt Wissensvorsprung sichern!

inform now

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 340 Zeitschriften

aus folgenden Fachgebieten:

Bauwesen + Immobilien
Business IT + Informatik
Finance + Banking
Management + Führung
Marketing + Vertrieb
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

inform now

https://gitlab.liris.cnrs.fr/mahammal/rank-correlated-subgroup-discovery.git

https://web.stanford.edu/~hastie/ElemStatLearn/datasets/

Atzmüller, M., & Puppe, F. (2006). Sd-map - a fast algorithm for exhaustive subgroup discovery. In ECMLPKDD (pp. 6–17).

Aumann, Y., & Lindell, Y. (1999). A statistical theory for quantitative association rules. In KDD. Citeseer, (Vol. 99 pp. 261–270).

Bay, S.D., & Pazzani, M.J. (2001). Detecting group differences: mining contrast sets. Data Mining and Knowledge Discovery, 5(3), 213–246.MATHCrossRef

Belfodil, A., Cazalens, S., Lamarre, P., Plantevit, M. (2017). Flash points: discovering exceptional pairwise behaviors in vote or rating data. In ECML PKDD (pp. 442–458).CrossRef

Belfodil, A., Kuznetsov, S.O., Robardet, C., Kaytoue, M. (2017). Mining convex polygon patterns with formal concept analysis. In Proceedings of the 26th international joint conference on artificial intelligence, IJCAI. Melbourne, Australia August 19-25, 2017 (pp. 1425–1432).

Bendimerad, A.A., Plantevit, M., Robardet, C. (2016). Unsupervised exceptional attributed sub-graph mining in urban data. In ICDM (pp. 21–30).

Bendimerad, A.A., Cazabet, R., Plantevit, M., Robardet, C. (2017). Contextual subgraph discovery with mobility models. In Complex networks (pp. 477–489).

Bie, T.D. (2011). Maximum entropy models and subjective interestingness: an application to tiles in binary databases. Data Mining and Knowledge Discovery, 23(3), 407–446.MathSciNetMATHCrossRef

Błaszczyński, J., Słowiński, R., Szelkag, M. (2011). Sequential covering rule induction algorithm for variable consistency rough set approaches. Information Sciences, 181(5), 987–1002.MathSciNetCrossRef

Boley, M., Lucchese, C., Paurat, D., Gärtner, T. (2011). Direct local pattern sampling by efficient two-step random procedures. In Proceedings of the 17th ACM SIGKDD international conference on knowledge discovery and data mining, San Diego, CA, USA, August 21-24, 2011 (pp. 582–590).

Boley, M., Moens, S., Gärtner, T. (2012). Linear space direct pattern sampling using coupling from the past. In The 18th ACM SIGKDD international conference on knowledge discovery and data mining, KDD ’12, Beijing, China, August 12-16, 2012 (pp. 69–77).

Bosc, G., Golebiowski, J., Bensafi, M., Robardet, C., Plantevit, M., Boulicaut, J., Kaytoue, M. (2016). Local subgroup discovery for eliciting and understanding new structure-odor relationships. In DS (pp. 19–34).CrossRef

Calders, T., Goethals, B., Jaroszewicz, S. (2006). Mining rank-correlated sets of numerical attributes. In Proceedings of the 12th ACM SIGKDD international conference on knowledge discovery and data mining (pp. 96–105): ACM.

Cerf, L., Besson, J., Robardet, C., Boulicaut, J. (2009). Closed patterns meet n-ary relations. TKDD, 3(1), 3:1–3:36.CrossRef

Chaoji, V., Hasan, M.A., Salem, S., Besson, J., Zaki, M.J. (2008). ORIGAMI: a novel and effective approach for mining representative orthogonal graph patterns. Statistical Analysis and Data Mining, 1(2), 67–84.MathSciNetCrossRef

de Sá, C.R., Duivesteijn, W., Soares, C., Knobbe, A.J. (2016). Exceptional preferences mining (pp. 3–18).

Do, T.D.T., Laurent, A., Termier, A. (2010). PGLCM: efficient parallel mining of closed frequent gradual itemsets (pp. 138–147).

Do, T.D.T., Termier, A., Laurent, A., Negrevergne, B., Omidvar-Tehrani, B., Amer-Yahia, S. (2015). Pglcm: efficient parallel mining of closed frequent gradual itemsets. Knowledge and Information Systems, 43(3), 497–527.CrossRef

Dong, G., & Li, J. (1999). Efficient mining of emerging patterns Discovering trends and differences. In Proceedings of the 5th ACM SIGKDD international conference on knowledge discovery and data mining (pp. 43–52): ACM.

Downar, L., & Duivesteijn, W. (2017). Exceptionally monotone models—the rank correlation model class for exceptional model mining. Knowledge and Information Systems, 51(2), 369–394.CrossRef

Dua, D., & Graff, C. (2017). UCI machine learning repository.

Duivesteijn, W., Feelders, A.J., Knobbe, A. (2016). Exceptional model mining. Data Mining and Knowledge Discovery, 30(1), 47–98.MathSciNetMATHCrossRef

Duivesteijn, W., Knobbe, A.J., Feelders, A., van Leeuwen, M. (2010). Subgroup discovery meets bayesian networks – an exceptional model mining approach. In ICDM (pp. 158–167).

Fan, Y.-N., Tseng, T.-L.B., Chern, C.-C., Huang, C.-C. (2009). Rule induction based on an incremental rough set. Expert Systems with Applications, 36(9), 11439–11450.CrossRef

Grosskreutz, H., & Rüping, S. (2009). On subgroup discovery in numerical domains. Data Mining and Knowledge Discovery, 19(2), 210–226.MathSciNetCrossRef

Grosskreutz, H., Lang, B., Trabold, D. (2013). A relevance criterion for sequential patterns. In ECMLPKDD (pp. 369–384).CrossRef

Hüllermeier, E. (2002). Association rules for expressing gradual dependencies (pp. 200–211).CrossRef

Kaytoue, M., Kuznetsov, S.O., Napoli, A. (2011). Revisiting numerical pattern mining with formal concept analysis. In Walsh, T. (Ed.) IJCAI proceedings of the 22nd international joint conference on artificial intelligence, Barcelona, Catalonia, Spain, July 16-22, 2011 (p. 2011): IJCAI/AAAI.

Kaytoue, M., Plantevit, M., Zimmermann, A., Bendimerad, A., Robardet, C. (2017). Exceptional contextual subgraph mining. Machine Learning, 106(8), 1171–1211.MathSciNetMATHCrossRef

Klosgen, W. (1996). Explora: a multipattern and multistrategy discovery assistant. Advances in knowledge discovery and data mining (pp. 249–271).

Lavrač, N., Flach, P., Zupan, B. (1999). Rule evaluation measures: a unifying view. In Džeroski, S., & Flach, P. (Eds.) Inductive logic programming (pp. 174–185). Berlin: Springer.

Lavrač, N., Kavšek, B., Flach, P., Todorovski, L. (2004). Subgroup discovery with cn2-sd. Journal of Machine Learning Research, 5, 153–188.MathSciNet

Leman, D., Feelders, A., Knobbe, A. (2008). Exceptional model mining. In Joint European conference on machine learning and knowledge discovery in databases (pp. 1–16): Springer.

Lemmerich, F., Atzmueller, M., Puppe, F. (2016). Fast exhaustive subgroup discovery with numerical target concepts. Data Mining and Knowledge Discovery, 30 (3), 711–762.MathSciNetMATHCrossRef

Liu, B., Hsu, W., Ma, Y. (1998). Integrating classification and association rule mining. In KDD (pp. 80–86).

Martínez-Ballesteros, M., Troncoso, A., Martínez-Álvarez, F., Riquelme, J. (2016). Obtaining optimal quality measures for quantitative association rules. Neurocomputing, 176, 36–47. Recent advancements in hybrid artificial intelligence systems and its application to real-world problems.CrossRef

Morishita, S., & Sese, J. (2000). Transversing itemset lattices with statistical metric pruning. In Proceedings of the 19th ACM SIGMOD-SIGACT-SIGART symposium on principles of database systems, PODS’00 (pp. 226–236). New York: ACM.

Prado, A., Plantevit, M., Robardet, C., Boulicaut, J.-F., patterns. (2013). Mining graph topological Finding covariations among vertex descriptors. IEEE Transactions on Knowledge and Data Engineering, 25(9), 2090–2104.CrossRef

Rückert, U., Richter, L., Kramer, S. (2004). Quantitative association rules based on half-spaces: an optimization approach. In Proceedings of the 4th IEEE international conference on data mining (ICDM 2004), 1-4 November 2004, Brighton, UK (pp. 507–510).

Salleb-Aouissi, A., Vrain, C., Nortet, C., Kong, X., Rathod, V., Cassard, D. (2013). Quantminer for mining quantitative association rules. Journal of Machine Learning Research, 14(1), 3153–3157.MATH

Sikora, M., & Wróbel, Ł. (2010). Application of rule induction algorithms for analysis of data collected by seismic hazard monitoring systems in coal mines. Archives of Mining Sciences, 55(1), 91–114.

Terada, A., Okada-Hatakeyama, M., Tsuda, K., Sese, J. (2013). Statistical significance of combinatorial regulations. Proceedings of the National Academy of Sciences, 110(32), 12996–13001.MathSciNetMATHCrossRef

Tukey, J.W. (1977). Exploratory data analysis. Addison-Wesley series in behavioral science : quantitative methods, Addison-Wesley.

van Leeuwen, M., & Knobbem, A.J. (2012). Diverse subgroup set discovery. Data Mining and Knowledge Discovery, 25(2), 208–242.MathSciNetCrossRef

Wrobel, S. (1997). An algorithm for multi-relational discovery of subgroups. In PKDD (pp. 78–87).CrossRef

Xin, D., Cheng, H., Yan, X., Han, J. (2006). Extracting redundancy-aware top-k patterns. In Proceedings of the 12th ACM SIGKDD international conference on knowledge discovery and data mining (pp. 444–453): ACM.

Title: Rank correlated subgroup discovery
Authors: Mohamed Ali Hammal
Hélène Mathian
Luc Merchez
Marc Plantevit
Céline Robardet
Publication date: 06-04-2019
Publisher: Springer US
Published in: Journal of Intelligent Information Systems / Issue 2/2019
Print ISSN: 0925-9902
Electronic ISSN: 1573-7675
DOI: https://doi.org/10.1007/s10844-019-00555-y

Springer Professional

Abstract

Please log in to get access to your license.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Springer Professional "Technik"

Springer Professional "Wirtschaft"

Other articles of this Issue 2/2019

Ensembles of density estimators for positive-unlabeled learning

Density-based unsupervised ensemble learning methods for time series forecasting of aggregated or clustered electricity consumption

Effect of speech segment samples selection in stutter block detection and remediation

CBSSD: community-based semantic subgroup discovery

Semantic similarity aggregators for very short textual expressions: a case study on landmarks and point of interest

Database of speech and facial expressions recorded with optimized face motion capture settings

Premium Partner