Skip to main content
Erschienen in: Journal of Intelligent Information Systems 2/2014

01.04.2014

Semantic subgroup explanations

verfasst von: Anže Vavpetič, Vid Podpečan, Nada Lavrač

Erschienen in: Journal of Intelligent Information Systems | Ausgabe 2/2014

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Subgroup discovery (SD) methods can be used to find interesting subsets of objects of a given class. While subgroup describing rules are themselves good explanations of the subgroups, domain ontologies can provide additional descriptions to data and alternative explanations of the constructed rules. Such explanations in terms of higher level ontology concepts have the potential of providing new insights into the domain of investigation. We show that this additional explanatory power can be ensured by using recently developed semantic SD methods. We present a new approach to explaining subgroups through ontologies and demonstrate its utility on a motivational use case and on a gene expression profiling use case where groups of patients, identified through SD in terms of gene expression, are further explained through concepts from the Gene Ontology and KEGG orthology. We qualitatively compare the methodology with the supporting factors technique for characterizing subgroups. The developed tools are implemented within a new browser-based data mining platform ClowdFlows.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literatur
Zurück zum Zitat Angiulli, F., Fassetti, F., Palopoli, L. (2013). Discovering characterizations of the behavior of anomalous subpopulations. IEEE Transactions on Knowledge and Data Engineering, 25(6), 1280–1292. doi:10.1109/TKDE.2012.58.CrossRef Angiulli, F., Fassetti, F., Palopoli, L. (2013). Discovering characterizations of the behavior of anomalous subpopulations. IEEE Transactions on Knowledge and Data Engineering, 25(6), 1280–1292. doi:10.​1109/​TKDE.​2012.​58.CrossRef
Zurück zum Zitat Atzmüller, M., & Puppe, F. (2006). SD-Map—a fast algorithm for exhaustive subgroup discovery. In Proceedings of the 10th European conference on principles and practice of knowledge discovery in databases (PKDD ’06) (pp. 6–17). Springer. Atzmüller, M., & Puppe, F. (2006). SD-Map—a fast algorithm for exhaustive subgroup discovery. In Proceedings of the 10th European conference on principles and practice of knowledge discovery in databases (PKDD ’06) (pp. 6–17). Springer.
Zurück zum Zitat Bay, S.D., & Pazzani, M.J. (2001). Detecting group differences: mining contrast sets. Data Mining and Knowledge Discovery, 5(3), 213–246.CrossRefMATH Bay, S.D., & Pazzani, M.J. (2001). Detecting group differences: mining contrast sets. Data Mining and Knowledge Discovery, 5(3), 213–246.CrossRefMATH
Zurück zum Zitat Demšar, J., Zupan, B., Leban, G. (2004). Orange: from experimental machine learning to interactive data mining, white paper. Faculty of Computer and Information Science, University of Ljubljana. www.ailab.si/orange. Demšar, J., Zupan, B., Leban, G. (2004). Orange: from experimental machine learning to interactive data mining, white paper. Faculty of Computer and Information Science, University of Ljubljana. www.​ailab.​si/​orange.
Zurück zum Zitat Dong, G., & Li, J. (1999). Efficient mining of emerging patterns: discovering trends and differences. In Proceedings of the 5th ACM SIGKDD international conference on knowledge discovery and data mining (KDD-99) (pp. 43–52). Dong, G., & Li, J. (1999). Efficient mining of emerging patterns: discovering trends and differences. In Proceedings of the 5th ACM SIGKDD international conference on knowledge discovery and data mining (KDD-99) (pp. 43–52).
Zurück zum Zitat Elston, C.W., & Ellis, I.O. (1991). Pathological prognostic factors in breast cancer. I. The value of histological grade in breast cancer: experience from a large study with long-term follow-up. Histopathology, 19(5), 403–410.CrossRef Elston, C.W., & Ellis, I.O. (1991). Pathological prognostic factors in breast cancer. I. The value of histological grade in breast cancer: experience from a large study with long-term follow-up. Histopathology, 19(5), 403–410.CrossRef
Zurück zum Zitat Eronen, L., & Toivonen, H. (2012). Biomine: predicting links between biological entities using network models of heterogeneous databases. BMC Bioinformatics, 13, 119.CrossRef Eronen, L., & Toivonen, H. (2012). Biomine: predicting links between biological entities using network models of heterogeneous databases. BMC Bioinformatics, 13, 119.CrossRef
Zurück zum Zitat Galea, M., Blamey, R., Elston, C., Ellis, I. (1992). The Nottingham prognostic index in primary breast cancer. Breast Cancer Research and Treatment, 22, 207–219.CrossRef Galea, M., Blamey, R., Elston, C., Ellis, I. (1992). The Nottingham prognostic index in primary breast cancer. Breast Cancer Research and Treatment, 22, 207–219.CrossRef
Zurück zum Zitat Gamberger, D., & Lavrač, N. (2002). Expert-guided subgroup discovery: methodology and application. Journal of Artificial Intelligence Research (JAIR), 17, 501–527.MATH Gamberger, D., & Lavrač, N. (2002). Expert-guided subgroup discovery: methodology and application. Journal of Artificial Intelligence Research (JAIR), 17, 501–527.MATH
Zurück zum Zitat Gamberger, D., & Lavrač, N. (2003). Active subgroup mining: a case study in coronary heart disease risk group detection. Artificial Intelligence in Medicine, 28(1), 27–57.CrossRef Gamberger, D., & Lavrač, N. (2003). Active subgroup mining: a case study in coronary heart disease risk group detection. Artificial Intelligence in Medicine, 28(1), 27–57.CrossRef
Zurück zum Zitat Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H. (2009). The WEKA data mining software: an update. SIGKDD Explor Newsl, 11, 10–18.CrossRef Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H. (2009). The WEKA data mining software: an update. SIGKDD Explor Newsl, 11, 10–18.CrossRef
Zurück zum Zitat Hilario, M., Nguyen, P., Do, H., Woznica, A., Kalousis, A. (2011). Ontology-based meta-mining of knowledge discovery workflows. In N. Jankowski, W. Duch, K. Grabczewski (Eds.), Meta-learning in computational intelligence, studies in computational intelligence (Vol. 358, pp. 273–315). Berlin Heidelberg: Springer.CrossRef Hilario, M., Nguyen, P., Do, H., Woznica, A., Kalousis, A. (2011). Ontology-based meta-mining of knowledge discovery workflows. In N. Jankowski, W. Duch, K. Grabczewski (Eds.), Meta-learning in computational intelligence, studies in computational intelligence (Vol. 358, pp. 273–315). Berlin Heidelberg: Springer.CrossRef
Zurück zum Zitat Jovanoski, V., & Lavrač, N. (2001). Classification rule learning with APRIORI-C. In P. Brazdil, & A. Jorge (Eds.), EPIA, lecture notes in computer science (Vol. 2258, pp. 44–51). Berlin Heidelberg: Springer. Jovanoski, V., & Lavrač, N. (2001). Classification rule learning with APRIORI-C. In P. Brazdil, & A. Jorge (Eds.), EPIA, lecture notes in computer science (Vol. 2258, pp. 44–51). Berlin Heidelberg: Springer.
Zurück zum Zitat Kavšek, B., & Lavrač, N. (2006). APRIORI-SD: adapting association rule learning to subgroup discovery. Applied Artificial Intelligence, 20(7), 543–583.CrossRef Kavšek, B., & Lavrač, N. (2006). APRIORI-SD: adapting association rule learning to subgroup discovery. Applied Artificial Intelligence, 20(7), 543–583.CrossRef
Zurück zum Zitat Klösgen, W. (1996). Explora: a multipattern and multistrategy discovery assistant. In Advances in knowledge discovery and data mining, (pp. 249–271). Menlo Park: American Association for Artificial Intelligence. Klösgen, W. (1996). Explora: a multipattern and multistrategy discovery assistant. In Advances in knowledge discovery and data mining, (pp. 249–271). Menlo Park: American Association for Artificial Intelligence.
Zurück zum Zitat Kralj Novak, P., Lavrač, N., Webb, G.I. (2009). Supervised descriptive rule discovery: a unifying survey of contrast set, emerging pattern and subgroup mining. Journal of Machine Learning Research, 10, 377–403.MATH Kralj Novak, P., Lavrač, N., Webb, G.I. (2009). Supervised descriptive rule discovery: a unifying survey of contrast set, emerging pattern and subgroup mining. Journal of Machine Learning Research, 10, 377–403.MATH
Zurück zum Zitat Kranjc, J., Podpečan, V., Lavrač, N. (2012). Clowdflows: a cloud based scientific workflow platform. In P.A. Flach, T.D. Bie, N. Cristianini (Eds.), ECML/PKDD (2), lecture notes in computer science (Vol. 7524, pp. 816–819). Berlin Heidelberg: Springer. Kranjc, J., Podpečan, V., Lavrač, N. (2012). Clowdflows: a cloud based scientific workflow platform. In P.A. Flach, T.D. Bie, N. Cristianini (Eds.), ECML/PKDD (2), lecture notes in computer science (Vol. 7524, pp. 816–819). Berlin Heidelberg: Springer.
Zurück zum Zitat Langohr, L., Podpečan, V., Petek, M., Mozetič, I., Gruden, K., Lavrač, N., Toivonen, H. (2013). Contrasting subgroup discovery. Computer Journal, 56(3), 289–303.CrossRef Langohr, L., Podpečan, V., Petek, M., Mozetič, I., Gruden, K., Lavrač, N., Toivonen, H. (2013). Contrasting subgroup discovery. Computer Journal, 56(3), 289–303.CrossRef
Zurück zum Zitat Lavrač, N., Kavšek, B., Flach, P.A., Todorovski, L. (2004). Subgroup discovery with CN2-SD. Journal of Machine Learning Research, 5, 153–188. Lavrač, N., Kavšek, B., Flach, P.A., Todorovski, L. (2004). Subgroup discovery with CN2-SD. Journal of Machine Learning Research, 5, 153–188.
Zurück zum Zitat Lavrač, N., Vavpetič, A., Soldatova, L., Trajkovski, I., Kralj Novak, P. (2011). Using ontologies in semantic data mining with SEGS and g-SEGS. In Proceedings of the international conference on discovery science (DS ’11) (pp. 165–178). Springer. Lavrač, N., Vavpetič, A., Soldatova, L., Trajkovski, I., Kralj Novak, P. (2011). Using ontologies in semantic data mining with SEGS and g-SEGS. In Proceedings of the international conference on discovery science (DS ’11) (pp. 165–178). Springer.
Zurück zum Zitat Lawrynowicz, A., & Potoniec, J. (2011). Fr-ont: an algorithm for frequent concept mining with formal ontologies. In M. Kryszkiewicz, H. Rybinski, A. Skowron, Z.W. Ras (Eds.), ISMIS, lecture notes in computer science (Vol. 6804, pp. 428–437). Berlin Heidelberg: Springer. Lawrynowicz, A., & Potoniec, J. (2011). Fr-ont: an algorithm for frequent concept mining with formal ontologies. In M. Kryszkiewicz, H. Rybinski, A. Skowron, Z.W. Ras (Eds.), ISMIS, lecture notes in computer science (Vol. 6804, pp. 428–437). Berlin Heidelberg: Springer.
Zurück zum Zitat Maglott, D., Ostell, J., Pruitt, K.D., Tatusova, T. (2005). Entrez gene: gene-centered information at NCBI. Nucleic Acids Research, 33(Database issue). Maglott, D., Ostell, J., Pruitt, K.D., Tatusova, T. (2005). Entrez gene: gene-centered information at NCBI. Nucleic Acids Research, 33(Database issue).
Zurück zum Zitat McCall, M.N., Bolstad, B.M., Irizarry, R.A. (2010). Frozen robust multiarray analysis (fRMA). Biostatistics, 11(2), 242–253.CrossRef McCall, M.N., Bolstad, B.M., Irizarry, R.A. (2010). Frozen robust multiarray analysis (fRMA). Biostatistics, 11(2), 242–253.CrossRef
Zurück zum Zitat Podpečan, V., Juršič, M., žakova, M., Lavrač, N. (2009). Towards a service-oriented knowledge discovery platform. In V. Podpečan & N. Lavrač (Eds.), Third-generation data mining: towards service-oriented knowledge discovery (pp. 25–36). Podpečan, V., Juršič, M., žakova, M., Lavrač, N. (2009). Towards a service-oriented knowledge discovery platform. In V. Podpečan & N. Lavrač (Eds.), Third-generation data mining: towards service-oriented knowledge discovery (pp. 25–36).
Zurück zum Zitat Podpečan, V., Lavrač, N., Mozetič, I., Kralj Novak, P., Trajkovski, I., Langohr, L., Kulovesi, K., Toivonen, H., Petek, M., Motaln, H., Gruden, K. (2011a). SegMine workflows for semantic microarray data analysis in Orange4WS. BMC Bioinformatics, 12, 416.CrossRef Podpečan, V., Lavrač, N., Mozetič, I., Kralj Novak, P., Trajkovski, I., Langohr, L., Kulovesi, K., Toivonen, H., Petek, M., Motaln, H., Gruden, K. (2011a). SegMine workflows for semantic microarray data analysis in Orange4WS. BMC Bioinformatics, 12, 416.CrossRef
Zurück zum Zitat Podpečan, V., Zemenova, M., Lavrač, N. (2011b). Orange4WS environment for service-oriented data mining. The Computer Journal. doi:10.1093/comjnl/bxr077. Accessed 7 Aug 2011. Podpečan, V., Zemenova, M., Lavrač, N. (2011b). Orange4WS environment for service-oriented data mining. The Computer Journal. doi:10.​1093/​comjnl/​bxr077. Accessed 7 Aug 2011.
Zurück zum Zitat Robnik-Šikonja, M., & Kononenko, I. (2003). Theoretical and empirical analysis of ReliefF and RReliefF. Machine Learning, 53, 23–69.CrossRefMATH Robnik-Šikonja, M., & Kononenko, I. (2003). Theoretical and empirical analysis of ReliefF and RReliefF. Machine Learning, 53, 23–69.CrossRefMATH
Zurück zum Zitat Sotiriou, C., Wirapati, P., Loi, S., Harris, A., Fox, S., Smeds, J., Nordgren, H., Farmer, P., Praz, V., Haibe-Kains, B., Desmedt, C., Larsimont, D., Cardoso, F., Peterse, H., Nuyten, D., Buyse, M., Van de Vijver, M.J., Bergh, J., Piccart, M., Delorenzi, M. (2006). Gene expression profiling in breast cancer: understanding the molecular basis of histologic grade to improve prognosis. Journal of the National Cancer Institute, 98(4), 262–272. Sotiriou, C., Wirapati, P., Loi, S., Harris, A., Fox, S., Smeds, J., Nordgren, H., Farmer, P., Praz, V., Haibe-Kains, B., Desmedt, C., Larsimont, D., Cardoso, F., Peterse, H., Nuyten, D., Buyse, M., Van de Vijver, M.J., Bergh, J., Piccart, M., Delorenzi, M. (2006). Gene expression profiling in breast cancer: understanding the molecular basis of histologic grade to improve prognosis. Journal of the National Cancer Institute, 98(4), 262–272.
Zurück zum Zitat Subramanian, A., Tamayo, P., Mootha, V.K., Mukherjee, S., Ebert, B.L., Gillette, M.A., Paulovich, A., Pomeroy, S.L., Golub, T.R., Lander, E.S., Mesirov, J.P. (2005). Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proceedings of the National Academy of Sciences of the United States of America, 102(43), 15,545–15,550.CrossRef Subramanian, A., Tamayo, P., Mootha, V.K., Mukherjee, S., Ebert, B.L., Gillette, M.A., Paulovich, A., Pomeroy, S.L., Golub, T.R., Lander, E.S., Mesirov, J.P. (2005). Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proceedings of the National Academy of Sciences of the United States of America, 102(43), 15,545–15,550.CrossRef
Zurück zum Zitat Suzuki, E. (1997). Autonomous discovery of reliable exception rules. In Proceedings of the third international conference on knowledge discovery and data mining (pp. 259–262). Suzuki, E. (1997). Autonomous discovery of reliable exception rules. In Proceedings of the third international conference on knowledge discovery and data mining (pp. 259–262).
Zurück zum Zitat Suzuki, E. (2006). Data mining methods for discovering interesting exceptions from an unsupervised table. Journal of Universal Computer Science, 12(6), 627–653. Suzuki, E. (2006). Data mining methods for discovering interesting exceptions from an unsupervised table. Journal of Universal Computer Science, 12(6), 627–653.
Zurück zum Zitat Taminau, J., Steenhoff, D., Coletta, A., Meganck, S., Lazar, C., de Schaetzen, V., Duque, R., Molter, C., Bersini, H., Nowé, A., Weiss Solís, D.Y. (2011). InSilicoDB: an R/Bioconductor package for accessing human Affymetrix expert-curated datasets from GEO. Bioinformatics. doi:10.1093/bioinformatics/btr529. Taminau, J., Steenhoff, D., Coletta, A., Meganck, S., Lazar, C., de Schaetzen, V., Duque, R., Molter, C., Bersini, H., Nowé, A., Weiss Solís, D.Y. (2011). InSilicoDB: an R/Bioconductor package for accessing human Affymetrix expert-curated datasets from GEO. Bioinformatics. doi:10.​1093/​bioinformatics/​btr529.​
Zurück zum Zitat Trajkovski, I., Lavrač, N., Tolar, J. (2008). SEGS: search for enriched gene sets in microarray data. Journal of Biomedical Informatics, 41(4), 588–601.CrossRef Trajkovski, I., Lavrač, N., Tolar, J. (2008). SEGS: search for enriched gene sets in microarray data. Journal of Biomedical Informatics, 41(4), 588–601.CrossRef
Zurück zum Zitat Vavpetič, A., & Lavrač, N. (2013). Semantic subgroup discovery systems and workflows in the SDM-Toolkit. Computer Journal, 56(3), 304–320.CrossRef Vavpetič, A., & Lavrač, N. (2013). Semantic subgroup discovery systems and workflows in the SDM-Toolkit. Computer Journal, 56(3), 304–320.CrossRef
Zurück zum Zitat Vavpetič, A., Podpečan, V., Meganck, S., Lavrač, N. (2012). Explaining subgroups through ontologies. In P. Anthony, M. Ishizuka, D. Lukose (Eds.), Proceedings of PRICAI, lecture notes in computer science (Vol. 7458, pp. 625–636). Berlin Heidelberg: Springer. Vavpetič, A., Podpečan, V., Meganck, S., Lavrač, N. (2012). Explaining subgroups through ontologies. In P. Anthony, M. Ishizuka, D. Lukose (Eds.), Proceedings of PRICAI, lecture notes in computer science (Vol. 7458, pp. 625–636). Berlin Heidelberg: Springer.
Zurück zum Zitat Vavpetič, A., Novak, P.K., Grčar, M., Mozetič, I., Lavrač, N. (2013). Semantic data mining of financial news articles. In Proceedings of the international conference on discovery science (DS ’13). Springer. Vavpetič, A., Novak, P.K., Grčar, M., Mozetič, I., Lavrač, N. (2013). Semantic data mining of financial news articles. In Proceedings of the international conference on discovery science (DS ’13). Springer.
Zurück zum Zitat Webb, G.I., Butler, S.M., Newlands, D. (2003). On detecting differences between groups. In Proceedings of the 9th ACM SIGKDD international conference on knowledge discovery and data mining (KDD-03) (pp. 256–265). Webb, G.I., Butler, S.M., Newlands, D. (2003). On detecting differences between groups. In Proceedings of the 9th ACM SIGKDD international conference on knowledge discovery and data mining (KDD-03) (pp. 256–265).
Zurück zum Zitat Wrobel, S. (1997). An algorithm for multi-relational discovery of subgroups. In Proceedings of the first European conference on principles of data mining and knowledge discovery (PKDD ’97) (pp. 78–87). Springer. Wrobel, S. (1997). An algorithm for multi-relational discovery of subgroups. In Proceedings of the first European conference on principles of data mining and knowledge discovery (PKDD ’97) (pp. 78–87). Springer.
Zurück zum Zitat Žáková, M., Železný, F., García-Sedano, J.A., Tissot, C.M., Lavrač, N., Kremen, P., Molina, J. (2006). Relational data mining applied to virtual engineering of product designs. In Proceedings of the 16th international conference on inductive logic programming (ILP’06) (pp. 439–453). Berlin/Heidelberg, Germany, Santiago de Compostela, Spain: Springer-Verlag. Žáková, M., Železný, F., García-Sedano, J.A., Tissot, C.M., Lavrač, N., Kremen, P., Molina, J. (2006). Relational data mining applied to virtual engineering of product designs. In Proceedings of the 16th international conference on inductive logic programming (ILP’06) (pp. 439–453). Berlin/Heidelberg, Germany, Santiago de Compostela, Spain: Springer-Verlag.
Metadaten
Titel
Semantic subgroup explanations
verfasst von
Anže Vavpetič
Vid Podpečan
Nada Lavrač
Publikationsdatum
01.04.2014
Verlag
Springer US
Erschienen in
Journal of Intelligent Information Systems / Ausgabe 2/2014
Print ISSN: 0925-9902
Elektronische ISSN: 1573-7675
DOI
https://doi.org/10.1007/s10844-013-0292-1

Weitere Artikel der Ausgabe 2/2014

Journal of Intelligent Information Systems 2/2014 Zur Ausgabe