Skip to main content

2016 | OriginalPaper | Buchkapitel

Knowledge Discovery from Complex High Dimensional Data

verfasst von : Sangkyun Lee, Andreas Holzinger

Erschienen in: Solving Large Scale Learning Tasks. Challenges and Algorithms

Verlag: Springer International Publishing

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Modern data analysis is confronted by increasing dimensionality of problems, mainly contributed by higher resolutions available for data acquisition and by our use of larger models with more degrees of freedom to investigate complex systems deeper. High dimensionality constitutes one aspect of “big data”, which brings us not only computational but also statistical and perceptional challenges. Most data analysis problems are solved using techniques of optimization, where large-scale optimization requires faster algorithms and implementations. Computed solutions must be evaluated for statistical quality, since otherwise false discoveries can be made. Recent papers suggest to control and modify algorithms themselves for better statistical properties. Finally, human perception puts an inherent limit on our understanding to three dimensional spaces, making it almost impossible to grasp complex phenomena. For aid, we use dimensionality reduction or other techniques, but these usually do not capture relations between interesting objects. Here graph-based knowledge representation has lots of potential, for instance to create perceivable and interactive representations and to perform new types of analysis based on graph theory and network topology. In this article, we show glimpses of new developments in these aspects.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Fußnoten
2
The CEL files from the GEO were normalized and summarized for transcripts using the frozen RMA algorithm [55]. Then only the verified (grade A) genes were chosen for further analysis according to the NetAffx probeset annotation v33.1 of Affymetrix (\(n=20492\) afterwards). Also, microarrays with low quality according to the GNUSE [54] error scores \(>1\) were discarded (\(m=392\) afterwards).
 
5
National Cancer Institute, http://​www.​cancer.​gov.
 
7
HCI-KDD Network: www.​hci-kdd.​org.
 
Literatur
1.
Zurück zum Zitat Anderson, N.R., Lee, E.S., Brockenbrough, J.S., Minie, M.E., Fuller, S., Brinkley, J., Tarczy-Hornoch, P.: Issues in biomedical research data management and analysis: needs and barriers. J. Am. Med. Inform. Assoc. 14(4), 478–488 (2007)CrossRef Anderson, N.R., Lee, E.S., Brockenbrough, J.S., Minie, M.E., Fuller, S., Brinkley, J., Tarczy-Hornoch, P.: Issues in biomedical research data management and analysis: needs and barriers. J. Am. Med. Inform. Assoc. 14(4), 478–488 (2007)CrossRef
2.
Zurück zum Zitat Bach, F.R.: Bolasso: Model consistent Lasso estimation through the bootstrap. In: 25th International Conference on Machine Learning, pp. 33–40 (2008) Bach, F.R.: Bolasso: Model consistent Lasso estimation through the bootstrap. In: 25th International Conference on Machine Learning, pp. 33–40 (2008)
3.
Zurück zum Zitat Banerjee, O., Ghaoui, L.E., d’Aspremont, A.: Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data. J. Am. Med. Inform. Assoc. 9, 485–516 (2008)MathSciNetMATH Banerjee, O., Ghaoui, L.E., d’Aspremont, A.: Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data. J. Am. Med. Inform. Assoc. 9, 485–516 (2008)MathSciNetMATH
5.
Zurück zum Zitat Barabási, A., Gulbahce, N., Loscalzo, J.: Network medicine: a network-based approach to human disease. Science 12(1), 56–68 (2011) Barabási, A., Gulbahce, N., Loscalzo, J.: Network medicine: a network-based approach to human disease. Science 12(1), 56–68 (2011)
6.
Zurück zum Zitat Beck, A., Tetruashvili, L.: On the convergence of block coordinate descent type methods. Science 23(4), 2037–2060 (2013)MathSciNetMATH Beck, A., Tetruashvili, L.: On the convergence of block coordinate descent type methods. Science 23(4), 2037–2060 (2013)MathSciNetMATH
7.
Zurück zum Zitat Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. Science 2(1), 183–202 (2009)MathSciNetMATH Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. Science 2(1), 183–202 (2009)MathSciNetMATH
8.
Zurück zum Zitat Bogdan, M., van den Berg, E., Sabatti, C., Su, W., Candes, E.J.: SLOPE - adaptive variable selection via convex optimization. (2014). arXiv:1407.3824 Bogdan, M., van den Berg, E., Sabatti, C., Su, W., Candes, E.J.: SLOPE - adaptive variable selection via convex optimization. (2014). arXiv:​1407.​3824
9.
Zurück zum Zitat Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Science 3(1), 1–122 (2011)MathSciNetMATH Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Science 3(1), 1–122 (2011)MathSciNetMATH
10.
Zurück zum Zitat Bubenik, P., Kim, P.T.: A statistical approach to persistent homology. Science 9(2), 337–362 (2007)MathSciNetMATH Bubenik, P., Kim, P.T.: A statistical approach to persistent homology. Science 9(2), 337–362 (2007)MathSciNetMATH
11.
Zurück zum Zitat Castellana, B., Escuin, D., Peiró, G., Garcia-Valdecasas, B., Vázquez, T., Pons, C., Pérez-Olabarria, M., Barnadas, A., Lerma, E.: ASPN and GJB2 are implicated in the mechanisms of invasion of ductal breast carcinomas. Science 3, 175–183 (2012) Castellana, B., Escuin, D., Peiró, G., Garcia-Valdecasas, B., Vázquez, T., Pons, C., Pérez-Olabarria, M., Barnadas, A., Lerma, E.: ASPN and GJB2 are implicated in the mechanisms of invasion of ductal breast carcinomas. Science 3, 175–183 (2012)
12.
Zurück zum Zitat Cerri, A., Fabio, B.D., Ferri, M., Frosini, P., Landi, C.: Betti numbers in multidimensional persistent homology are stable functions. Science 36(12), 1543–1557 (2013)MathSciNetMATH Cerri, A., Fabio, B.D., Ferri, M., Frosini, P., Landi, C.: Betti numbers in multidimensional persistent homology are stable functions. Science 36(12), 1543–1557 (2013)MathSciNetMATH
13.
Zurück zum Zitat Chen, H., Sharp, B.M.: Content-rich biological network constructed by mining PubMed abstracts. BMC Bioinformatics 5(1), 147 (2004)CrossRef Chen, H., Sharp, B.M.: Content-rich biological network constructed by mining PubMed abstracts. BMC Bioinformatics 5(1), 147 (2004)CrossRef
14.
Zurück zum Zitat Cios, K.J., Moore, G.W.: Uniqueness of medical data mining. BMC Bioinformatics 26(1), 1–24 (2002) Cios, K.J., Moore, G.W.: Uniqueness of medical data mining. BMC Bioinformatics 26(1), 1–24 (2002)
15.
Zurück zum Zitat Cook, D.J., Holder, L.B.: Graph-based data mining. BMC Bioinformatics 15(2), 32–41 (2000) Cook, D.J., Holder, L.B.: Graph-based data mining. BMC Bioinformatics 15(2), 32–41 (2000)
16.
Zurück zum Zitat Cox, D.R., Oakes, D.: Analysis of Survival Data. Monographs on Statistics & Applied Probability. Chapman & Hall/CRC, London (1984) Cox, D.R., Oakes, D.: Analysis of Survival Data. Monographs on Statistics & Applied Probability. Chapman & Hall/CRC, London (1984)
17.
Zurück zum Zitat Dehaspe, L., Toivonen, H.: Discovery of frequent DATALOG patterns. BMC Bioinformatics 3(1), 7–36 (1999) Dehaspe, L., Toivonen, H.: Discovery of frequent DATALOG patterns. BMC Bioinformatics 3(1), 7–36 (1999)
18.
Zurück zum Zitat Iordache, O.: Methods. In: Iordache, O. (ed.) Polystochastic Models for Complexity. UCS, vol. 4, pp. 17–61. Springer, Heidelberg (2010)CrossRef Iordache, O.: Methods. In: Iordache, O. (ed.) Polystochastic Models for Complexity. UCS, vol. 4, pp. 17–61. Springer, Heidelberg (2010)CrossRef
19.
Zurück zum Zitat Dehmer, M., Basak, S.C.: Statistical and Machine Learning Approaches for Network Analysis. Wiley, Hoboken (2012)MATHCrossRef Dehmer, M., Basak, S.C.: Statistical and Machine Learning Approaches for Network Analysis. Wiley, Hoboken (2012)MATHCrossRef
20.
Zurück zum Zitat Donsa, K., Spat, S., Beck, P., Pieber, T.R., Holzinger, A.: Towards personalization of diabetes therapy using computerized decision support and machine learning: some open problems and challenges. In: Holzinger, A., Röcker, C., Ziefle, M. (eds.) Smart Health. LNCS, vol. 8700, pp. 237–260. Springer, Heidelberg (2015) Donsa, K., Spat, S., Beck, P., Pieber, T.R., Holzinger, A.: Towards personalization of diabetes therapy using computerized decision support and machine learning: some open problems and challenges. In: Holzinger, A., Röcker, C., Ziefle, M. (eds.) Smart Health. LNCS, vol. 8700, pp. 237–260. Springer, Heidelberg (2015)
21.
Zurück zum Zitat Dorogovtsev, S., Mendes, J.: Evolution of Networks: From Biological Nets to the Internet and WWW. Oxford University Press, Oxford (2003)MATHCrossRef Dorogovtsev, S., Mendes, J.: Evolution of Networks: From Biological Nets to the Internet and WWW. Oxford University Press, Oxford (2003)MATHCrossRef
22.
Zurück zum Zitat Duerr-Specht, M., Goebel, R., Holzinger, A.: Medicine and health care as a data problem: will computers become better medical doctors? In: Holzinger, A., Röcker, C., Ziefle, M. (eds.) Smart Health. LNCS, vol. 8700, pp. 21–39. Springer, Heidelberg (2015) Duerr-Specht, M., Goebel, R., Holzinger, A.: Medicine and health care as a data problem: will computers become better medical doctors? In: Holzinger, A., Röcker, C., Ziefle, M. (eds.) Smart Health. LNCS, vol. 8700, pp. 21–39. Springer, Heidelberg (2015)
23.
Zurück zum Zitat Epstein, C., Carlsson, G., Edelsbrunner, H.: Topological data analysis. BMC Bioinformatics 27(12), 120201 (2011) Epstein, C., Carlsson, G., Edelsbrunner, H.: Topological data analysis. BMC Bioinformatics 27(12), 120201 (2011)
24.
Zurück zum Zitat Friedman, J., Hastie, T., Tibshirani, R.: Sparse inverse covariance estimation with the graphical Lasso. BMC Bioinformatics 9(3), 432–441 (2008)MATH Friedman, J., Hastie, T., Tibshirani, R.: Sparse inverse covariance estimation with the graphical Lasso. BMC Bioinformatics 9(3), 432–441 (2008)MATH
25.
Zurück zum Zitat Golumbic, M.C.: Algorithmic Graph Theory and Perfect Graphs. Elsevier, Amsterdam (2004)MATH Golumbic, M.C.: Algorithmic Graph Theory and Perfect Graphs. Elsevier, Amsterdam (2004)MATH
26.
Zurück zum Zitat Henderson, B.E., Feigelson, H.S.: Hormonal carcinogenesis. Carcinogenesis 21(3), 427–433 (2000)CrossRef Henderson, B.E., Feigelson, H.S.: Hormonal carcinogenesis. Carcinogenesis 21(3), 427–433 (2000)CrossRef
27.
Zurück zum Zitat Holzinger, A.: Human-Computer Interaction and Knowledge Discovery (HCI-KDD): what is the benefit of bringing those two fields to work together? In: Cuzzocrea, A., Kittl, C., Simos, D.E., Weippl, E., Xu, L. (eds.) CD-ARES 2013. LNCS, vol. 8127, pp. 319–328. Springer, Heidelberg (2013)CrossRef Holzinger, A.: Human-Computer Interaction and Knowledge Discovery (HCI-KDD): what is the benefit of bringing those two fields to work together? In: Cuzzocrea, A., Kittl, C., Simos, D.E., Weippl, E., Xu, L. (eds.) CD-ARES 2013. LNCS, vol. 8127, pp. 319–328. Springer, Heidelberg (2013)CrossRef
28.
Zurück zum Zitat Holzinger, A., Dehmer, M., Jurisica, I.: Knowledge discovery and interactive data mining in bioinformatics - state-of-the-art, future challenges and research directions. BMC Bioinformatics 15(Suppl 6), I1 (2014)CrossRef Holzinger, A., Dehmer, M., Jurisica, I.: Knowledge discovery and interactive data mining in bioinformatics - state-of-the-art, future challenges and research directions. BMC Bioinformatics 15(Suppl 6), I1 (2014)CrossRef
29.
Zurück zum Zitat Holzinger, A., Jurisica, I. (eds.): Interactive Knowledge Discovery and Data Mining in Biomedical Informatics: State-of-the-Art and Future Challenges, vol. 8401. Springer, Heidelberg (2014) Holzinger, A., Jurisica, I. (eds.): Interactive Knowledge Discovery and Data Mining in Biomedical Informatics: State-of-the-Art and Future Challenges, vol. 8401. Springer, Heidelberg (2014)
30.
Zurück zum Zitat Holzinger, A., Jurisica, I.: Knowledge discovery and data mining in biomedical informatics: the future is in integrative, interactive machine learning solutions. In: Holzinger, A., Jurisica, I. (eds.) Interactive Knowledge Discovery and Data Mining in Biomedical Informatics. LNCS, vol. 8401, pp. 1–18. Springer, Heidelberg (2014)CrossRef Holzinger, A., Jurisica, I.: Knowledge discovery and data mining in biomedical informatics: the future is in integrative, interactive machine learning solutions. In: Holzinger, A., Jurisica, I. (eds.) Interactive Knowledge Discovery and Data Mining in Biomedical Informatics. LNCS, vol. 8401, pp. 1–18. Springer, Heidelberg (2014)CrossRef
31.
Zurück zum Zitat Holzinger, A., Malle, B., Giuliani, N.: On graph extraction from image data. In: Ślȩzak, D., Tan, A.-H., Peters, J.F., Schwabe, L. (eds.) BIH 2014. LNCS, vol. 8609, pp. 552–563. Springer, Heidelberg (2014) Holzinger, A., Malle, B., Giuliani, N.: On graph extraction from image data. In: Ślȩzak, D., Tan, A.-H., Peters, J.F., Schwabe, L. (eds.) BIH 2014. LNCS, vol. 8609, pp. 552–563. Springer, Heidelberg (2014)
32.
Zurück zum Zitat Holzinger, A., Ofner, B., Dehmer, M.: Multi-touch graph-based interaction for knowledge discovery on mobile devices: state-of-the-art and future challenges. In: Holzinger, A., Jurisica, I. (eds.) Interactive Knowledge Discovery and Data Mining in Biomedical Informatics. LNCS, vol. 8401, pp. 241–254. Springer, Heidelberg (2014)CrossRef Holzinger, A., Ofner, B., Dehmer, M.: Multi-touch graph-based interaction for knowledge discovery on mobile devices: state-of-the-art and future challenges. In: Holzinger, A., Jurisica, I. (eds.) Interactive Knowledge Discovery and Data Mining in Biomedical Informatics. LNCS, vol. 8401, pp. 241–254. Springer, Heidelberg (2014)CrossRef
33.
Zurück zum Zitat Holzinger, A., Ofner, B., Stocker, C., Calero Valdez, A., Schaar, A.K., Ziefle, M., Dehmer, M.: On graph entropy measures for knowledge discovery from publication network data. In: Cuzzocrea, A., Kittl, C., Simos, D.E., Weippl, E., Xu, L. (eds.) CD-ARES 2013. LNCS, vol. 8127, pp. 354–362. Springer, Heidelberg (2013)CrossRef Holzinger, A., Ofner, B., Stocker, C., Calero Valdez, A., Schaar, A.K., Ziefle, M., Dehmer, M.: On graph entropy measures for knowledge discovery from publication network data. In: Cuzzocrea, A., Kittl, C., Simos, D.E., Weippl, E., Xu, L. (eds.) CD-ARES 2013. LNCS, vol. 8127, pp. 354–362. Springer, Heidelberg (2013)CrossRef
34.
Zurück zum Zitat Holzinger, A., Stocker, C., Dehmer, M.: Big complex biomedical data: towards a taxonomy of data. In: Obaidat, M.S., Filipe, J. (eds.) Communications in Computer and Information Science CCIS 455, pp. 3–18. Springer, Heidelberg (2014) Holzinger, A., Stocker, C., Dehmer, M.: Big complex biomedical data: towards a taxonomy of data. In: Obaidat, M.S., Filipe, J. (eds.) Communications in Computer and Information Science CCIS 455, pp. 3–18. Springer, Heidelberg (2014)
35.
Zurück zum Zitat Huppertz, B., Holzinger, A.: Biobanks – a source of large biological data sets: open problems and future challenges. In: Holzinger, A., Jurisica, I. (eds.) Interactive Knowledge Discovery and Data Mining in Biomedical Informatics. LNCS, vol. 8401, pp. 317–330. Springer, Heidelberg (2014)CrossRef Huppertz, B., Holzinger, A.: Biobanks – a source of large biological data sets: open problems and future challenges. In: Holzinger, A., Jurisica, I. (eds.) Interactive Knowledge Discovery and Data Mining in Biomedical Informatics. LNCS, vol. 8401, pp. 317–330. Springer, Heidelberg (2014)CrossRef
36.
Zurück zum Zitat Jacob, L., Obozinski, G., Vert, J.P.: Group Lasso with overlap and graph Lasso. In: Proceedings of the 26th International Conference on Machine Learning (ICML), pp. 433–440 (2009) Jacob, L., Obozinski, G., Vert, J.P.: Group Lasso with overlap and graph Lasso. In: Proceedings of the 26th International Conference on Machine Learning (ICML), pp. 433–440 (2009)
37.
Zurück zum Zitat Javanmard, A., Montanari, A.: Model selection for high-dimensional regression under the generalized irrepresentability condition. BMC Bioinformatics 26, 3012–3020 (2013) Javanmard, A., Montanari, A.: Model selection for high-dimensional regression under the generalized irrepresentability condition. BMC Bioinformatics 26, 3012–3020 (2013)
38.
Zurück zum Zitat Joachims, T., Finley, T., Yu, C.N.: Cutting-plane training of structural SVMs. BMC Bioinformatics 77(1), 27–59 (2009)MATH Joachims, T., Finley, T., Yu, C.N.: Cutting-plane training of structural SVMs. BMC Bioinformatics 77(1), 27–59 (2009)MATH
39.
Zurück zum Zitat Kleinberg, J.: Navigation in a small world. Nature 406(6798), 845–845 (2000)CrossRef Kleinberg, J.: Navigation in a small world. Nature 406(6798), 845–845 (2000)CrossRef
40.
Zurück zum Zitat Klopocki, E., Kristiansen, G., Wild, P.J., Klaman, I., Castanos-Velez, E., Singer, G., Stöhr, R., Simon, R., Sauter, G., Leibiger, H., Essers, L., Weber, B., Hermann, K., Rosenthal, A., Hartmann, A., Dahl, E.: Loss of SFRP1 is associated with breast cancer progression and poor prognosis in early stage tumors. Nature 25(3), 641–649 (2004) Klopocki, E., Kristiansen, G., Wild, P.J., Klaman, I., Castanos-Velez, E., Singer, G., Stöhr, R., Simon, R., Sauter, G., Leibiger, H., Essers, L., Weber, B., Hermann, K., Rosenthal, A., Hartmann, A., Dahl, E.: Loss of SFRP1 is associated with breast cancer progression and poor prognosis in early stage tumors. Nature 25(3), 641–649 (2004)
42.
Zurück zum Zitat Koontz, W., Narendra, P., Fukunaga, K.: A graph-theoretic approach to nonparametric cluster analysis. Nature 100(9), 936–944 (1976)MathSciNetMATH Koontz, W., Narendra, P., Fukunaga, K.: A graph-theoretic approach to nonparametric cluster analysis. Nature 100(9), 936–944 (1976)MathSciNetMATH
43.
Zurück zum Zitat Kumpulainen, S., Jarvelin, K.: Barriers to task-based information access in molecular medicine. Nature 63(1), 86–97 (2012) Kumpulainen, S., Jarvelin, K.: Barriers to task-based information access in molecular medicine. Nature 63(1), 86–97 (2012)
44.
Zurück zum Zitat Kurgan, L.A., Musilek, P.: A survey of knowledge discovery and data mining process models. Nature 21(01), 1–24 (2006) Kurgan, L.A., Musilek, P.: A survey of knowledge discovery and data mining process models. Nature 21(01), 1–24 (2006)
45.
Zurück zum Zitat Lauritzen, S.L.: Graphical Models. Oxford University Press, Oxford (1996)MATH Lauritzen, S.L.: Graphical Models. Oxford University Press, Oxford (1996)MATH
46.
Zurück zum Zitat Law, V., Knox, C., Djoumbou, Y., Jewison, T., Guo, A.C., Liu, Y.F., Maciejewski, A., Arndt, D., Wilson, M., Neveu, V., Tang, A., Gabriel, G., Ly, C., Adamjee, S., Dame, Z.T., Han, B.S., Zhou, Y., Wishart, D.S.: Drugbank 4.0: shedding new light on drug metabolism. Nature 42(D1), D1091–D1097 (2014) Law, V., Knox, C., Djoumbou, Y., Jewison, T., Guo, A.C., Liu, Y.F., Maciejewski, A., Arndt, D., Wilson, M., Neveu, V., Tang, A., Gabriel, G., Ly, C., Adamjee, S., Dame, Z.T., Han, B.S., Zhou, Y., Wishart, D.S.: Drugbank 4.0: shedding new light on drug metabolism. Nature 42(D1), D1091–D1097 (2014)
47.
Zurück zum Zitat Lee, S.: Sparse inverse covariance estimation for graph representation of feature structure. In: Holzinger, A., Jurisica, I. (eds.) Interactive Knowledge Discovery and Data Mining in Biomedical Informatics. LNCS, vol. 8401, pp. 227–240. Springer, Heidelberg (2014)CrossRef Lee, S.: Sparse inverse covariance estimation for graph representation of feature structure. In: Holzinger, A., Jurisica, I. (eds.) Interactive Knowledge Discovery and Data Mining in Biomedical Informatics. LNCS, vol. 8401, pp. 227–240. Springer, Heidelberg (2014)CrossRef
48.
Zurück zum Zitat Lee, S.: Signature selection for grouped features with a case study on exon microarrays. In: Stańczyk, U., Jain, L.C. (eds.) Feature Selection for Data and Pattern Classification, pp. 329–349. Springer, Heidelberg (2015) Lee, S.: Signature selection for grouped features with a case study on exon microarrays. In: Stańczyk, U., Jain, L.C. (eds.) Feature Selection for Data and Pattern Classification, pp. 329–349. Springer, Heidelberg (2015)
49.
Zurück zum Zitat Lee, S., Wright, S.J.: Manifold identification in dual averaging methods for regularized stochastic online learning. Nature 13, 1705–1744 (2012)MathSciNetMATH Lee, S., Wright, S.J.: Manifold identification in dual averaging methods for regularized stochastic online learning. Nature 13, 1705–1744 (2012)MathSciNetMATH
50.
Zurück zum Zitat Lilla, C., Koehler, T., Kropp, S., Wang-Gohrke, S., Chang-Claude, J.: Alcohol dehydrogenase 1B (ADH1B) genotype, alcohol consumption and breast cancer risk by age 50 years in a german case-control study. Nature 92(11), 2039–2041 (2005) Lilla, C., Koehler, T., Kropp, S., Wang-Gohrke, S., Chang-Claude, J.: Alcohol dehydrogenase 1B (ADH1B) genotype, alcohol consumption and breast cancer risk by age 50 years in a german case-control study. Nature 92(11), 2039–2041 (2005)
51.
Zurück zum Zitat Lodhi, H., Saunders, C., Shawe-Taylor, J., Watkins, N.C.C.: Text classification using string kernels. Nature 2, 419–444 (2002)MATH Lodhi, H., Saunders, C., Shawe-Taylor, J., Watkins, N.C.C.: Text classification using string kernels. Nature 2, 419–444 (2002)MATH
52.
Zurück zum Zitat Ma, K.L., Muelder, C.W.: Large-scale graph visualization and analytics. Nature 46(7), 39–46 (2013) Ma, K.L., Muelder, C.W.: Large-scale graph visualization and analytics. Nature 46(7), 39–46 (2013)
53.
Zurück zum Zitat Mattmann, C.A.: Computing: a vision for data science. Nature 493(7433), 473–475 (2013)CrossRef Mattmann, C.A.: Computing: a vision for data science. Nature 493(7433), 473–475 (2013)CrossRef
54.
Zurück zum Zitat McCall, M., Murakami, P., Lukk, M., Huber, W., Irizarry, R.: Assessing affymetrix genechip microarray quality. BMC Bioinformatics 12(1), 137 (2011)CrossRef McCall, M., Murakami, P., Lukk, M., Huber, W., Irizarry, R.: Assessing affymetrix genechip microarray quality. BMC Bioinformatics 12(1), 137 (2011)CrossRef
55.
Zurück zum Zitat McCall, M.N., Bolstad, B.M., Irizarry, R.A.: Frozen robust multiarray analysis (fRMA). BMC Bioinformatics 11(2), 242–253 (2010) McCall, M.N., Bolstad, B.M., Irizarry, R.A.: Frozen robust multiarray analysis (fRMA). BMC Bioinformatics 11(2), 242–253 (2010)
56.
Zurück zum Zitat Meinshausen, N., Bühlmann, P.: High-dimensional graphs and variable selection with the Lasso. BMC Bioinformatics 34, 1436–1462 (2006)MathSciNetMATH Meinshausen, N., Bühlmann, P.: High-dimensional graphs and variable selection with the Lasso. BMC Bioinformatics 34, 1436–1462 (2006)MathSciNetMATH
57.
Zurück zum Zitat Meinshausen, N., Bühlmann, P.: Stability selection. BMC Bioinformatics 72(4), 417–473 (2010)MathSciNet Meinshausen, N., Bühlmann, P.: Stability selection. BMC Bioinformatics 72(4), 417–473 (2010)MathSciNet
58.
Zurück zum Zitat Müller, R.: Medikamente und Richtwerte in der Notfallmedizin, 11th edn. Ralf Müller Verlag, Graz (2012) Müller, R.: Medikamente und Richtwerte in der Notfallmedizin, 11th edn. Ralf Müller Verlag, Graz (2012)
59.
Zurück zum Zitat Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate \(o(1/k^2)\). Soviet Math. Dokl. 27(2), 372–376 (1983)MATH Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate \(o(1/k^2)\). Soviet Math. Dokl. 27(2), 372–376 (1983)MATH
60.
Zurück zum Zitat Niakšu, O., Kurasova, O.: Data mining applications in healthcare: research vs practice. In: Databases and Information Systems Baltic DB & IS 2012, p. 58 (2012) Niakšu, O., Kurasova, O.: Data mining applications in healthcare: research vs practice. In: Databases and Information Systems Baltic DB & IS 2012, p. 58 (2012)
61.
Zurück zum Zitat Otasek, D., Pastrello, C., Holzinger, A., Jurisica, I.: Visual data mining: effective exploration of the biological universe. In: Holzinger, A., Jurisica, I. (eds.) Interactive Knowledge Discovery and Data Mining in Biomedical Informatics. LNCS, vol. 8401, pp. 19–33. Springer, Heidelberg (2014)CrossRef Otasek, D., Pastrello, C., Holzinger, A., Jurisica, I.: Visual data mining: effective exploration of the biological universe. In: Holzinger, A., Jurisica, I. (eds.) Interactive Knowledge Discovery and Data Mining in Biomedical Informatics. LNCS, vol. 8401, pp. 19–33. Springer, Heidelberg (2014)CrossRef
62.
Zurück zum Zitat Preuß, M., Dehmer, M., Pickl, S., Holzinger, A.: On terrain coverage optimization by using a network approach for universal graph-based data mining and knowledge discovery. In: Ślȩzak, D., Tan, A.-H., Peters, J.F., Schwabe, L. (eds.) BIH 2014. LNCS, vol. 8609, pp. 564–573. Springer, Heidelberg (2014) Preuß, M., Dehmer, M., Pickl, S., Holzinger, A.: On terrain coverage optimization by using a network approach for universal graph-based data mining and knowledge discovery. In: Ślȩzak, D., Tan, A.-H., Peters, J.F., Schwabe, L. (eds.) BIH 2014. LNCS, vol. 8609, pp. 564–573. Springer, Heidelberg (2014)
63.
Zurück zum Zitat Schoenauer, M., Akrour, R., Sebag, M., Souplet, J.C.: Programming by feedback. In: Proceedings of the 31st International Conference on Machine Learning (ICML 2014), pp. 1503–1511 (2014) Schoenauer, M., Akrour, R., Sebag, M., Souplet, J.C.: Programming by feedback. In: Proceedings of the 31st International Conference on Machine Learning (ICML 2014), pp. 1503–1511 (2014)
64.
Zurück zum Zitat Spinrad, N.: Google car takes the test. Nature 514(7523), 528–528 (2014)CrossRef Spinrad, N.: Google car takes the test. Nature 514(7523), 528–528 (2014)CrossRef
65.
Zurück zum Zitat Strogatz, S.: Exploring complex networks. Nature 410(6825), 268–276 (2001)CrossRef Strogatz, S.: Exploring complex networks. Nature 410(6825), 268–276 (2001)CrossRef
66.
Zurück zum Zitat Tibshirani, R.: Regression shrinkage and selection via the Lasso. Nature 58, 267–288 (1996)MathSciNetMATH Tibshirani, R.: Regression shrinkage and selection via the Lasso. Nature 58, 267–288 (1996)MathSciNetMATH
67.
Zurück zum Zitat Tseng, P.: Convergence of a block coordinate descent method for nondifferentiable minimization. Nature 109(3), 475–494 (2001)MathSciNetMATH Tseng, P.: Convergence of a block coordinate descent method for nondifferentiable minimization. Nature 109(3), 475–494 (2001)MathSciNetMATH
68.
Zurück zum Zitat Vandenberghe, L., Boyd, S., Wu, S.P.: Determinant maximization with linear matrix inequality constraints. Nature 19(2), 499–533 (1998)MathSciNetMATH Vandenberghe, L., Boyd, S., Wu, S.P.: Determinant maximization with linear matrix inequality constraints. Nature 19(2), 499–533 (1998)MathSciNetMATH
69.
Zurück zum Zitat Wagner, H., Dłotko, P., Mrozek, M.: Computational topology in text mining. In: Ferri, M., Frosini, P., Landi, C., Cerri, A., Di Fabio, B. (eds.) CTIC 2012. LNCS, vol. 7309, pp. 68–78. Springer, Heidelberg (2012)CrossRef Wagner, H., Dłotko, P., Mrozek, M.: Computational topology in text mining. In: Ferri, M., Frosini, P., Landi, C., Cerri, A., Di Fabio, B. (eds.) CTIC 2012. LNCS, vol. 7309, pp. 68–78. Springer, Heidelberg (2012)CrossRef
70.
Zurück zum Zitat Washio, T., Motoda, H.: State of the art of graph-based data mining. Nature 5(1), 59 (2003) Washio, T., Motoda, H.: State of the art of graph-based data mining. Nature 5(1), 59 (2003)
71.
Zurück zum Zitat Wishart, D.S., Knox, C., Guo, A.C., Shrivastava, S., Hassanali, M., Stothard, P., Chang, Z., Woolsey, J.: Drugbank: a comprehensive resource for in silico drug discovery and exploration. Nature 34, D668–D672 (2006) Wishart, D.S., Knox, C., Guo, A.C., Shrivastava, S., Hassanali, M., Stothard, P., Chang, Z., Woolsey, J.: Drugbank: a comprehensive resource for in silico drug discovery and exploration. Nature 34, D668–D672 (2006)
72.
Zurück zum Zitat Wittkop, T., Emig, D., Truss, A., Albrecht, M., Boecker, S., Baumbach, J.: Comprehensive cluster analysis with transitivity clustering. Nature 6(3), 285–295 (2011) Wittkop, T., Emig, D., Truss, A., Albrecht, M., Boecker, S., Baumbach, J.: Comprehensive cluster analysis with transitivity clustering. Nature 6(3), 285–295 (2011)
73.
Zurück zum Zitat Yoshida, K., Motoda, H., Indurkhya, N.: Graph-based induction as a unified learning framework. Nature 4(3), 297–316 (1994) Yoshida, K., Motoda, H., Indurkhya, N.: Graph-based induction as a unified learning framework. Nature 4(3), 297–316 (1994)
74.
Zurück zum Zitat Yuan, M., Lin, Y.: Model selection and estimation in regression with grouped variables. Nature 68, 49–67 (2006)MathSciNetMATH Yuan, M., Lin, Y.: Model selection and estimation in regression with grouped variables. Nature 68, 49–67 (2006)MathSciNetMATH
75.
76.
77.
Zurück zum Zitat Zhengxiang, Z., Jifa, G., Wenxin, Y., Xingsen, L.: Toward domain-driven data mining. In: International Symposium on Intelligent Information Technology Application Workshops, pp. 44–48 (2008) Zhengxiang, Z., Jifa, G., Wenxin, Y., Xingsen, L.: Toward domain-driven data mining. In: International Symposium on Intelligent Information Technology Application Workshops, pp. 44–48 (2008)
78.
Zurück zum Zitat Zhu, X.: Persistent homology: an introduction and a new text representation for natural language processing. In: IJCAI, IJCAI/AAAI (2013) Zhu, X.: Persistent homology: an introduction and a new text representation for natural language processing. In: IJCAI, IJCAI/AAAI (2013)
79.
Zurück zum Zitat Zou, H.: The adaptive Lasso and its Oracle properties. Biometrika 101(476), 1418–1429 (2006)MathSciNetMATH Zou, H.: The adaptive Lasso and its Oracle properties. Biometrika 101(476), 1418–1429 (2006)MathSciNetMATH
80.
Zurück zum Zitat Zou, H., Hastie, T.: Regularization and variable selection via the elastic net. Biometrika 67, 301–320 (2005)MathSciNetMATH Zou, H., Hastie, T.: Regularization and variable selection via the elastic net. Biometrika 67, 301–320 (2005)MathSciNetMATH
81.
Zurück zum Zitat Zudilova-Seinstra, E., Adriaansen, T.: Visualisation and interaction for scientific exploration and knowledge discovery. Biometrika 13(2), 115–117 (2007) Zudilova-Seinstra, E., Adriaansen, T.: Visualisation and interaction for scientific exploration and knowledge discovery. Biometrika 13(2), 115–117 (2007)
Metadaten
Titel
Knowledge Discovery from Complex High Dimensional Data
verfasst von
Sangkyun Lee
Andreas Holzinger
Copyright-Jahr
2016
DOI
https://doi.org/10.1007/978-3-319-41706-6_7