
2015 | OriginalPaper | Chapter

Data Driven Geometry for Learning


Abstract

High-dimensional covariate information provides a detailed description of the individuals involved in a machine learning and classification problem. The inter-dependence patterns among these covariate vectors may be unknown to researchers. This fact is not well recognized in the classic and modern machine learning literature: most popular model-based algorithms either use some form of dimension reduction or impose a built-in complexity penalty. This is a defensive attitude toward high dimensionality. In contrast, an accommodating attitude can exploit the potential inter-dependence patterns embedded within the high dimensionality. In this research, we adopt the latter attitude throughout by first computing the similarity between data nodes and then discovering pattern information, in the form of ultrametric tree geometry, among almost all the covariate dimensions involved. We illustrate with real microarray datasets, where we demonstrate that such dual-relationships are indeed class specific, each precisely representing the discovery of a biomarker. The whole collection of computed biomarkers constitutes a global feature-matrix, which is then shown to give rise to a very effective learning algorithm.
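
The two steps named in the abstract, computing similarity between data nodes and then deriving an ultrametric tree geometry, can be illustrated with a minimal sketch. The abstract does not state which tree-building algorithm the chapter uses, so this example swaps in a standard stand-in: the subdominant ultrametric, which corresponds to the single-linkage tree. Euclidean distance, the toy points, and all function names are assumptions made for illustration only.

```python
import math


def pairwise_distances(points):
    """Step 1 (assumed form): Euclidean distance between every pair of data nodes."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]


def subdominant_ultrametric(dist):
    """Step 2 (stand-in technique): largest ultrametric lying below `dist`.

    u[i][j] is the smallest possible 'largest hop' over all paths from i to j,
    computed with a Floyd-Warshall style pass that uses max along a path
    and min across paths. This equals the single-linkage merge height.
    """
    n = len(dist)
    u = [row[:] for row in dist]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = max(u[i][k], u[k][j])
                if via_k < u[i][j]:
                    u[i][j] = via_k
    return u


def is_ultrametric(u, tol=1e-12):
    """Check the strong triangle inequality u(i,j) <= max(u(i,k), u(k,j))."""
    n = len(u)
    return all(
        u[i][j] <= max(u[i][k], u[k][j]) + tol
        for i in range(n)
        for j in range(n)
        for k in range(n)
    )


# Hypothetical toy data: two tight pairs of nodes that lie far from each other.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
dist = pairwise_distances(points)
tree_dist = subdominant_ultrametric(dist)

if __name__ == "__main__":
    for row in tree_dist:
        print(row)
    print("raw distances ultrametric:", is_ultrametric(dist))
    print("tree distances ultrametric:", is_ultrametric(tree_dist))
```

In the resulting matrix, nodes in the same tight pair share a small tree distance while every cross-pair distance collapses to a single merge height, which is the kind of multi-scale grouping an ultrametric tree encodes. A real analysis along the lines of the abstract would apply such a construction to both the sample and the covariate dimensions of a data matrix.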

Metadata
Title
Data Driven Geometry for Learning
Author
Elizabeth P. Chou
Copyright Year
2015
DOI
https://doi.org/10.1007/978-3-319-21024-7_27