
2017 | Original Paper | Book Chapter

Characterising the Influence of Rule-Based Knowledge Representations in Biological Knowledge Extraction from Transcriptomics Data

Authors: Simon Baron, Nicola Lazzarini, Jaume Bacardit

Published in: Applications of Evolutionary Computation

Publisher: Springer International Publishing


Abstract

Currently, there is a wealth of biotechnologies (e.g. sequencing, proteomics, lipidomics) able to generate a broad range of data types from biological samples. However, the knowledge gained from such data sources is constrained by the limitations of the analytics techniques. State-of-the-art machine learning algorithms can capture complex patterns with high predictive capacity, but it is often very difficult, if not impossible, to extract human-understandable knowledge from these patterns. In recent years, evolutionary machine learning techniques have proven to be competent methods for biological and biomedical data analytics. They can generate interpretable prediction models and, beyond prediction, extract useful knowledge in the form of biomarkers or biological networks.
The focus of this paper is to thoroughly characterise the impact that a core component of the evolutionary machine learning process, its knowledge representation, has on the extraction of biologically useful knowledge from transcriptomics datasets. Using FuNeL, an evolutionary machine learning-based network inference method, we evaluate several variants of rule knowledge representations on a range of transcriptomics datasets to quantify the volume and complementarity of the knowledge that each of them can extract. Overall, we show that knowledge representations, often considered a minor detail, greatly affect the downstream biological knowledge extraction process.
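
One plausible way to read "volume and complementarity" is sketched below. It is not taken from the paper and is not FuNeL's procedure. It assumes that each knowledge representation yields a network expressed as a list of gene-gene edges, counts the edges per network (volume), counts the edges found by only one representation, and reports pairwise Jaccard overlap (low overlap suggesting high complementarity). The function names, the representation labels, and the toy gene identifiers are all hypothetical.

```python
from itertools import combinations


def normalise(edges):
    """Treat each edge as an unordered gene pair, so (a, b) equals (b, a)."""
    return {frozenset(edge) for edge in edges}


def jaccard(a, b):
    """Jaccard index of two edge sets; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def compare_networks(networks):
    """Return edge counts, unique-edge counts, and pairwise Jaccard overlap."""
    nets = {name: normalise(edges) for name, edges in networks.items()}
    # Volume: how many distinct edges each representation produced.
    volume = {name: len(edges) for name, edges in nets.items()}
    # Unique edges: those that no other representation recovered.
    unique = {}
    for name, edges in nets.items():
        others = set().union(*(e for n, e in nets.items() if n != name))
        unique[name] = len(edges - others)
    # Overlap: Jaccard index for every pair of representations.
    overlap = {
        (a, b): jaccard(nets[a], nets[b])
        for a, b in combinations(sorted(nets), 2)
    }
    return volume, unique, overlap


# Made-up toy data, purely illustrative (hypothetical labels and gene names).
networks = {
    "representation_A": [("g1", "g2"), ("g2", "g3"), ("g3", "g4")],
    "representation_B": [("g2", "g1"), ("g3", "g4"), ("g5", "g6")],
}
volume, unique, overlap = compare_networks(networks)
print(volume, unique, overlap)
```

Edges are stored as unordered pairs because the sketch assumes undirected networks; a directed analysis would keep tuples instead.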

Metadata
Title
Characterising the Influence of Rule-Based Knowledge Representations in Biological Knowledge Extraction from Transcriptomics Data
Authors
Simon Baron
Nicola Lazzarini
Jaume Bacardit
Copyright Year
2017
DOI
https://doi.org/10.1007/978-3-319-55849-3_9