Published in: Journal of Intelligent Information Systems 2/2018

20-04-2018

Network representation with clustering tree features

Authors: Konstantinos Pliakos, Celine Vens

Abstract

Representing and inferring interaction networks is a challenging and long-standing problem. Modern technological advances have led to a great increase in both the volume and the complexity of generated network data. Networks such as drug-protein interaction networks and gene regulatory networks are constantly growing in size, and multiple sources of information are exploited to extract features describing their nodes. Modern information systems therefore need methods that can mine these networks and exploit the available features. Here, a novel data mining framework for network representation and mining is proposed. It is based on decision tree learning and ensembles of trees. The proposed scheme introduces an efficient network data representation that can handle different data types while also coping with data volume and complexity. The learning process follows the inductive setup and can be performed in either a supervised or an unsupervised manner. Experiments were conducted on six biomedical network datasets. The experimental evaluation demonstrates the merits of the proposed approach and confirms its efficiency.
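
The abstract does not spell out how tree ensembles yield a representation, so the following is only an illustrative sketch of one common approach: grow several trees over the node features, then describe each network node by a binary vector marking the tree nodes it passes through. This is not the authors' implementation. In particular, splits here are drawn at random (unsupervised) rather than chosen by a clustering-tree criterion, and all names (`build_tree`, `fit_forest`, `transform`) and the toy data are hypothetical.

```python
import random


def build_tree(rows, depth, rng, nodes):
    """Grow one random tree; append its nodes to `nodes` and return the node index."""
    idx = len(nodes)
    node = {"feature": None, "threshold": None, "left": None, "right": None}
    nodes.append(node)
    if depth == 0 or len(rows) < 2:
        return idx  # leaf: depth budget used up or too few rows
    f = rng.randrange(len(rows[0]))
    lo = min(r[f] for r in rows)
    hi = max(r[f] for r in rows)
    if lo == hi:
        return idx  # leaf: the chosen feature is constant here
    t = rng.uniform(lo, hi)  # assumption: random threshold, no split criterion
    left = [r for r in rows if r[f] <= t]
    right = [r for r in rows if r[f] > t]
    if not left or not right:
        return idx  # leaf: degenerate split
    node["feature"], node["threshold"] = f, t
    node["left"] = build_tree(left, depth - 1, rng, nodes)
    node["right"] = build_tree(right, depth - 1, rng, nodes)
    return idx


def fit_forest(rows, n_trees=5, max_depth=3, seed=0):
    """Build an ensemble; each tree is a flat list of nodes with the root at index 0."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        nodes = []
        build_tree(rows, max_depth, rng, nodes)
        forest.append(nodes)
    return forest


def encode(row, nodes):
    """Binary vector with a 1 for every tree node on the root-to-leaf path of `row`."""
    vec = [0] * len(nodes)
    i = 0
    while True:
        vec[i] = 1
        node = nodes[i]
        if node["feature"] is None:
            return vec
        go_left = row[node["feature"]] <= node["threshold"]
        i = node["left"] if go_left else node["right"]


def transform(row, forest):
    """Concatenate the per-tree path vectors into one feature vector."""
    out = []
    for nodes in forest:
        out.extend(encode(row, nodes))
    return out


# Toy node features (made up for illustration only).
train = [[0.1, 1.0], [0.2, 0.9], [0.8, 0.2], [0.9, 0.1]]
forest = fit_forest(train)
features = [transform(r, forest) for r in train]

# Inductive use: a node unseen during training is encoded with the same trees.
new_node = transform([0.85, 0.15], forest)
```

Because the trees are fixed once built, a new node can be mapped into the same feature space without retraining, which is what an inductive setup requires. A supervised variant would replace the random threshold with a split chosen to make the interaction profiles in each branch more homogeneous.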


Metadata
Title
Network representation with clustering tree features
Authors
Konstantinos Pliakos
Celine Vens
Publication date
20-04-2018
Publisher
Springer US
Published in
Journal of Intelligent Information Systems / Issue 2/2018
Print ISSN: 0925-9902
Electronic ISSN: 1573-7675
DOI
https://doi.org/10.1007/s10844-018-0506-7
