Published in: Journal of Intelligent Information Systems 2/2018

20 April 2018

Network representation with clustering tree features

Authors: Konstantinos Pliakos, Celine Vens

Abstract

Representing and inferring interaction networks is a challenging and long-standing problem. Modern technological advances have greatly increased both the volume and the complexity of generated network data. Networks such as drug-protein interaction networks and gene regulatory networks are constantly growing, and multiple sources of information are exploited to extract features describing their nodes. Modern information systems therefore need methods that can mine these networks and exploit the available features. Here, a novel data mining framework for network representation and mining is proposed. It is based on decision tree learning and ensembles of trees. The proposed scheme introduces an efficient network data representation that can handle different data types while also coping with data volume and complexity. The learning process follows the inductive setup and can be performed in either a supervised or an unsupervised manner. Experiments were conducted on six biomedical network datasets. The experimental evaluation demonstrates the merits of the proposed approach and confirms its efficiency.

Metadata
Title
Network representation with clustering tree features
Authors
Konstantinos Pliakos
Celine Vens
Publication date
20 April 2018
Publisher
Springer US
Published in
Journal of Intelligent Information Systems / Issue 2/2018
Print ISSN: 0925-9902
Electronic ISSN: 1573-7675
DOI
https://doi.org/10.1007/s10844-018-0506-7
