Skip to main content
Top

2011 | OriginalPaper | Chapter

16. Graph Classification Methods in Chemoinformatics

Author : Koji Tsuda

Published in: Handbook of Statistical Bioinformatics

Publisher: Springer Berlin Heidelberg

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

Graphs are general and powerful data structures that can be used to represent diverse kinds of molecular objects such as chemical compounds, proteins, and RNAs. In recent years, computational analysis of tens of thousands of labeled graphs has become possible by advanced graph mining methods. For example, frequent pattern mining methods such as gSpan can enumerate all frequent subgraphs in a graph database efficiently. This chapter reviews basics of graph mining methodology and its application to chemoinformatics and bioinformatics. Graph classification and regression techniques based on subgraph patterns are also reviewed extensively.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literature
1.
go back to reference Agrawal, R., & Srikant, R. (1994). Fast algorithms for mining association rules in large databases. In Proceedings of VLDB 1994 (pp. 487–499). Agrawal, R., & Srikant, R. (1994). Fast algorithms for mining association rules in large databases. In Proceedings of VLDB 1994 (pp. 487–499).
2.
go back to reference Asai, T., Abe, K., Kawasoe, S., Arimura, H., Sakamoto, H., & Arikawa, S. (2002). Efficient substructure discovery from large semi-structured data. In Proceedings of 2nd SIAM data mining conference (SDM) (pp. 158–174). Asai, T., Abe, K., Kawasoe, S., Arimura, H., Sakamoto, H., & Arikawa, S. (2002). Efficient substructure discovery from large semi-structured data. In Proceedings of 2nd SIAM data mining conference (SDM) (pp. 158–174).
3.
go back to reference Boley, M., & Grosskreutz, H. (2008). A randomized approach for approximating the number of frequent sets. In Proceedings of the 8th IEEE international conference on data mining (pp. 43–52). Boley, M., & Grosskreutz, H. (2008). A randomized approach for approximating the number of frequent sets. In Proceedings of the 8th IEEE international conference on data mining (pp. 43–52).
4.
go back to reference Borgwardt, K. M., Ong, C. S., Schönauer, S., Vishwanathan, S. V. N., Smola, A. J., & Kriegel, H.-P. (2006). Protein function prediction via graph kernels. Bioinformatics, 21(Suppl. 1), i47–i56. Borgwardt, K. M., Ong, C. S., Schönauer, S., Vishwanathan, S. V. N., Smola, A. J., & Kriegel, H.-P. (2006). Protein function prediction via graph kernels. Bioinformatics, 21(Suppl. 1), i47–i56.
5.
go back to reference Cheng, H., Lo, D., Zhou, Y., Wang, X., & Yan, X. (2009). Identifying bug signatures using discriminative graph mining. In Proceedings of the 18th international symposium on software testing and analysis (pp. 141–152). Cheng, H., Lo, D., Zhou, Y., Wang, X., & Yan, X. (2009). Identifying bug signatures using discriminative graph mining. In Proceedings of the 18th international symposium on software testing and analysis (pp. 141–152).
6.
go back to reference Demiriz, A., Bennet, K. P., & Shawe-Taylor, J. (2002). Linear programming boosting via column generation. Machine Learning, 46(1–3), 225–254.MATHCrossRef Demiriz, A., Bennet, K. P., & Shawe-Taylor, J. (2002). Linear programming boosting via column generation. Machine Learning, 46(1–3), 225–254.MATHCrossRef
7.
go back to reference Deshpande, M., Kuramochi, M., Wale, N., & Karypis, G. (2005). Frequent sub-structure-based approaches for classifying chemical compounds. IEEE Transactions on Knowledge and Data Engineering, 17(8), 1036–1050.CrossRef Deshpande, M., Kuramochi, M., Wale, N., & Karypis, G. (2005). Frequent sub-structure-based approaches for classifying chemical compounds. IEEE Transactions on Knowledge and Data Engineering, 17(8), 1036–1050.CrossRef
8.
go back to reference du Merle, O., Villeneuve, D., Desrosiers, J., & Hansen, P. (1999). Stabilized column generation. Discrete Mathematics, 194, 229–237.MathSciNetMATHCrossRef du Merle, O., Villeneuve, D., Desrosiers, J., & Hansen, P. (1999). Stabilized column generation. Discrete Mathematics, 194, 229–237.MathSciNetMATHCrossRef
9.
go back to reference Eichinger, F., Böhm, K., & Huber, M. (2008). Mining edge-weighted call graphs to localise software bugs. In Proceedings of the European conference on machine learning and principles and practice of knowledge discovery in databases (ECML PKDD) (pp. 333–348). Eichinger, F., Böhm, K., & Huber, M. (2008). Mining edge-weighted call graphs to localise software bugs. In Proceedings of the European conference on machine learning and principles and practice of knowledge discovery in databases (ECML PKDD) (pp. 333–348).
10.
go back to reference Gasteiger, J., & Engel, T. (2003). Chemoinformatics: A textbook. Weinheim, Germany: Wiley-VCH.CrossRef Gasteiger, J., & Engel, T. (2003). Chemoinformatics: A textbook. Weinheim, Germany: Wiley-VCH.CrossRef
11.
go back to reference Guyon, I., Weston, J., Bahnhill, S., & Vapnik, V. (2002). Gene selection for cancer classification using support vector machines. Machine Learning, 46(1–3), 389–422.MATHCrossRef Guyon, I., Weston, J., Bahnhill, S., & Vapnik, V. (2002). Gene selection for cancer classification using support vector machines. Machine Learning, 46(1–3), 389–422.MATHCrossRef
12.
go back to reference Hamada, M., Tsuda, K., Kudo, T., Kin, T., & Asai, K. (2006). Mining frequent stem patterns from unaligned RNA sequences. Bioinformatics, 22, 2480–2487.CrossRef Hamada, M., Tsuda, K., Kudo, T., Kin, T., & Asai, K. (2006). Mining frequent stem patterns from unaligned RNA sequences. Bioinformatics, 22, 2480–2487.CrossRef
13.
go back to reference Han, J., & Kamber, M. (2000). Data mining: Concepts and techniques. San Francisco: Morgan Kaufmann. Han, J., & Kamber, M. (2000). Data mining: Concepts and techniques. San Francisco: Morgan Kaufmann.
14.
go back to reference Helma, C., Cramer, T., Kramer, S., & Raedt, L. D. (2004). Data mining and machine learning techniques for the identification of mutagenicity inducing substructures and structure activity relationships of noncongeneric compounds. Journal of Chemical Information Computer Science, 44, 1402–1411. Helma, C., Cramer, T., Kramer, S., & Raedt, L. D. (2004). Data mining and machine learning techniques for the identification of mutagenicity inducing substructures and structure activity relationships of noncongeneric compounds. Journal of Chemical Information Computer Science, 44, 1402–1411.
15.
go back to reference Inokuchi, A. (2005). Mining generalized substructures from a set of labeled graphs. In Proceedings of the 4th IEEE internatinal conference on data mining (pp. 415–418). Los Alamitos, CA: IEEE Computer Society. Inokuchi, A. (2005). Mining generalized substructures from a set of labeled graphs. In Proceedings of the 4th IEEE internatinal conference on data mining (pp. 415–418). Los Alamitos, CA: IEEE Computer Society.
16.
go back to reference Kashima, H., Tsuda, K., & Inokuchi, A. (2003). Marginalized kernels between labeled graphs. In Proceedings of the 21st international conference on machine learning (pp. 321–328). New York: AAAI. Kashima, H., Tsuda, K., & Inokuchi, A. (2003). Marginalized kernels between labeled graphs. In Proceedings of the 21st international conference on machine learning (pp. 321–328). New York: AAAI.
17.
go back to reference Kazius, J., Nijssen, S., Kok, J., Bäck, T., & Ijzerman, A. P. (2006). Substructure mining using elaborate chemical representation. Journal of Chemical Information Modeling, 46, 597–605.CrossRef Kazius, J., Nijssen, S., Kok, J., Bäck, T., & Ijzerman, A. P. (2006). Substructure mining using elaborate chemical representation. Journal of Chemical Information Modeling, 46, 597–605.CrossRef
18.
go back to reference Kohavi, R., & John, G. H. (1997). Wrappers for feature subset selection. Artificial Intelligence, 1–2, 273–324.CrossRef Kohavi, R., & John, G. H. (1997). Wrappers for feature subset selection. Artificial Intelligence, 1–2, 273–324.CrossRef
19.
go back to reference Kudo, T., Maeda, E., & Matsumoto, Y. (2005). An application of boosting to graph classification. In Advances in neural information processing systems (Vol. 17, pp. 729–736). Cambridge, MA: MIT. Kudo, T., Maeda, E., & Matsumoto, Y. (2005). An application of boosting to graph classification. In Advances in neural information processing systems (Vol. 17, pp. 729–736). Cambridge, MA: MIT.
20.
go back to reference Luenberger, D. G. (1969). Optimization by vector space methods. New York: Wiley.MATH Luenberger, D. G. (1969). Optimization by vector space methods. New York: Wiley.MATH
21.
go back to reference Mahé, P., Ueda, N., Akutsu, T., Perret, J.-L., & Vert, J.-P. (2005). Graph kernels for molecular structure – activity relationship analysis with support vector machines. Journal of Chemical and Information Modeling, 45, 939–951.CrossRef Mahé, P., Ueda, N., Akutsu, T., Perret, J.-L., & Vert, J.-P. (2005). Graph kernels for molecular structure – activity relationship analysis with support vector machines. Journal of Chemical and Information Modeling, 45, 939–951.CrossRef
22.
go back to reference Morishita, S. (2001). Computing optimal hypotheses efficiently for boosting. In Discovery science (pp. 471–481). Morishita, S. (2001). Computing optimal hypotheses efficiently for boosting. In Discovery science (pp. 471–481).
23.
go back to reference Morishita, S., & Sese, J. (2000). Traversing itemset lattices with statistical metric pruning. In Proceedings of ACM SIGACT-SIGMOD-SIGART symposium on database systems (PODS) (pp. 226–236). Morishita, S., & Sese, J. (2000). Traversing itemset lattices with statistical metric pruning. In Proceedings of ACM SIGACT-SIGMOD-SIGART symposium on database systems (PODS) (pp. 226–236).
24.
go back to reference Nijssen, S., & Kok, J. N. (2004). A quickstart in frequent structure mining can make a difference. In Proceedings of the 10th ACM SIGKDD international conference on knowledge discovery and data mining (pp. 647–652). New York: ACM Press. Nijssen, S., & Kok, J. N. (2004). A quickstart in frequent structure mining can make a difference. In Proceedings of the 10th ACM SIGKDD international conference on knowledge discovery and data mining (pp. 647–652). New York: ACM Press.
25.
go back to reference Nowozin, S., Tsuda, K., Uno, T., Kudo, T., & Bakir, G. (2007). Weighted substructure mining for image analysis. In IEEE computer society conference on computer vision and pattern recognition (CVPR). Los Alamitos, CA: IEEE Computer Society. Nowozin, S., Tsuda, K., Uno, T., Kudo, T., & Bakir, G. (2007). Weighted substructure mining for image analysis. In IEEE computer society conference on computer vision and pattern recognition (CVPR). Los Alamitos, CA: IEEE Computer Society.
26.
go back to reference Pei, J., Han, J., Mortazavi-asl, B., Wang, J., Pinto, H., Chen, Q., Dayal, U., & Hsu, M. (2004). Mining sequential patterns by pattern-growth: The prefixspan approach. IEEE Transactions on Knowledge and Data Engineering, 16(11), 1424–1440.CrossRef Pei, J., Han, J., Mortazavi-asl, B., Wang, J., Pinto, H., Chen, Q., Dayal, U., & Hsu, M. (2004). Mining sequential patterns by pattern-growth: The prefixspan approach. IEEE Transactions on Knowledge and Data Engineering, 16(11), 1424–1440.CrossRef
27.
go back to reference Rätsch, G., Mika, S., Schölkopf, B., & Müller, K.-R. (2002). Constructing boosting algorithms from SVMs: An application to one-class classification. IEEE Transactions on Pattern Analysis Machine Intelligence, 24(9), 1184–1199.CrossRef Rätsch, G., Mika, S., Schölkopf, B., & Müller, K.-R. (2002). Constructing boosting algorithms from SVMs: An application to one-class classification. IEEE Transactions on Pattern Analysis Machine Intelligence, 24(9), 1184–1199.CrossRef
28.
go back to reference Rosipal, R., & Krämer, N. (2006). Overview and recent advances in partial least squares. In Subspace, latent structure and feature selection techniques (pp. 34–51). Springer. Rosipal, R., & Krämer, N. (2006). Overview and recent advances in partial least squares. In Subspace, latent structure and feature selection techniques (pp. 34–51). Springer.
29.
go back to reference Saigo, H., Krämer, N., & Tsuda, K. (2008). Partial least squares regression for graph mining. In Proceedings of the 14th ACM SIGKDD international conference on knowledge discovery and data mining (pp. 578–586). Saigo, H., Krämer, N., & Tsuda, K. (2008). Partial least squares regression for graph mining. In Proceedings of the 14th ACM SIGKDD international conference on knowledge discovery and data mining (pp. 578–586).
30.
go back to reference Saigo, H., Nowozin, S., Kadowaki, T., Kudo, T., & Tsuda, K. (2008). GBoost: A mathematical programming approach to graph classification and regression. Machine Learning. Saigo, H., Nowozin, S., Kadowaki, T., Kudo, T., & Tsuda, K. (2008). GBoost: A mathematical programming approach to graph classification and regression. Machine Learning.
31.
go back to reference Sanfeliu, A., & Fu, K. S. (1983). A distance measure between attributed relational graphs for pattern recognition. IEEE Transactions on System, Man and Cybernetics, 13, 353–362.MATHCrossRef Sanfeliu, A., & Fu, K. S. (1983). A distance measure between attributed relational graphs for pattern recognition. IEEE Transactions on System, Man and Cybernetics, 13, 353–362.MATHCrossRef
32.
go back to reference Schölkopf, B., & Smola, A. J. (2002). Learning with Kernels: Support vector machines, regularization, optimization, and beyond. Cambridge, MA: MIT. Schölkopf, B., & Smola, A. J. (2002). Learning with Kernels: Support vector machines, regularization, optimization, and beyond. Cambridge, MA: MIT.
33.
go back to reference Tsuda, K. (2007). Entire regularization paths for graph data. In Proceedings of the 24th international conference on machine learning (pp. 919–926). Tsuda, K. (2007). Entire regularization paths for graph data. In Proceedings of the 24th international conference on machine learning (pp. 919–926).
34.
go back to reference Tsuda, K., & Kudo, T. (2006). Clustering graphs by weighted substructure mining. In Proceedings of the 23rd international conference on machine learning (pp. 953–960). New York: ACM. Tsuda, K., & Kudo, T. (2006). Clustering graphs by weighted substructure mining. In Proceedings of the 23rd international conference on machine learning (pp. 953–960). New York: ACM.
35.
go back to reference Tsuda, K., & Kurihara, K. (2008). Graph mining with variational dirichlet process mixture models. In SIAM Conference on Data Mining (SDM). Tsuda, K., & Kurihara, K. (2008). Graph mining with variational dirichlet process mixture models. In SIAM Conference on Data Mining (SDM).
36.
go back to reference Wale, N., & Karypis, G. (2006). Comparison of descriptor spaces for chemical compound retrieval and classification. In Proceedings of the 2006 IEEE international conference on data mining (pp. 678–689). Wale, N., & Karypis, G. (2006). Comparison of descriptor spaces for chemical compound retrieval and classification. In Proceedings of the 2006 IEEE international conference on data mining (pp. 678–689).
37.
go back to reference Yan, X., Cheng, H., Han, J., & Yu, P. S. (2008). Mining significant graph patterns by leap search. In Proceedings of the ACM SIGMOD international conference on management of data (pp. 433–444). Yan, X., Cheng, H., Han, J., & Yu, P. S. (2008). Mining significant graph patterns by leap search. In Proceedings of the ACM SIGMOD international conference on management of data (pp. 433–444).
38.
go back to reference Yan, X., & Han, J. (2002). gSpan: Graph-based substructure pattern mining. In Proceedings of the 2002 IEEE international conference on data mining (pp. 721–724). Los Alamitos, CA: IEEE Computer Society. Yan, X., & Han, J. (2002). gSpan: Graph-based substructure pattern mining. In Proceedings of the 2002 IEEE international conference on data mining (pp. 721–724). Los Alamitos, CA: IEEE Computer Society.
39.
go back to reference Zaki, M., Parthasarathy, S., Ogihara, M., & Li, W. (1997). New algorithms for fast discovery of association rules. In KDD 1997 (pp. 283–286). Zaki, M., Parthasarathy, S., Ogihara, M., & Li, W. (1997). New algorithms for fast discovery of association rules. In KDD 1997 (pp. 283–286).
Metadata
Title
Graph Classification Methods in Chemoinformatics
Author
Koji Tsuda
Copyright Year
2011
Publisher
Springer Berlin Heidelberg
DOI
https://doi.org/10.1007/978-3-642-16345-6_16

Premium Partner