Skip to main content
Erschienen in: Knowledge and Information Systems 1/2017

19.03.2016 | Regular Paper

Boosting for graph classification with universum

verfasst von: Shirui Pan, Jia Wu, Xingquan Zhu, Guodong Long, Chengqi Zhang

Erschienen in: Knowledge and Information Systems | Ausgabe 1/2017

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Recent years have witnessed extensive studies of graph classification due to the rapid increase in applications involving structural data and complex relationships. To support graph classification, all existing methods require that training graphs should be relevant (or belong) to the target class, but cannot integrate graphs irrelevant to the class of interest into the learning process. In this paper, we study a new universum graph classification framework which leverages additional “non-example” graphs to help improve the graph classification accuracy. We argue that although universum graphs do not belong to the target class, they may contain meaningful structure patterns to help enrich the feature space for graph representation and classification. To support universum graph classification, we propose a mathematical programming algorithm, ugBoost, which integrates discriminative subgraph selection and margin maximization into a unified framework to fully exploit the universum. Because informative subgraph exploration in a universum setting requires the search of a large space, we derive an upper bound discriminative score for each subgraph and employ a branch-and-bound scheme to prune the search space. By using the explored subgraphs, our graph classification model intends to maximize the margin between positive and negative graphs and minimize the loss on the universum graph examples simultaneously. The subgraph exploration and the learning are integrated and performed iteratively so that each can be beneficial to the other. Experimental results and comparisons on real-world dataset demonstrate the performance of our algorithm.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Anhänge
Nur mit Berechtigung zugänglich
Fußnoten
1
We use bold-faced letters (\(\varvec{w,\xi }\)) to indicate a vector, normal letters \(w_i\) and \(\xi _i\) to represent scalar values.
 
2
The derivation from Eqs. (7) to (8) is illustrated in “Appendix.”
 
3
We can obtain the dual solutions of Eq. (8) immediately after solving Eq. (7) by using CVX package, available from http://​cvxr.​com/​cvx/​.
 
Literatur
1.
Zurück zum Zitat Aggarwal C (2011) On classification of graph streams. In: Proceeding of the SDM. Arizona, USA Aggarwal C (2011) On classification of graph streams. In: Proceeding of the SDM. Arizona, USA
2.
Zurück zum Zitat Bai X, Cherkassky V (2008) Gender classification of human faces using inference through contradictions. In: IJCNN, pp 746–750 Bai X, Cherkassky V (2008) Gender classification of human faces using inference through contradictions. In: IJCNN, pp 746–750
3.
Zurück zum Zitat Chen S, Zhang C (2009) Selecting informative universum sample for semi-supervised learning. IJCAI 6:1016–1021 Chen S, Zhang C (2009) Selecting informative universum sample for semi-supervised learning. IJCAI 6:1016–1021
4.
Zurück zum Zitat Demiriz A, Bennett K, Shawe-Taylor J (2002) Linear programming boosting via column generation. Mach Learn 46:225–254CrossRefMATH Demiriz A, Bennett K, Shawe-Taylor J (2002) Linear programming boosting via column generation. Mach Learn 46:225–254CrossRefMATH
5.
Zurück zum Zitat Deshpande M, Kuramochi M, Wale N, Karypis G (2005) Frequent substructure-based approaches for classifying chemical compounds. IEEE Trans Knowl Data Eng 17:1036–1050CrossRef Deshpande M, Kuramochi M, Wale N, Karypis G (2005) Frequent substructure-based approaches for classifying chemical compounds. IEEE Trans Knowl Data Eng 17:1036–1050CrossRef
6.
Zurück zum Zitat Fei H, Huan J (2008) Structure feature selection for graph classification. In: Proceedings of the ACM CIKM, California, USA Fei H, Huan J (2008) Structure feature selection for graph classification. In: Proceedings of the ACM CIKM, California, USA
7.
Zurück zum Zitat Fei H, Huan J (2010) Boosting with structure information in the functional space: an application to graph classification. In: Proceedings of the ACM SIGKDD, Washington DC, USA Fei H, Huan J (2010) Boosting with structure information in the functional space: an application to graph classification. In: Proceedings of the ACM SIGKDD, Washington DC, USA
8.
Zurück zum Zitat Gaüzere B, Brun L, Villemin D (2012) Two new graphs kernels in chemoinformatics. Pattern Recognit Lett 33(15):2038–2047CrossRef Gaüzere B, Brun L, Villemin D (2012) Two new graphs kernels in chemoinformatics. Pattern Recognit Lett 33(15):2038–2047CrossRef
9.
Zurück zum Zitat Guo T, Zhu X (2013) Understanding the roles of sub-graph features for graph classification: an empirical study perspective. In: Proceedings of the ACM CIKM Conference, pp 817–822. ACM Guo T, Zhu X (2013) Understanding the roles of sub-graph features for graph classification: an empirical study perspective. In: Proceedings of the ACM CIKM Conference, pp 817–822. ACM
10.
Zurück zum Zitat Hanley JA, McNeil BJ (1982) The meaning and use of the area under a receiver operating characteristic (roc) curve. Radiology 143(1):29–36CrossRef Hanley JA, McNeil BJ (1982) The meaning and use of the area under a receiver operating characteristic (roc) curve. Radiology 143(1):29–36CrossRef
11.
Zurück zum Zitat Jiang C, Coenen F, Sanderson R, Zito M (2010) Text classification using graph mining-based feature extraction. Knowl Based Syst 23(4):302–308CrossRef Jiang C, Coenen F, Sanderson R, Zito M (2010) Text classification using graph mining-based feature extraction. Knowl Based Syst 23(4):302–308CrossRef
12.
Zurück zum Zitat Jin N, Young C, Wang W (2009) Graph classification based on pattern co-occurrence. In: Proceedings of the ACM CIKM, Hong Kong, China Jin N, Young C, Wang W (2009) Graph classification based on pattern co-occurrence. In: Proceedings of the ACM CIKM, Hong Kong, China
13.
Zurück zum Zitat Jin N, Young C, Wang W (2010) GAIA: graph classification using evolutionary computation. In: Proceedings of the 2010 ACM SIGMOD International Conference on Management of data, pp 879–890. ACM Jin N, Young C, Wang W (2010) GAIA: graph classification using evolutionary computation. In: Proceedings of the 2010 ACM SIGMOD International Conference on Management of data, pp 879–890. ACM
14.
Zurück zum Zitat Joachims T (2006) Training linear svms in linear time. In: KDD, pp 217–226 Joachims T (2006) Training linear svms in linear time. In: KDD, pp 217–226
15.
Zurück zum Zitat Kashima H, Tsuda K, Inokuchi A (2004) Kernels for Graphs, chap. In: Schlkopf B, Tsuda K, Vert JP (eds) Kernel methods in computational biology. MIT Press, Cambridge Kashima H, Tsuda K, Inokuchi A (2004) Kernels for Graphs, chap. In: Schlkopf B, Tsuda K, Vert JP (eds) Kernel methods in computational biology. MIT Press, Cambridge
16.
Zurück zum Zitat Kong X, Philip SY (2012) gMLC: a multi-label feature selection framework for graph classification. Knowl Inf Syst 31(2):281–305CrossRef Kong X, Philip SY (2012) gMLC: a multi-label feature selection framework for graph classification. Knowl Inf Syst 31(2):281–305CrossRef
17.
Zurück zum Zitat Kong X, Yu P (2010) Semi-supervised feature selection for graph classification. In: Proceedings of the ACM SIGKDD, Washington, DC, USA Kong X, Yu P (2010) Semi-supervised feature selection for graph classification. In: Proceedings of the ACM SIGKDD, Washington, DC, USA
18.
Zurück zum Zitat Luenberger D (1997) Optimization by vector space methods. Wiley, New YorkMATH Luenberger D (1997) Optimization by vector space methods. Wiley, New YorkMATH
19.
Zurück zum Zitat Nash S, Sofer A (1996) Linear and nonlinear programming. McGraw-Hill, New York Nash S, Sofer A (1996) Linear and nonlinear programming. McGraw-Hill, New York
21.
Zurück zum Zitat Pan S, Wu J, Zhu X, Long G, Zhang C (2015) Finding the best not the most: regularized loss minimization subgraph selection for graph classification. Pattern Recognit 48(11):3783–3796CrossRef Pan S, Wu J, Zhu X, Long G, Zhang C (2015) Finding the best not the most: regularized loss minimization subgraph selection for graph classification. Pattern Recognit 48(11):3783–3796CrossRef
22.
Zurück zum Zitat Pan S, Wu J, Zhu X, Zhang C (2015) Graph ensemble boosting for imbalanced noisy graph stream classification. IEEE Trans Cybern 45(5):940–954 Pan S, Wu J, Zhu X, Zhang C (2015) Graph ensemble boosting for imbalanced noisy graph stream classification. IEEE Trans Cybern 45(5):940–954
23.
24.
Zurück zum Zitat Pan S, Zhu X (2013) Graph classification with imbalanced class distributions and noise. In: IJCAI Pan S, Zhu X (2013) Graph classification with imbalanced class distributions and noise. In: IJCAI
25.
Zurück zum Zitat Pan S, Zhu X, Zhang C, Yu PS (2013) Graph stream classification using labeled and unlabeled graphs. In: International Conference on Data Engineering (ICDE), IEEE Pan S, Zhu X, Zhang C, Yu PS (2013) Graph stream classification using labeled and unlabeled graphs. In: International Conference on Data Engineering (ICDE), IEEE
26.
Zurück zum Zitat Pan SJ, Yang Q (2010) A survey on transfer learning. IEEE Trans Knowl Data Eng 22(10):1345–1359CrossRef Pan SJ, Yang Q (2010) A survey on transfer learning. IEEE Trans Knowl Data Eng 22(10):1345–1359CrossRef
27.
Zurück zum Zitat Peng B, Qian G, Ma Y (2008) View-invariant pose recognition using multilinear analysis and the universum. In: Advances in visual computing, pp 581–591. Springer Peng B, Qian G, Ma Y (2008) View-invariant pose recognition using multilinear analysis and the universum. In: Advances in visual computing, pp 581–591. Springer
28.
Zurück zum Zitat Peng B, Qian G, Ma Y (2009) Recognizing body poses using multilinear analysis and semi-supervised learning. Pattern Recognit Lett 30(14):1289–1294CrossRef Peng B, Qian G, Ma Y (2009) Recognizing body poses using multilinear analysis and semi-supervised learning. Pattern Recognit Lett 30(14):1289–1294CrossRef
29.
Zurück zum Zitat Prakash BA, Vreeken J, Faloutsos C (2014) Efficiently spotting the starting points of an epidemic in a large graph. Knowl Inf Syst 38(1):35–59CrossRef Prakash BA, Vreeken J, Faloutsos C (2014) Efficiently spotting the starting points of an epidemic in a large graph. Knowl Inf Syst 38(1):35–59CrossRef
30.
Zurück zum Zitat Raina R, Battle A, Lee H, Packer B, Ng AY (2007) Self-taught learning: transfer learning from unlabeled data. In: Proceedings of the 24th international conference on machine learning. ACM, pp 759–766 Raina R, Battle A, Lee H, Packer B, Ng AY (2007) Self-taught learning: transfer learning from unlabeled data. In: Proceedings of the 24th international conference on machine learning. ACM, pp 759–766
31.
Zurück zum Zitat Ranu S, Singh A (2009) Graphsig: a scalable approach to mining significant subgraphs in large graph databases. In: Proceedings of the ICDE, IEEE, pp 844–855 Ranu S, Singh A (2009) Graphsig: a scalable approach to mining significant subgraphs in large graph databases. In: Proceedings of the ICDE, IEEE, pp 844–855
32.
Zurück zum Zitat Riesen K, Bunke H (2009) Graph classification by means of Lipschitz embedding. IEEE Trans SMC B 39:1472–1483 Riesen K, Bunke H (2009) Graph classification by means of Lipschitz embedding. IEEE Trans SMC B 39:1472–1483
33.
Zurück zum Zitat Russom CL, Bradbury SP, Broderius SJ, Hammermeister DE, Drummond RA (1997) Predicting modes of toxic action from chemical structure: acute toxicity in the fathead minnow (Pimephales promelas). Environ Toxicol Chem 16(5):948–967CrossRef Russom CL, Bradbury SP, Broderius SJ, Hammermeister DE, Drummond RA (1997) Predicting modes of toxic action from chemical structure: acute toxicity in the fathead minnow (Pimephales promelas). Environ Toxicol Chem 16(5):948–967CrossRef
34.
Zurück zum Zitat Saigo H, Nowozin S, Kadowaki T, Kudo T, Tsuda K (2009) gboost: a mathematical programming approach to graph classification and regression. Mach Learn 75:69–89CrossRef Saigo H, Nowozin S, Kadowaki T, Kudo T, Tsuda K (2009) gboost: a mathematical programming approach to graph classification and regression. Mach Learn 75:69–89CrossRef
35.
Zurück zum Zitat Shen C, Wang P, Shen F, Wang H (2012) Uboost: boosting with the universum. IEEE Trans Pattern Anal Mach Intell 34(4):825–832CrossRef Shen C, Wang P, Shen F, Wang H (2012) Uboost: boosting with the universum. IEEE Trans Pattern Anal Mach Intell 34(4):825–832CrossRef
36.
Zurück zum Zitat Shervashidze N, Schweitzer P, Van Leeuwen EJ, Mehlhorn K, Borgwardt KM (2011) Weisfeiler-lehman graph kernels. J Mach Learn Res 12:2539–2561MathSciNetMATH Shervashidze N, Schweitzer P, Van Leeuwen EJ, Mehlhorn K, Borgwardt KM (2011) Weisfeiler-lehman graph kernels. J Mach Learn Res 12:2539–2561MathSciNetMATH
37.
Zurück zum Zitat Shi X, Kong X, Yu PS (2012) Transfer significant subgraphs across graph databases. In: Proceedings of the SIAM international conference on data mining. SDM Shi X, Kong X, Yu PS (2012) Transfer significant subgraphs across graph databases. In: Proceedings of the SIAM international conference on data mining. SDM
38.
Zurück zum Zitat Sinz FH, Chapelle O, Agarwal A, Schlkopf B (2007) An analysis of inference with the universum. In: NIPS’07, pp 1–1 Sinz FH, Chapelle O, Agarwal A, Schlkopf B (2007) An analysis of inference with the universum. In: NIPS’07, pp 1–1
39.
Zurück zum Zitat Sutherland JJ, O’Brien LA, Weaver DF (2004) A comparison of methods for modeling quantitative structure-activity relationships. J Med Chem 47(22):5541–5554CrossRef Sutherland JJ, O’Brien LA, Weaver DF (2004) A comparison of methods for modeling quantitative structure-activity relationships. J Med Chem 47(22):5541–5554CrossRef
40.
Zurück zum Zitat Thoma M, Cheng H, Gretton A, Han J, Kriegel H, Smola A, Song L, Yu P, Yan X, Borgwardt K (2009) Near-optimal supervised feature selection among frequent subgraphs. In: Proceedings of the SDM. USA Thoma M, Cheng H, Gretton A, Han J, Kriegel H, Smola A, Song L, Yu P, Yan X, Borgwardt K (2009) Near-optimal supervised feature selection among frequent subgraphs. In: Proceedings of the SDM. USA
41.
Zurück zum Zitat Tibshirani R (1996) Regression shrinkage and selection via the lasso. J Roy Stat Soc Ser B Methodol 58(1):267–288 Tibshirani R (1996) Regression shrinkage and selection via the lasso. J Roy Stat Soc Ser B Methodol 58(1):267–288
42.
Zurück zum Zitat Wang H, Zhang P, Tsang I, Chen L, Zhang C (2015) Defragging subgraph features for graph classification. In: Proceedings of the 24th ACM international on conference on information and knowledge management, pp 1687–1690. ACM Wang H, Zhang P, Tsang I, Chen L, Zhang C (2015) Defragging subgraph features for graph classification. In: Proceedings of the 24th ACM international on conference on information and knowledge management, pp 1687–1690. ACM
44.
Zurück zum Zitat Weston J, Collobert R, Sinz F, Bottou L, Vapnik V (2006) Inference with the universum. In: Proceedings of the 23rd international conference on machine learning, pp 1009–1016. ACM Weston J, Collobert R, Sinz F, Bottou L, Vapnik V (2006) Inference with the universum. In: Proceedings of the 23rd international conference on machine learning, pp 1009–1016. ACM
46.
Zurück zum Zitat Wu J, Hong Z, Pan S, Zhu X, Zhang C, Cai Z (2014) Multi-graph learning with positive and unlabeled bags. In: Proceedings of the 2014 SIAM international conference on data mining (SDM), pp 217–225 Wu J, Hong Z, Pan S, Zhu X, Zhang C, Cai Z (2014) Multi-graph learning with positive and unlabeled bags. In: Proceedings of the 2014 SIAM international conference on data mining (SDM), pp 217–225
47.
Zurück zum Zitat Wu J, Zhu X, Zhang C, Cai Z (2013) Multi-instance multi-graph dual embedding learning. In: ICDM, pp 827–836 Wu J, Zhu X, Zhang C, Cai Z (2013) Multi-instance multi-graph dual embedding learning. In: ICDM, pp 827–836
48.
Zurück zum Zitat Wu J, Zhu X, Zhang C, Yu PS (2014) Bag constrained structure pattern mining for multi-graph classification. IEEE Trans Knowl Data Eng 26(10):2382–2396CrossRef Wu J, Zhu X, Zhang C, Yu PS (2014) Bag constrained structure pattern mining for multi-graph classification. IEEE Trans Knowl Data Eng 26(10):2382–2396CrossRef
49.
Zurück zum Zitat Yan X, Cheng H, Han J, Yu PS (2008) Mining significant graph patterns by leap search. In: Proceedings of the 2008 ACM SIGMOD international conference on management of data, pp 433–444. ACM Yan X, Cheng H, Han J, Yu PS (2008) Mining significant graph patterns by leap search. In: Proceedings of the 2008 ACM SIGMOD international conference on management of data, pp 433–444. ACM
50.
Zurück zum Zitat Yan X, Han J (2002) gspan: Graph-based substructure pattern mining. In: Proceedings of the ICDM, Maebashi City, Japan Yan X, Han J (2002) gspan: Graph-based substructure pattern mining. In: Proceedings of the ICDM, Maebashi City, Japan
51.
Zurück zum Zitat Zhang D, Wang J, Wang F, Zhang C (2008) Semi-supervised classification with universum. In: SDM, pp 323–333. SIAM Zhang D, Wang J, Wang F, Zhang C (2008) Semi-supervised classification with universum. In: SDM, pp 323–333. SIAM
52.
Zurück zum Zitat Zhao Y, Kong X, Yu PS (2011) Positive and unlabeled learning for graph classification. In: IEEE 11th international conference on Data Mining (ICDM), 2011, pp 962–971. IEEE Zhao Y, Kong X, Yu PS (2011) Positive and unlabeled learning for graph classification. In: IEEE 11th international conference on Data Mining (ICDM), 2011, pp 962–971. IEEE
53.
Zurück zum Zitat Zhu X (2006) Semi-supervised learning literature survey. Comput Sci Univ Wis Madison 2:3 Zhu X (2006) Semi-supervised learning literature survey. Comput Sci Univ Wis Madison 2:3
54.
Zurück zum Zitat Zhu X (2011) Cross-domain semi-supervised learning using feature formulation. IEEE Trans Syst Man Cybern Part B 41(6):1627–1638CrossRef Zhu X (2011) Cross-domain semi-supervised learning using feature formulation. IEEE Trans Syst Man Cybern Part B 41(6):1627–1638CrossRef
55.
Zurück zum Zitat Zhu Y, Yu J, Cheng H, Qin L (2012) Graph classification: a diversified discriminative feature selection approach. In: Proceedings of the CIKM, pp 205–214. ACM Zhu Y, Yu J, Cheng H, Qin L (2012) Graph classification: a diversified discriminative feature selection approach. In: Proceedings of the CIKM, pp 205–214. ACM
Metadaten
Titel
Boosting for graph classification with universum
verfasst von
Shirui Pan
Jia Wu
Xingquan Zhu
Guodong Long
Chengqi Zhang
Publikationsdatum
19.03.2016
Verlag
Springer London
Erschienen in
Knowledge and Information Systems / Ausgabe 1/2017
Print ISSN: 0219-1377
Elektronische ISSN: 0219-3116
DOI
https://doi.org/10.1007/s10115-016-0934-z

Weitere Artikel der Ausgabe 1/2017

Knowledge and Information Systems 1/2017 Zur Ausgabe