Published in: International Journal of Machine Learning and Cybernetics 2/2012

01.06.2012 | Original Article

Non-Parametric Kernel Learning with robust pairwise constraints

Authors: Changyou Chen, Junping Zhang, Xuefang He, Zhi-Hua Zhou


Abstract

For existing kernel-learning-based semi-supervised clustering algorithms, it is generally difficult to scale well to large datasets and to large numbers of pairwise constraints. In this paper, we propose a new Non-Parametric Kernel Learning (NPKL) framework to deal with these problems. We generalize the graph embedding framework into kernel learning by reformulating it as a semi-definite programming (SDP) problem, smoothing the functional Hilbert space with Laplacian regularization while avoiding over-smoothing. We propose two algorithms to solve this problem. One is a straightforward algorithm that uses SDP to solve the original kernel learning problem, denoted as TRAnsductive Graph Embedding Kernel (TRAGEK) learning; the other relaxes the SDP problem and solves it with a constrained gradient descent algorithm. To accelerate learning, we further divide the data into groups and use the sub-kernels of these groups to approximate the whole kernel matrix. This algorithm is denoted as Efficient Non-PArametric Kernel Learning (ENPAKL). The advantages of the proposed NPKL framework are that (1) supervised information in the form of pairwise constraints can be easily incorporated; (2) it is robust to the number of pairwise constraints, i.e., the number of constraints does not affect the running time much; (3) ENPAKL is relatively efficient compared to related kernel learning algorithms since it is based on constrained gradient descent. Experiments on clustering with the learned kernels show that the proposed framework scales well with the size of datasets and the number of pairwise constraints. Further experiments on image segmentation indicate the potential advantages of the proposed algorithms over the traditional k-means and N-cut clustering algorithms in terms of segmentation accuracy.
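As a rough illustration of the constrained-gradient-descent flavor of ENPAKL described above (the paper's actual objective and constraints are not reproduced on this page), the sketch below performs projected gradient descent on a kernel matrix with a Laplacian smoothness term and simple must-link/cannot-link penalties, projecting back onto the positive semi-definite cone after each step. All function names, the loss form, and the toy data are illustrative assumptions, not the paper's formulation.

```python
import numpy as np

def project_psd(K):
    """Project a symmetric matrix onto the positive semi-definite cone
    by clipping negative eigenvalues (a standard technique, not specific
    to the paper)."""
    K = (K + K.T) / 2
    w, V = np.linalg.eigh(K)
    return (V * np.clip(w, 0, None)) @ V.T

def kernel_gradient_step(K, L, must_link, cannot_link, lr=0.01, lam=1.0):
    """One illustrative projected-gradient update for a nonparametric
    kernel K (n x n):
      - tr(L K) acts as a Laplacian smoothness regularizer,
      - must-link pairs are pushed toward high similarity,
      - cannot-link pairs toward low similarity.
    This is a generic sketch, NOT the TRAGEK/ENPAKL objective."""
    grad = lam * L                      # d/dK tr(L K) = L
    for i, j in must_link:              # encourage K[i, j] to grow
        grad[i, j] -= 1.0
        grad[j, i] -= 1.0
    for i, j in cannot_link:            # encourage K[i, j] to shrink
        grad[i, j] += 1.0
        grad[j, i] += 1.0
    return project_psd(K - lr * grad)   # keep K a valid kernel matrix

# Toy usage with hypothetical data
n = 6
rng = np.random.default_rng(0)
X = rng.normal(size=(n, 2))
W = np.exp(-np.sum((X[:, None] - X[None]) ** 2, axis=-1))  # affinity matrix
L = np.diag(W.sum(axis=1)) - W                             # graph Laplacian
K = np.eye(n)
for _ in range(100):
    K = kernel_gradient_step(K, L, must_link=[(0, 1)], cannot_link=[(0, 5)])
```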

Footnotes
1
Note that although we want to learn a kernel matrix from the perspective of graph embedding, this has little relationship with algorithms that use the graph embedding framework for supervised learning, such as Marginal Fisher Analysis (MFA) [6]. Since such algorithms aim at supervised classification, there is no need to compare the proposed algorithm with them.
 
2
We use this kind of convention below without further declaration.
 
3
More satisfactory results might be attained by employing more sophisticated regularizers.
 
4
Note that for two positive semi-definite matrices A and B, \(\operatorname{tr}\{AB\} \geq 0\) holds, but not always \(> 0\). We thus cannot drop the last two constraints directly; otherwise it is easy to run into numerical problems, because the constraint still holds if \(\langle l_p, K_v \rangle\) equals an arbitrarily small positive constant, which is far from the goal that \(\langle l_p, K_v \rangle\) should be as large as possible.
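A quick numerical check of the fact used in this footnote, that the trace of a product of two positive semi-definite matrices is non-negative but can be arbitrarily close to zero (a generic illustration, not code from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
# Two random positive semi-definite matrices A = X X^T, B = Y Y^T
X, Y = rng.normal(size=(4, 4)), rng.normal(size=(4, 4))
A, B = X @ X.T, Y @ Y.T
assert np.trace(A @ B) >= 0          # always holds for PSD A, B

# ...but the trace can be made arbitrarily small without violating
# positive semi-definiteness, e.g. by shrinking B toward zero:
print(np.trace(A @ (1e-9 * B)))      # a tiny positive value
```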
 
5
There are two points to note here. First, it is easy to prove that \(K_X\) in Eq. 32 is a positive semi-definite matrix if \(K_D\) is positive semi-definite; this property ensures that \(K_X\) for the whole dataset is still a kernel matrix, which does not violate our objective. Second, for unknown points, in order to constrain the feature space to a hyperball, we need to normalize the weights calculated in Eq. 30 by the normalization scalar \(w_i^T K_D w_i\), that is, \(w_i = \frac{w_i}{w_i^T K_D w_i}\).
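Since Eqs. 30 and 32 are not reproduced on this page, the sketch below only illustrates the normalization step stated in this footnote and checks numerically that a matrix of the form \(W K_D W^T\) remains positive semi-definite when \(K_D\) is; the particular construction of the weights and of \(K_X\) here is an assumption for illustration, not the paper's definition.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 5, 8                                          # toy sizes
K_D = (lambda Z: Z @ Z.T)(rng.normal(size=(m, m)))   # a PSD "known-data" kernel

# Hypothetical reconstruction weights (Eq. 30 is not shown on this page),
# normalized as described in the footnote: w_i <- w_i / (w_i^T K_D w_i).
W = rng.normal(size=(n, m))
W = np.array([w / (w @ K_D @ w) for w in W])

# If K_X is built as W K_D W^T (an assumption here), it is PSD whenever K_D is:
K_X = W @ K_D @ W.T
print(np.linalg.eigvalsh(K_X).min() >= -1e-10)       # True up to round-off
```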
 
6
This experiment was run on an Intel Core 2 Duo T6400 CPU at 2.00 GHz with 2 GB of DDR2 memory.
 
7
We used an efficient implementation of the N-cut algorithm from [38].
 
8
The k-means algorithm takes about 1 s per image and the N-cut algorithm about 2 s, while ENPAKL needs about 5 min; PCP cannot run in this experiment because the corresponding data are too large. Further accelerating the proposed algorithm is left as future work.
 
References
1.
Chapelle O, Schölkopf B, Zien A (2006) Semi-supervised learning. MIT Press, Cambridge
2.
Kulis B, Basu S, Dhillon I, Mooney R (2005) Semi-supervised graph clustering: a kernel approach. In: Proceedings of the 22nd international conference on machine learning, pp 457–464
3.
Li Z, Liu J, Tang X (2008) Pairwise constraint propagation by semidefinite programming for semi-supervised classification. In: Proceedings of the 25th international conference on machine learning, pp 576–583
4.
Yeung D-Y, Chang H, Dai G (2007) A scalable kernel-based algorithm for semi-supervised metric learning. In: Proceedings of the 20th international joint conference on artificial intelligence, pp 1138–1143
5.
Cortes C, Mohri M, Rostamizadeh A (2009) Learning non-linear combinations of kernels. In: Advances in neural information processing systems, vol 21
6.
Yan SC, Xu D, Zhang BY, Zhang H-J, Yang Q, Lin S (2007) Graph embedding and extensions: a general framework for dimensionality reduction. IEEE Trans Pattern Anal Mach Intell 29(1):40–51
7.
Yang J, Yan SC, Fu Y, Li XL, Huang TS (2008) Non-negative graph embedding. In: Proceedings of IEEE computer society conference on computer vision and pattern recognition
8.
Schölkopf B, Smola AJ (2001) Learning with kernels: support vector machines, regularization, optimization, and beyond. MIT Press, Cambridge
9.
Cortes C, Mohri M, Rostamizadeh A (2010) Two-stage learning kernel algorithms. In: Proceedings of the 27th international conference on machine learning
10.
Cortes C, Mohri M, Rostamizadeh A (2010) Generalization bounds for learning kernels. In: Proceedings of the 27th international conference on machine learning
11.
Jin R, Hoi SCH, Yang T (2010) Online multiple kernel learning: algorithms and mistake bounds. In: Proceedings of the 21st international conference on algorithmic learning theory, pp 390–404
12.
Baraldi A, Blonda P (1999) A survey of fuzzy clustering algorithms for pattern recognition—part II. IEEE Trans Syst Man Cybern Part B 29(6):786–801
13.
Yang MS, Wu KL, Hsieh JN, Yu J (2008) Alpha-cut implemented fuzzy clustering algorithms and switching regressions. IEEE Trans Syst Man Cybern Part B 38(3):904–915
14.
Trappey AJC, Trappey CV, Hsu F-C, Hsiao DW (2009) A fuzzy ontological knowledge document clustering methodology. IEEE Trans Syst Man Cybern Part B 39(3):123–131
15.
Xiong H, Wu J, Chen J (2009) k-Means clustering versus validation measures: a data-distribution perspective. IEEE Trans Syst Man Cybern Part B 39(2):318–331
16.
Basu S, Bilenko M, Mooney R (2004) A probabilistic framework for semi-supervised clustering. In: Proceedings of the tenth ACM SIGKDD international conference on knowledge discovery and data mining, pp 59–68
17.
Bilenko M, Basu S, Mooney R (2004) Integrating constraints and metric learning in semi-supervised clustering. In: Proceedings of the 21st international conference on machine learning, pp 81–89
18.
Kamvar SD, Klein D, Manning C (2003) Spectral learning. In: Proceedings of the 18th international joint conference on artificial intelligence, pp 561–566
19.
Xing EP, Ng AY, Jordan MI, Russell S (2003) Distance metric learning, with application to clustering with side-information. In: Advances in neural information processing systems, vol 15
20.
Wagstaff K, Cardie C, Rogers S, Schroedl S (2001) Constrained k-means clustering with background knowledge. In: Proceedings of the 18th international conference on machine learning, pp 798–803
21.
Hong Y, Kwong S (2009) Learning assignment order of instances for the constrained k-means clustering algorithm. IEEE Trans Syst Man Cybern Part B 39(2):568–574
22.
Lu Z, Leen TK (2005) Semi-supervised learning with penalized probabilistic clustering. In: Advances in neural information processing systems, vol 17, pp 849–856
23.
Bar-Hillel A, Hertz T, Shental N, Weinshall D (2005) Learning a Mahalanobis metric from equivalence constraints. J Mach Learn Res 6:937–965
24.
Hertz T, Bar-Hillel A, Weinshall D (2004) Boosting margin based distance function for clustering. In: Proceedings of the 21st international conference on machine learning, pp 393–400
25.
Xu Z, Dai M, Meng D (2009) Fast and efficient strategies for model selection of Gaussian support vector machine. IEEE Trans Syst Man Cybern Part B 39(5):1292–1307
26.
Dhillon I, Guan Y, Kulis B (2004) Kernel k-means, spectral clustering and normalized cuts. In: Proceedings of the tenth ACM SIGKDD international conference on knowledge discovery and data mining, pp 551–556
27.
Bousquet O, Herrmann D (2003) On the complexity of learning the kernel matrix. In: Advances in neural information processing systems, vol 15, pp 399–406
28.
Hoi SCH, Jin R, Lyu MR (2007) Learning nonparametric kernel matrices from pairwise constraints. In: Proceedings of the 24th international conference on machine learning, pp 361–368
29.
Zhuang J, Tsang IW, Hoi SCH (2009) SimpleNPKL: simple non-parametric kernel learning. In: Proceedings of the 26th international conference on machine learning, pp 1273–1280
30.
Zhou DY, Huang J, Schölkopf B (2005) Learning from labeled and unlabeled data on a directed graph. In: Proceedings of the 22nd international conference on machine learning, pp 1036–1043
31.
Sturm JF (1999) Using SeDuMi 1.02, a MATLAB toolbox for optimization over symmetric cones. Optim Methods Softw 11(2):625–653
32.
Adler RL, Dedieu JP, Margulies JY, Martens M, Shub M (2002) Newton's method on Riemannian manifolds and a geometric model for the human spine. IMA J Numer Anal 22(3):359–390
33.
Golub GH, Loan CFV (1996) Matrix computation. Johns Hopkins University Press, Baltimore
34.
Roweis S, Saul L (2000) Nonlinear dimensionality reduction by locally linear embedding. Science 290(5500):2323–2326
36.
Shi J, Malik J (2000) Normalized cuts and image segmentation. IEEE Trans Pattern Anal Mach Intell 22(8):888–905
39.
Lowe DG (2004) Distinctive image features from scale-invariant keypoints. Int J Comput Vis 60(2):91–110
40.
Tan B, Zhang J, Wang L (2011) Semi-supervised elastic net for pedestrian counting. Pattern Recognit 44(10–11):2297–2304
41.
Duchenne O, Audibert J-Y, Keriven R, Ponce J, Segonne F (2008) Segmentation by transduction. In: Proceedings of IEEE computer society conference on computer vision and pattern recognition
Metadata
Title
Non-Parametric Kernel Learning with robust pairwise constraints
Authors
Changyou Chen
Junping Zhang
Xuefang He
Zhi-Hua Zhou
Publication date
01.06.2012
Publisher
Springer-Verlag
Published in
International Journal of Machine Learning and Cybernetics / Issue 2/2012
Print ISSN: 1868-8071
Electronic ISSN: 1868-808X
DOI
https://doi.org/10.1007/s13042-011-0048-6
