Skip to main content
Erschienen in: Advances in Data Analysis and Classification 1/2020

16.05.2019 | Regular Article

Learning a metric when clustering data points in the presence of constraints

verfasst von: Ahmad Ali Abin, Mohammad Ali Bashiri, Hamid Beigy

Erschienen in: Advances in Data Analysis and Classification | Ausgabe 1/2020

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Learning an appropriate distance measure under supervision of side information has become a topic of significant interest within machine learning community. In this paper, we address the problem of metric learning for constrained clustering by considering three important issues: (1) considering importance degree for constraints, (2) preserving the topological structure of data, and (3) preserving some natural distribution properties in the data. This work provides a unified way to handle different issues in constrained clustering by learning an appropriate distance measure. It has modeled the first issue by injecting the importance degree of constraints directly into an objective function. The topological structure of data is preserved by minimizing the reconstruction error of data in the target space. Finally we addressed the issue of preserving natural distribution properties in the data by using the proximity information of data. We have proposed two different methods to address the above mentioned issues. The first approach learns a linear transformation of data into a target space (linear-model) and the second one uses kernel functions to learn an appropriate distance measure (non-linear-model). Experiments show that considering these issues significantly improves clustering accuracy.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literatur
Zurück zum Zitat Abin AA (2016a) Clustering with side information: further efforts to improve efficiency. Pattern Recognit Lett 84:252–258CrossRef Abin AA (2016a) Clustering with side information: further efforts to improve efficiency. Pattern Recognit Lett 84:252–258CrossRef
Zurück zum Zitat Abin AA (2016b) Querying beneficial constraints before clustering using facility location analysis. IEEE Trans Cybern 99:1–12 Abin AA (2016b) Querying beneficial constraints before clustering using facility location analysis. IEEE Trans Cybern 99:1–12
Zurück zum Zitat Abin AA, Beigy H (2014) Active selection of clustering constraints: a sequential approach. Pattern Recognit 47(3):1443–1458CrossRef Abin AA, Beigy H (2014) Active selection of clustering constraints: a sequential approach. Pattern Recognit 47(3):1443–1458CrossRef
Zurück zum Zitat Abin AA, Beigy H (2015) Active constrained fuzzy clustering: a multiple kernels learning approach. Pattern Recognit 48(3):953–967MATHCrossRef Abin AA, Beigy H (2015) Active constrained fuzzy clustering: a multiple kernels learning approach. Pattern Recognit 48(3):953–967MATHCrossRef
Zurück zum Zitat Arenas A, Diaz-Guilera A, Kurths J, Moreno Y, Zhou C (2008) Synchronization in complex networks. Phys Rep 469(3):93–153MathSciNetCrossRef Arenas A, Diaz-Guilera A, Kurths J, Moreno Y, Zhou C (2008) Synchronization in complex networks. Phys Rep 469(3):93–153MathSciNetCrossRef
Zurück zum Zitat Baghshah MS, Afsari F, Shouraki SB, Eslami E (2014) Scalable semi-supervised clustering by spectral kernel learning. Pattern Recognit Lett 45:161–171CrossRef Baghshah MS, Afsari F, Shouraki SB, Eslami E (2014) Scalable semi-supervised clustering by spectral kernel learning. Pattern Recognit Lett 45:161–171CrossRef
Zurück zum Zitat Bar-Hillel A, Hertz T, Shental N, Weinshall D (2005) Learning a Mahalanobis metric from equivalence constraints. J Mach Learn Res 6:937–965MathSciNetMATH Bar-Hillel A, Hertz T, Shental N, Weinshall D (2005) Learning a Mahalanobis metric from equivalence constraints. J Mach Learn Res 6:937–965MathSciNetMATH
Zurück zum Zitat Basu S, Banerjee A, Mooney RJ (2004) Active semi-supervision for pairwise constrained clustering. In: Proceedings of the 5th SIAM international conference on data mining, ICDM ’04, pp 333–344 Basu S, Banerjee A, Mooney RJ (2004) Active semi-supervision for pairwise constrained clustering. In: Proceedings of the 5th SIAM international conference on data mining, ICDM ’04, pp 333–344
Zurück zum Zitat Basu S, Davidson I, Wagstaff KL (2008) Constrained clustering: advances in algorithms, theory, and applications. Chapman and Hall/CRC, Boca RatonMATH Basu S, Davidson I, Wagstaff KL (2008) Constrained clustering: advances in algorithms, theory, and applications. Chapman and Hall/CRC, Boca RatonMATH
Zurück zum Zitat Bilenko M, Basu S, Mooney RJ (2004a) Integrating constraints and metric learning in semi-supervised clustering. In: Proceedings of the 21th international conference on Machine learning, ICML ’04, pp 11–18 Bilenko M, Basu S, Mooney RJ (2004a) Integrating constraints and metric learning in semi-supervised clustering. In: Proceedings of the 21th international conference on Machine learning, ICML ’04, pp 11–18
Zurück zum Zitat Bilenko M, Basu S, Mooney RJ (2004b) Integrating constraints and metric learning in semi-supervised clustering. In: Proceedings of the twenty-first international conference on Machine learning, ACM, p 11 Bilenko M, Basu S, Mooney RJ (2004b) Integrating constraints and metric learning in semi-supervised clustering. In: Proceedings of the twenty-first international conference on Machine learning, ACM, p 11
Zurück zum Zitat Chang H, yan Yeung D (2004) Locally linear metric adaptation for semi-supervised clustering. In: Proceedings of the 21th international conference on machine learning, ICML ’04, pp 153–160 Chang H, yan Yeung D (2004) Locally linear metric adaptation for semi-supervised clustering. In: Proceedings of the 21th international conference on machine learning, ICML ’04, pp 153–160
Zurück zum Zitat Cheng H, Hua KA, Vu K (2008) Constrained locally weighted clustering. Proc VLDB Endow 1(1):90–101CrossRef Cheng H, Hua KA, Vu K (2008) Constrained locally weighted clustering. Proc VLDB Endow 1(1):90–101CrossRef
Zurück zum Zitat Davidson I, Wagstaff KL, Basu S (2006) Measuring constraint-set utility for partitional clustering algorithms. In: Proceedings of the 10th European conference on Principle and practice of knowledge discovery in databases, PKDD ’06, pp 115–126 Davidson I, Wagstaff KL, Basu S (2006) Measuring constraint-set utility for partitional clustering algorithms. In: Proceedings of the 10th European conference on Principle and practice of knowledge discovery in databases, PKDD ’06, pp 115–126
Zurück zum Zitat Ding S, Jia H, Zhang L, Jin F (2014) Research of semi-supervised spectral clustering algorithm based on pairwise constraints. Neural Comput Appl 24(1):211–219CrossRef Ding S, Jia H, Zhang L, Jin F (2014) Research of semi-supervised spectral clustering algorithm based on pairwise constraints. Neural Comput Appl 24(1):211–219CrossRef
Zurück zum Zitat Gong C, Fu K, Wu Q, Tu E, Yang J (2014) Semi-supervised classification with pairwise constraints. Neurocomputing 139:130–137CrossRef Gong C, Fu K, Wu Q, Tu E, Yang J (2014) Semi-supervised classification with pairwise constraints. Neurocomputing 139:130–137CrossRef
Zurück zum Zitat He P, Xu X, Hu K, Chen L (2014) Semi-supervised clustering via multi-level random walk. Pattern Recognit 47(2):820–832MATHCrossRef He P, Xu X, Hu K, Chen L (2014) Semi-supervised clustering via multi-level random walk. Pattern Recognit 47(2):820–832MATHCrossRef
Zurück zum Zitat Hertz T, Bar-hillel A, Weinshall D (2004) Boosting margin based distance functions for clustering. In: Proceedings of the 21th international conference on machine learning, ICML ’04, pp 393–400 Hertz T, Bar-hillel A, Weinshall D (2004) Boosting margin based distance functions for clustering. In: Proceedings of the 21th international conference on machine learning, ICML ’04, pp 393–400
Zurück zum Zitat Hoi SCH, Jin R, Lyu MR (2007) Learning non-parametric kernel matrices from pairwise constraints. In: Proceedings of the 24th international conference on Machine learning, ICML ’07, pp 361–368 Hoi SCH, Jin R, Lyu MR (2007) Learning non-parametric kernel matrices from pairwise constraints. In: Proceedings of the 24th international conference on Machine learning, ICML ’07, pp 361–368
Zurück zum Zitat Jain P, Kulis B, Davis JV, Dhillon IS (2012) Metric and kernel learning using a linear transformation. J Mach Learn Res 13:519–547MathSciNetMATH Jain P, Kulis B, Davis JV, Dhillon IS (2012) Metric and kernel learning using a linear transformation. J Mach Learn Res 13:519–547MathSciNetMATH
Zurück zum Zitat Kalakech M, Biela P, Macaire L, Hamad D (2011) Constraint scores for semi-supervised feature selection: a comparative study. Pattern Recognit Lett 32(5):656–665CrossRef Kalakech M, Biela P, Macaire L, Hamad D (2011) Constraint scores for semi-supervised feature selection: a comparative study. Pattern Recognit Lett 32(5):656–665CrossRef
Zurück zum Zitat Khoreva A, Galasso F, Hein M, Schiele B (2014) Learning must-link constraints for video segmentation based on spectral clustering. In: Pattern recognition, Springer, pp 701–712 Khoreva A, Galasso F, Hein M, Schiele B (2014) Learning must-link constraints for video segmentation based on spectral clustering. In: Pattern recognition, Springer, pp 701–712
Zurück zum Zitat Kulis B, Basu S, Dhillon I, Mooney R (2009) Semi-supervised graph clustering: a kernel approach. Mach Learn 74(1):1–22CrossRef Kulis B, Basu S, Dhillon I, Mooney R (2009) Semi-supervised graph clustering: a kernel approach. Mach Learn 74(1):1–22CrossRef
Zurück zum Zitat Liu H, Wu Z, Li X, Cai D, Huang T (2012) Constrained non-negative matrix factorization for image representation. IEEE Trans Pattern Anal Mach Intell 34(7):1299–1311CrossRef Liu H, Wu Z, Li X, Cai D, Huang T (2012) Constrained non-negative matrix factorization for image representation. IEEE Trans Pattern Anal Mach Intell 34(7):1299–1311CrossRef
Zurück zum Zitat Melnykov V, Melnykov I, Michael S (2016) Semi-supervised model-based clustering with positive and negative constraints. Adv Data Anal Classif 10(3):327–349MathSciNetMATHCrossRef Melnykov V, Melnykov I, Michael S (2016) Semi-supervised model-based clustering with positive and negative constraints. Adv Data Anal Classif 10(3):327–349MathSciNetMATHCrossRef
Zurück zum Zitat Milligan G, Cooper M (1986) A study of the comparability of external criteria for hierarchical cluster analysis. Multivariate Behav Res 21(4):441–458CrossRef Milligan G, Cooper M (1986) A study of the comparability of external criteria for hierarchical cluster analysis. Multivariate Behav Res 21(4):441–458CrossRef
Zurück zum Zitat Okabe M, Yamada S (2009) Clustering with constrained similarity learning. In: Proceedings of the 2009 IEEE/WIC/ACM international conference on web intelligence and international conference on intelligent agent technology, pp 30–33 Okabe M, Yamada S (2009) Clustering with constrained similarity learning. In: Proceedings of the 2009 IEEE/WIC/ACM international conference on web intelligence and international conference on intelligent agent technology, pp 30–33
Zurück zum Zitat Roweis ST, Saul LK (2000) Nonlinear dimensionality reduction by locally linear embedding. Science 290:2323–2326CrossRef Roweis ST, Saul LK (2000) Nonlinear dimensionality reduction by locally linear embedding. Science 290:2323–2326CrossRef
Zurück zum Zitat Sinkkonen J, Kaski S (2002) Clustering based on conditional distributions in an auxiliary space. Neural Comput 14:217–239MATHCrossRef Sinkkonen J, Kaski S (2002) Clustering based on conditional distributions in an auxiliary space. Neural Comput 14:217–239MATHCrossRef
Zurück zum Zitat Soleymani Baghshah M, Bagheri Shouraki S (2010) Non-linear metric learning using pairwise similarity and dissimilarity constraints and the geometrical structure of data. Pattern Recognit 43:2982–2992MATHCrossRef Soleymani Baghshah M, Bagheri Shouraki S (2010) Non-linear metric learning using pairwise similarity and dissimilarity constraints and the geometrical structure of data. Pattern Recognit 43:2982–2992MATHCrossRef
Zurück zum Zitat Truong D, Battiti R (2013) A flexible cluster-oriented alternative clustering algorithm for choosing from the pareto front of solutions. Mach Learn 98(1):57–91MathSciNetMATH Truong D, Battiti R (2013) A flexible cluster-oriented alternative clustering algorithm for choosing from the pareto front of solutions. Mach Learn 98(1):57–91MathSciNetMATH
Zurück zum Zitat Vu VV, Labroche N, Bouchon-Meunier B (2012) Improving constrained clustering with active query selection. Pattern Recognit 45(4):1749–1758CrossRef Vu VV, Labroche N, Bouchon-Meunier B (2012) Improving constrained clustering with active query selection. Pattern Recognit 45(4):1749–1758CrossRef
Zurück zum Zitat Wagstaff K, Cardie C (2000) Clustering with instance-level constraints. In: Proceedings of the seventeenth national conference on artificial intelligence and twelfth conference on on innovative applications of artificial intelligence, July 30–August 3, 2000, Austin, Texas, USA, p 1097 Wagstaff K, Cardie C (2000) Clustering with instance-level constraints. In: Proceedings of the seventeenth national conference on artificial intelligence and twelfth conference on on innovative applications of artificial intelligence, July 30–August 3, 2000, Austin, Texas, USA, p 1097
Zurück zum Zitat Wagstaff K, Cardie C, Rogers S, Schrödl S (2001) Constrained k-means clustering with background knowledge. In: Proceedings of the 18th international conference on machine learning, ICML ’01, pp 577–584 Wagstaff K, Cardie C, Rogers S, Schrödl S (2001) Constrained k-means clustering with background knowledge. In: Proceedings of the 18th international conference on machine learning, ICML ’01, pp 577–584
Zurück zum Zitat Wang Q, Yuen PC, Feng G (2013) Semi-supervised metric learning via topology preserving multiple semi-supervised assumptions. Pattern Recognit 46(9):2576–2587MATHCrossRef Wang Q, Yuen PC, Feng G (2013) Semi-supervised metric learning via topology preserving multiple semi-supervised assumptions. Pattern Recognit 46(9):2576–2587MATHCrossRef
Zurück zum Zitat Wu S, Feng X, Zhou W (2014) Spectral clustering of high-dimensional data exploiting sparse representation vectors. Neurocomputing 135:229–239CrossRef Wu S, Feng X, Zhou W (2014) Spectral clustering of high-dimensional data exploiting sparse representation vectors. Neurocomputing 135:229–239CrossRef
Zurück zum Zitat Xiang S, Nie F, Zhang C (2008) Learning a Mahalanobis distance metric for data clustering and classification. Pattern Recognit 41(12):3600–3612MATHCrossRef Xiang S, Nie F, Zhang C (2008) Learning a Mahalanobis distance metric for data clustering and classification. Pattern Recognit 41(12):3600–3612MATHCrossRef
Zurück zum Zitat Ye J, Zhao Z, Liu H (2007) Adaptive distance metric learning for clustering. In: Proceedings of the IEEE computer society conference on computer vision and pattern recognition (CVPR 2007), pp 1–7 Ye J, Zhao Z, Liu H (2007) Adaptive distance metric learning for clustering. In: Proceedings of the IEEE computer society conference on computer vision and pattern recognition (CVPR 2007), pp 1–7
Zurück zum Zitat Yin X, Chen S, Hu E, Zhang D (2010) Semi-supervised clustering with metric learning: an adaptive kernel method. Pattern Recognit 43(4):1320–1333MATHCrossRef Yin X, Chen S, Hu E, Zhang D (2010) Semi-supervised clustering with metric learning: an adaptive kernel method. Pattern Recognit 43(4):1320–1333MATHCrossRef
Zurück zum Zitat Zhang Z, Zhao M, Chow TWS (2012) Marginal semi-supervised sub-manifold projections with informative constraints for dimensionality reduction and recognition. Neural Netw 36:97–111MATHCrossRef Zhang Z, Zhao M, Chow TWS (2012) Marginal semi-supervised sub-manifold projections with informative constraints for dimensionality reduction and recognition. Neural Netw 36:97–111MATHCrossRef
Metadaten
Titel
Learning a metric when clustering data points in the presence of constraints
verfasst von
Ahmad Ali Abin
Mohammad Ali Bashiri
Hamid Beigy
Publikationsdatum
16.05.2019
Verlag
Springer Berlin Heidelberg
Erschienen in
Advances in Data Analysis and Classification / Ausgabe 1/2020
Print ISSN: 1862-5347
Elektronische ISSN: 1862-5355
DOI
https://doi.org/10.1007/s11634-019-00359-6

Weitere Artikel der Ausgabe 1/2020

Advances in Data Analysis and Classification 1/2020 Zur Ausgabe

Regular Article

Count regression trees

Premium Partner