Skip to main content

2019 | OriginalPaper | Buchkapitel

On the k-Medoids Model for Semi-supervised Clustering

verfasst von : Rodrigo Randel, Daniel Aloise, Nenad Mladenović, Pierre Hansen

Erschienen in: Variable Neighborhood Search

Verlag: Springer International Publishing

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Clustering is an automated and powerful technique for data analysis. It aims to divide a given set of data points into clusters which are homogeneous and/or well separated. A major challenge with clustering is to define an appropriate clustering criterion that can express a good separation of data into homogeneous groups such that the obtained clustering solution is meaningful and useful to the user. To circumvent this issue, it is suggested that the domain expert could provide background information about the dataset, which can be incorporated by a clustering algorithm in order to improve the solution. Performing clustering under this assumption is known as semi-supervised clustering. This work explores semi-supervised clustering through the k-medoids model. Results obtained by a Variable Neighborhood Search (VNS) heuristic show that the k-medoids model presents classification accuracy compared to the traditional k-means approach. Furthermore, the model demonstrates high flexibility and performance by combining kernel projections with pairwise constraints.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat Hansen, P., Jaumard, B.: Cluster analysis and mathematical programming. Math. Program. 79(1–3), 191–215 (1997)MathSciNetMATH Hansen, P., Jaumard, B.: Cluster analysis and mathematical programming. Math. Program. 79(1–3), 191–215 (1997)MathSciNetMATH
2.
Zurück zum Zitat Delattre, M., Hansen, P.: Bicriterion cluster analysis. IEEE TPAMI 4, 277–291 (1980)MATH Delattre, M., Hansen, P.: Bicriterion cluster analysis. IEEE TPAMI 4, 277–291 (1980)MATH
3.
Zurück zum Zitat Aggarwal, C.C., Reddy, C.K.: Data Clustering: Algorithms and Applications. Chapman & Hall/CRC, Boca Raton (2013)MATH Aggarwal, C.C., Reddy, C.K.: Data Clustering: Algorithms and Applications. Chapman & Hall/CRC, Boca Raton (2013)MATH
4.
Zurück zum Zitat Basu, S., Davidson, I., Wagstaff, K.: Constrained Clustering: Advances in Algorithms, Theory, and Applications, 1st edn. Chapman & Hall/CRC, Boca Raton (2008)MATH Basu, S., Davidson, I., Wagstaff, K.: Constrained Clustering: Advances in Algorithms, Theory, and Applications, 1st edn. Chapman & Hall/CRC, Boca Raton (2008)MATH
5.
Zurück zum Zitat Hansen, P., Mladenović, N.: Variable neighborhood search: principles and applications. EJOR 130(3), 449–467 (2001)MathSciNetMATH Hansen, P., Mladenović, N.: Variable neighborhood search: principles and applications. EJOR 130(3), 449–467 (2001)MathSciNetMATH
6.
Zurück zum Zitat Wagstaff, K., Cardie, C., Rogers, S., Schroedl, S.: Constrained k-means clustering with background knowledge. In: ICML, pp. 577–584 (2001) Wagstaff, K., Cardie, C., Rogers, S., Schroedl, S.: Constrained k-means clustering with background knowledge. In: ICML, pp. 577–584 (2001)
7.
Zurück zum Zitat Bekkerman, R., Sahami, M.: Semi-supervised clustering using combinatorial MRFs. In: ICML (2006) Bekkerman, R., Sahami, M.: Semi-supervised clustering using combinatorial MRFs. In: ICML (2006)
8.
Zurück zum Zitat Yan, B., Domeniconi, C.: An adaptive kernel method for semi-supervised clustering. In: Fürnkranz, J., Scheffer, T., Spiliopoulou, M. (eds.) ECML 2006. LNCS (LNAI), vol. 4212, pp. 521–532. Springer, Heidelberg (2006). https://doi.org/10.1007/11871842_49 Yan, B., Domeniconi, C.: An adaptive kernel method for semi-supervised clustering. In: Fürnkranz, J., Scheffer, T., Spiliopoulou, M. (eds.) ECML 2006. LNCS (LNAI), vol. 4212, pp. 521–532. Springer, Heidelberg (2006). https://​doi.​org/​10.​1007/​11871842_​49
9.
Zurück zum Zitat Law, M.H.C., Topchy, A., Jain, A.K.: Model-based clustering with probabilistic constraints. In: SIAM-SDM, pp. 641–645 (2005) Law, M.H.C., Topchy, A., Jain, A.K.: Model-based clustering with probabilistic constraints. In: SIAM-SDM, pp. 641–645 (2005)
10.
Zurück zum Zitat Ruiz, C., Spiliopoulou, M., Menasalvas, E.: Density-based semi-supervised clustering. Data Min. Knowl. Disc. 21(3), 345–370 (2010)MathSciNet Ruiz, C., Spiliopoulou, M., Menasalvas, E.: Density-based semi-supervised clustering. Data Min. Knowl. Disc. 21(3), 345–370 (2010)MathSciNet
11.
Zurück zum Zitat Xia, Y.: A global optimization method for semi-supervised clustering. Data Min. Knowl. Disc. 18(2), 214–256 (2009)MathSciNet Xia, Y.: A global optimization method for semi-supervised clustering. Data Min. Knowl. Disc. 18(2), 214–256 (2009)MathSciNet
12.
Zurück zum Zitat Tuy, H.: Concave programming under linear constraints. Soviet Math. 5, 1437–1440 (1964)MATH Tuy, H.: Concave programming under linear constraints. Soviet Math. 5, 1437–1440 (1964)MATH
13.
Zurück zum Zitat Kulis, B., Basu, S., Dhillon, I., Mooney, R.: Semi-supervised graph clustering: a kernel approach. Mach. Learn. 74(1), 1–22 (2009) Kulis, B., Basu, S., Dhillon, I., Mooney, R.: Semi-supervised graph clustering: a kernel approach. Mach. Learn. 74(1), 1–22 (2009)
14.
Zurück zum Zitat Schölkopf, B., Smola, A., Müller, K.R.: Nonlinear component analysis as a kernel eigenvalue problem. Neural Comp. 10(5), 1299–1319 (1998) Schölkopf, B., Smola, A., Müller, K.R.: Nonlinear component analysis as a kernel eigenvalue problem. Neural Comp. 10(5), 1299–1319 (1998)
15.
Zurück zum Zitat Christofides, N.: Graph Theory: An Algorithmic Approach (Computer Science and Applied Mathematics). Academic Press Inc., Orlando (1975)MATH Christofides, N.: Graph Theory: An Algorithmic Approach (Computer Science and Applied Mathematics). Academic Press Inc., Orlando (1975)MATH
16.
Zurück zum Zitat Steinley, D.: K-medoids and other criteria for crisp clustering. In: Handbook of Cluster Analysis. Chapman and Hall/CRC Handbooks of Modern Statistical Methods. CRC Press (2015) Steinley, D.: K-medoids and other criteria for crisp clustering. In: Handbook of Cluster Analysis. Chapman and Hall/CRC Handbooks of Modern Statistical Methods. CRC Press (2015)
17.
Zurück zum Zitat Kaufman, L., Rousseeuw, P.J.: Partitioning around medoids (program PAM). In: Finding Groups in Data: An Introduction to Cluster Analysis, pp. 68–125 (1990) Kaufman, L., Rousseeuw, P.J.: Partitioning around medoids (program PAM). In: Finding Groups in Data: An Introduction to Cluster Analysis, pp. 68–125 (1990)
18.
Zurück zum Zitat Teitz, M.B., Bart, P.: Heuristic methods for estimating the generalized vertex median of aweighted graph. Oper. Res. 16(5), 955–961 (1968)MATH Teitz, M.B., Bart, P.: Heuristic methods for estimating the generalized vertex median of aweighted graph. Oper. Res. 16(5), 955–961 (1968)MATH
19.
Zurück zum Zitat Whitaker, R.: A fast algorithm for the greedy interchange for large-scale clustering and median location problems. INFOR 21(2), 95–108 (1983)MATH Whitaker, R.: A fast algorithm for the greedy interchange for large-scale clustering and median location problems. INFOR 21(2), 95–108 (1983)MATH
20.
Zurück zum Zitat Hansen, P., Mladenović, N.: Variable neighborhood search for the P-median. Locat. Sci. 5(4), 207–226 (1997)MATH Hansen, P., Mladenović, N.: Variable neighborhood search for the P-median. Locat. Sci. 5(4), 207–226 (1997)MATH
21.
Zurück zum Zitat Resende, M.G.C., Werneck, R.F.: A fast swap-based local search procedure for location problems. ANOR 150(1), 205–230 (2007)MathSciNetMATH Resende, M.G.C., Werneck, R.F.: A fast swap-based local search procedure for location problems. ANOR 150(1), 205–230 (2007)MathSciNetMATH
22.
Zurück zum Zitat Davidson, I., Ravi, S.: Clustering with constraints: feasibility issues and the k-means algorithm. In: SIAM-SDM, pp. 138–149 (2005) Davidson, I., Ravi, S.: Clustering with constraints: feasibility issues and the k-means algorithm. In: SIAM-SDM, pp. 138–149 (2005)
23.
Zurück zum Zitat Mladenović, N., Hansen, P.: Variable neighborhood search. Comput. Oper. Res. 24(11), 1097–1100 (1997)MathSciNetMATH Mladenović, N., Hansen, P.: Variable neighborhood search. Comput. Oper. Res. 24(11), 1097–1100 (1997)MathSciNetMATH
24.
Zurück zum Zitat Costa, L.R., Aloise, D., Mladenović, N.: Less is more: basic variable neighborhood search heuristic for balanced minimum sum-of-squares clustering. Inf. Sci. 415, 247–253 (2017) Costa, L.R., Aloise, D., Mladenović, N.: Less is more: basic variable neighborhood search heuristic for balanced minimum sum-of-squares clustering. Inf. Sci. 415, 247–253 (2017)
25.
Zurück zum Zitat Hansen, P., Ruiz, M., Aloise, D.: A VNS heuristic for escaping local extrema entrapment in normalized cut clustering. Pattern Recog. 45(12), 4337–4345 (2012) Hansen, P., Ruiz, M., Aloise, D.: A VNS heuristic for escaping local extrema entrapment in normalized cut clustering. Pattern Recog. 45(12), 4337–4345 (2012)
26.
Zurück zum Zitat Hansen, P., Mladenović, N.: J-means: a new local search heuristic for minimum sum of squares clustering. Pattern Recog. 34(2), 405–413 (2001)MATH Hansen, P., Mladenović, N.: J-means: a new local search heuristic for minimum sum of squares clustering. Pattern Recog. 34(2), 405–413 (2001)MATH
27.
Zurück zum Zitat Santi, É., Aloise, D., Blanchard, S.J.: A model for clustering data from heterogeneous dissimilarities. EJOR 253(3), 659–672 (2016)MathSciNetMATH Santi, É., Aloise, D., Blanchard, S.J.: A model for clustering data from heterogeneous dissimilarities. EJOR 253(3), 659–672 (2016)MathSciNetMATH
28.
Zurück zum Zitat Kleinberg, J.: An impossibility theorem for clustering. In: Advances in Neural Information Processing Systems, pp. 463–470 (2003) Kleinberg, J.: An impossibility theorem for clustering. In: Advances in Neural Information Processing Systems, pp. 463–470 (2003)
29.
Zurück zum Zitat Hubert, L., Arabie, P.: Comparing partitions. J. Classif. 2(1), 193–218 (1985)MATH Hubert, L., Arabie, P.: Comparing partitions. J. Classif. 2(1), 193–218 (1985)MATH
30.
Zurück zum Zitat Lichman, M.: UCI machine learning repository (2013) Lichman, M.: UCI machine learning repository (2013)
31.
Zurück zum Zitat Bilenko, M., Basu, S., Mooney, R.J.: Integrating constraints and metric learning in semi-supervised clustering. In: Proceedings of the Twenty-First International Conference on Machine Learning. ICML 2004, pp. 81–88. ACM, New York (2004) Bilenko, M., Basu, S., Mooney, R.J.: Integrating constraints and metric learning in semi-supervised clustering. In: Proceedings of the Twenty-First International Conference on Machine Learning. ICML 2004, pp. 81–88. ACM, New York (2004)
Metadaten
Titel
On the k-Medoids Model for Semi-supervised Clustering
verfasst von
Rodrigo Randel
Daniel Aloise
Nenad Mladenović
Pierre Hansen
Copyright-Jahr
2019
DOI
https://doi.org/10.1007/978-3-030-15843-9_2

Premium Partner