Skip to main content
Top
Published in: Pattern Analysis and Applications 3/2016

01-08-2016 | Theoretical Advances

Two density-based k-means initialization algorithms for non-metric data clustering

Authors: Filippo Maria Bianchi, Lorenzo Livi, Antonello Rizzi

Published in: Pattern Analysis and Applications | Issue 3/2016

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

In this paper, we propose a density-based clusters’ representatives selection algorithm that identifies the most central patterns from the dense regions in the dataset. The method, which has been implemented using two different strategies, is applicable to input spaces with no trivial geometry. Our approach exploits a probability density function built through the Parzen estimator, which relies on a (not necessarily metric) dissimilarity measure. Being a representatives extractor a general-purpose algorithm, our method is obviously applicable in different contexts. However, to test the proposed procedure, we specifically consider the problem of initializing the k-means algorithm. We face problems defined on standard real-valued vectors, labeled graphs, and finally sequences of real-valued vectors and sequences of characters. The obtained results demonstrate the effectiveness of the proposed representative selection method with respect to other state-of-the-art solutions.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literature
1.
go back to reference Arthur D, Vassilvitskii S (2007) k-means++: the advantages of careful seeding. In: Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms., SODA ’07Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, pp 1027–1035 Arthur D, Vassilvitskii S (2007) k-means++: the advantages of careful seeding. In: Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms., SODA ’07Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, pp 1027–1035
3.
go back to reference Bardaji I, Ferrer M, Sanfeliu A (2010) A comparison between two representatives of a set of graphs: median vs. barycenter graph. In: Proceedings of the 2010 joint IAPR international conference on Structural, syntactic, and statistical pattern recognition, SSPR&SPR’10. Springer, Berlin, pp 149–158 Bardaji I, Ferrer M, Sanfeliu A (2010) A comparison between two representatives of a set of graphs: median vs. barycenter graph. In: Proceedings of the 2010 joint IAPR international conference on Structural, syntactic, and statistical pattern recognition, SSPR&SPR’10. Springer, Berlin, pp 149–158
5.
go back to reference Bulò SR, Pelillo M (2013) A game-theoretic approach to hypergraph clustering. IEEE Trans Pattern Anal Machine Intell 35(6):1312–1327CrossRef Bulò SR, Pelillo M (2013) A game-theoretic approach to hypergraph clustering. IEEE Trans Pattern Anal Machine Intell 35(6):1312–1327CrossRef
7.
go back to reference Del Vescovo G, Livi L, Frattale Mascioli FM, Rizzi A (2014) On the problem of modeling structured data with the MinSOD representative. Int J Comput Theory Eng 6(1):9–14. doi:10.7763/IJCTE.2014.V6.827 Del Vescovo G, Livi L, Frattale Mascioli FM, Rizzi A (2014) On the problem of modeling structured data with the MinSOD representative. Int J Comput Theory Eng 6(1):9–14. doi:10.​7763/​IJCTE.​2014.​V6.​827
8.
go back to reference Duin RPW, Fred ALN, Loog M, Pękalska E (2012) Mode seeking clustering by KNN and mean shift evaluated. In: Gimel’farb G, Hancock E, Imiya A, Kuijper A, Kudo M, Omachi S, Windeatt T, Yamada K (eds) Structural, syntactic, and statistical pattern recognition, LNCS, vol. 7626. Springer, Berlin, pp 51–59. doi:10.1007/978-3-642-34166-3\_6 Duin RPW, Fred ALN, Loog M, Pękalska E (2012) Mode seeking clustering by KNN and mean shift evaluated. In: Gimel’farb G, Hancock E, Imiya A, Kuijper A, Kudo M, Omachi S, Windeatt T, Yamada K (eds) Structural, syntactic, and statistical pattern recognition, LNCS, vol. 7626. Springer, Berlin, pp 51–59. doi:10.1007/978-3-642-34166-3\_6
9.
go back to reference Duin RPW, Pękalska E (2010) Non-Euclidean dissimilarities: causes and informativeness. In: Proceedings of the 2010 joint IAPR international conference on structural, syntactic, and statistical pattern recognition. Springer, Berlin, pp 324–333 Duin RPW, Pękalska E (2010) Non-Euclidean dissimilarities: causes and informativeness. In: Proceedings of the 2010 joint IAPR international conference on structural, syntactic, and statistical pattern recognition. Springer, Berlin, pp 324–333
10.
go back to reference Duin RPW, Pękalska E, Harol A, Lee WJ, Bunke H (2008) On Euclidean corrections for non-Euclidean dissimilarities. In: Vitoria Lobo N, Kasparis T, Roli F, Kwok J, Georgiopoulos M, Anagnostopoulos G, Loog M (eds) Structural, syntactic, and statistical pattern recognition, vol. 5342, LNCS. Springer, Berlin, pp 551–561. doi:10.1007/978-3-540-89689-0\_59 Duin RPW, Pękalska E, Harol A, Lee WJ, Bunke H (2008) On Euclidean corrections for non-Euclidean dissimilarities. In: Vitoria Lobo N, Kasparis T, Roli F, Kwok J, Georgiopoulos M, Anagnostopoulos G, Loog M (eds) Structural, syntactic, and statistical pattern recognition, vol. 5342, LNCS. Springer, Berlin, pp 551–561. doi:10.1007/978-3-540-89689-0\_59
11.
go back to reference Ester M, Kriegel HP, Sander J, Xu X (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. KDD 96:226–231 Ester M, Kriegel HP, Sander J, Xu X (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. KDD 96:226–231
13.
go back to reference Filippone M, Camastra F, Masulli F, Rovetta S (2008) A survey of kernel and spectral methods for clustering. Pattern Recognit 41(1):176–190CrossRefMATH Filippone M, Camastra F, Masulli F, Rovetta S (2008) A survey of kernel and spectral methods for clustering. Pattern Recognit 41(1):176–190CrossRefMATH
15.
go back to reference Hinneburg A, Gabriel HH (2007) Denclue 2.0: fast clustering based on kernel density estimation. In: Advances in intelligent data analysis VII. Springer, Berlin, pp 70–80 Hinneburg A, Gabriel HH (2007) Denclue 2.0: fast clustering based on kernel density estimation. In: Advances in intelligent data analysis VII. Springer, Berlin, pp 70–80
17.
19.
go back to reference Kriegel HP, Kröger P, Zimek A (2009) Clustering high-dimensional data: a survey on subspace clustering, pattern-based clustering, and correlation clustering. ACM Trans Knowl Dis Data 3(1):1:1–1:58. doi:10.1145/1497577.1497578 Kriegel HP, Kröger P, Zimek A (2009) Clustering high-dimensional data: a survey on subspace clustering, pattern-based clustering, and correlation clustering. ACM Trans Knowl Dis Data 3(1):1:1–1:58. doi:10.​1145/​1497577.​1497578
20.
go back to reference Livi L, Bianchi FM, Rizzi A, Sadeghian A (2013) Dissimilarity space embedding of labeled graphs by a clustering-based compression procedure. In: Proceedings of the 2013 international joint conference on neural networks, pp 1646–1653. doi:10.1109/IJCNN.2013.6706937 Livi L, Bianchi FM, Rizzi A, Sadeghian A (2013) Dissimilarity space embedding of labeled graphs by a clustering-based compression procedure. In: Proceedings of the 2013 international joint conference on neural networks, pp 1646–1653. doi:10.​1109/​IJCNN.​2013.​6706937
21.
go back to reference Livi L, Del Vescovo G, Rizzi A (2012) Graph Recognition by Seriation and Frequent Substructures Mining. Proc First Int Conf Pattern Recognit Appl Methods 1:186–191. doi:10.5220/0003733201860191 Livi L, Del Vescovo G, Rizzi A (2012) Graph Recognition by Seriation and Frequent Substructures Mining. Proc First Int Conf Pattern Recognit Appl Methods 1:186–191. doi:10.​5220/​0003733201860191​
22.
go back to reference Livi L, Del Vescovo G, Rizzi A (2013) Combining graph seriation and substructures mining for graph recognition. In: Latorre Carmona P, Sánchez JS, Fred ALN (eds) Pattern recognition—applications and methods. Advances in intelligent and soft computing, vol 204. Springer, Berlin, pp 79–91. doi:10.1007/978-3-642-36530-0\_7 Livi L, Del Vescovo G, Rizzi A (2013) Combining graph seriation and substructures mining for graph recognition. In: Latorre Carmona P, Sánchez JS, Fred ALN (eds) Pattern recognition—applications and methods. Advances in intelligent and soft computing, vol 204. Springer, Berlin, pp 79–91. doi:10.1007/978-3-642-36530-0\_7
25.
go back to reference Livi L, Del Vescovo G, Rizzi A, Frattale Mascioli FM (2014) Building pattern recognition applications with the SPARE library. arXiv:1410.5263 Livi L, Del Vescovo G, Rizzi A, Frattale Mascioli FM (2014) Building pattern recognition applications with the SPARE library. arXiv:​1410.​5263
27.
go back to reference Livi L, Tahayori H, Sadeghian A, Rizzi A (2013) Aggregating \(\alpha\)-planes for Type-2 fuzzy set matching. In: 2013 Joint IFSA World Congress and NAFIPS annual meeting (IFSA/NAFIPS), pp 860–865 (2013). doi:10.1109/IFSA-NAFIPS.2013.6608513 Livi L, Tahayori H, Sadeghian A, Rizzi A (2013) Aggregating \(\alpha\)-planes for Type-2 fuzzy set matching. In: 2013 Joint IFSA World Congress and NAFIPS annual meeting (IFSA/NAFIPS), pp 860–865 (2013). doi:10.​1109/​IFSA-NAFIPS.​2013.​6608513
30.
go back to reference Ostrovsky R, Rabani Y, Schulman L, Swamy C (2006) The effectiveness of Lloyd-type methods for the k-means problem. In: FOCS ’06. 47th annual IEEE symposium on foundations of computer science, pp 165–176. doi:10.1109/FOCS.2006.75 Ostrovsky R, Rabani Y, Schulman L, Swamy C (2006) The effectiveness of Lloyd-type methods for the k-means problem. In: FOCS ’06. 47th annual IEEE symposium on foundations of computer science, pp 165–176. doi:10.​1109/​FOCS.​2006.​75
32.
go back to reference Pekalska E, Duin RPW (2005) The dissimilarity representation for pattern recognition: foundations and applications. In: Series in machine perception and artificial intelligence. World Scientific, Singapore Pekalska E, Duin RPW (2005) The dissimilarity representation for pattern recognition: foundations and applications. In: Series in machine perception and artificial intelligence. World Scientific, Singapore
33.
go back to reference Pekalska E, Harol A, Duin RPW, Spillmann B, Bunke H (2006) Non-Euclidean or non-metric measures can be informative. In: Yeung DY, Kwok J, Fred ALN, Roli F, Ridder D (eds) Structural, syntactic, and statistical pattern recognition, LNCS, vol 4109. Springer, Berlin, pp 871–880. doi:10.1007/11815921\_96 Pekalska E, Harol A, Duin RPW, Spillmann B, Bunke H (2006) Non-Euclidean or non-metric measures can be informative. In: Yeung DY, Kwok J, Fred ALN, Roli F, Ridder D (eds) Structural, syntactic, and statistical pattern recognition, LNCS, vol 4109. Springer, Berlin, pp 871–880. doi:10.1007/11815921\_96
34.
go back to reference Riesen K, Bunke H (2008) IAM graph database repository for graph based pattern recognition and machine learning. In: Proceedings of the 2008 joint IAPR international workshop on structural, syntactic, and statistical pattern recognition. Springer, Berlin, pp 287–297. doi:10.1007/978-3-540-89689-0\_33 Riesen K, Bunke H (2008) IAM graph database repository for graph based pattern recognition and machine learning. In: Proceedings of the 2008 joint IAPR international workshop on structural, syntactic, and statistical pattern recognition. Springer, Berlin, pp 287–297. doi:10.1007/978-3-540-89689-0\_33
35.
go back to reference Rizzi A, Del Vescovo G, Livi L, Frattale Mascioli FM (2012) A new granular computing approach for sequences representation and classification. In: Proceedings of the 2012 international joint conference on neural networks, pp 2268–2275. doi:10.1109/IJCNN.2012.6252680 Rizzi A, Del Vescovo G, Livi L, Frattale Mascioli FM (2012) A new granular computing approach for sequences representation and classification. In: Proceedings of the 2012 international joint conference on neural networks, pp 2268–2275. doi:10.​1109/​IJCNN.​2012.​6252680
36.
go back to reference Rizzi A, Livi L, Tahayori H, Sadeghian A (2013) Matching general type-2 fuzzy sets by comparing the vertical slices. In: 2013 Joint IFSA World Congress and NAFIPS Annual Meeting (IFSA/NAFIPS), pp 866–871. doi:10.1109/IFSA-NAFIPS.2013.6608514 Rizzi A, Livi L, Tahayori H, Sadeghian A (2013) Matching general type-2 fuzzy sets by comparing the vertical slices. In: 2013 Joint IFSA World Congress and NAFIPS Annual Meeting (IFSA/NAFIPS), pp 866–871. doi:10.1109/IFSA-NAFIPS.2013.6608514
39.
go back to reference Theodoridis S, Koutroumbas K (2008) Pattern recognition, 4th edn. Elsevier/Academic Press, Amsterdam Theodoridis S, Koutroumbas K (2008) Pattern recognition, 4th edn. Elsevier/Academic Press, Amsterdam
42.
go back to reference Yager RR, Filev DP (1994) Approximate clustering via the mountain method. IEEE Trans Syst Man Cybern 24(8):1279–1284CrossRef Yager RR, Filev DP (1994) Approximate clustering via the mountain method. IEEE Trans Syst Man Cybern 24(8):1279–1284CrossRef
43.
go back to reference Yu XG, Jian Y (2005) A new clustering algorithm based on knn and denclue. In: Proceedings of 2005 international conference on machine learning and cybernetics, vol 4. IEEE, New York, pp 2033–2038 Yu XG, Jian Y (2005) A new clustering algorithm based on knn and denclue. In: Proceedings of 2005 international conference on machine learning and cybernetics, vol 4. IEEE, New York, pp 2033–2038
Metadata
Title
Two density-based k-means initialization algorithms for non-metric data clustering
Authors
Filippo Maria Bianchi
Lorenzo Livi
Antonello Rizzi
Publication date
01-08-2016
Publisher
Springer London
Published in
Pattern Analysis and Applications / Issue 3/2016
Print ISSN: 1433-7541
Electronic ISSN: 1433-755X
DOI
https://doi.org/10.1007/s10044-014-0440-4

Other articles of this Issue 3/2016

Pattern Analysis and Applications 3/2016 Go to the issue

Premium Partner