Pattern Analysis and Applications

Published in:

01-08-2016 | Theoretical Advances

Two density-based k-means initialization algorithms for non-metric data clustering

Authors: Filippo Maria Bianchi, Lorenzo Livi, Antonello Rizzi

Published in: Pattern Analysis and Applications | Issue 3/2016

Abstract

In this paper, we propose a density-based algorithm for selecting cluster representatives, which identifies the most central patterns in the dense regions of a dataset. The method, implemented using two different strategies, is applicable to input spaces with non-trivial geometry. Our approach exploits a probability density function built with the Parzen estimator, which relies on a (not necessarily metric) dissimilarity measure. Because a representative extractor is a general-purpose algorithm, the method is applicable in many contexts. To test the proposed procedure, however, we specifically consider the problem of initializing the k-means algorithm. We address problems defined on standard real-valued vectors, labeled graphs, sequences of real-valued vectors, and sequences of characters. The results demonstrate the effectiveness of the proposed representative selection method with respect to other state-of-the-art solutions.
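The general idea described in the abstract can be illustrated with a minimal sketch. It is not one of the two strategies from the paper, whose details are not given here. It assumes a Gaussian kernel applied to a precomputed dissimilarity matrix, a hand-picked bandwidth `sigma`, and a greedy selection rule with a hypothetical separation threshold `min_sep`; all function and parameter names are illustrative.

```python
import math

def parzen_density(D, sigma):
    """Parzen-style density estimate at each pattern.

    D is a pairwise dissimilarity matrix (list of lists); it need not be a metric.
    The Gaussian kernel and the bandwidth sigma are assumptions of this sketch.
    """
    n = len(D)
    return [
        sum(math.exp(-((D[i][j] / sigma) ** 2) / 2.0) for j in range(n)) / n
        for i in range(n)
    ]

def select_representatives(D, k, sigma, min_sep):
    """Greedily pick up to k high-density patterns that are mutually separated.

    Patterns are visited by decreasing density; a pattern is accepted only if
    its dissimilarity to every already chosen pattern is at least min_sep.
    Returns indices that could serve as initial k-means representatives.
    """
    density = parzen_density(D, sigma)
    order = sorted(range(len(D)), key=lambda i: -density[i])
    chosen = []
    for i in order:
        if all(D[i][j] >= min_sep for j in chosen):
            chosen.append(i)
        if len(chosen) == k:
            break
    return chosen

# Toy data: two dense groups and one outlier. The squared difference is used
# as the dissimilarity because it violates the triangle inequality.
points = [0.0, 0.1, 0.2, 0.5, 5.0, 5.1, 5.3, 9.0]
D = [[(a - b) ** 2 for b in points] for a in points]
reps = select_representatives(D, k=2, sigma=1.0, min_sep=4.0)
print([points[i] for i in reps])
```

On this toy input the two selected patterns come from the two dense groups, and the isolated point at 9.0 is not selected because its estimated density is low. Only the dissimilarity matrix is needed, so the same sketch applies to graphs or sequences once a suitable dissimilarity is available.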


Title: Two density-based k-means initialization algorithms for non-metric data clustering
Authors: Filippo Maria Bianchi, Lorenzo Livi, Antonello Rizzi
Publication date: 01-08-2016
Publisher: Springer London
Published in: Pattern Analysis and Applications / Issue 3/2016
Print ISSN: 1433-7541
Electronic ISSN: 1433-755X
DOI: https://doi.org/10.1007/s10044-014-0440-4
