Skip to main content
Top
Published in: Advances in Data Analysis and Classification 3/2020

27-11-2019 | Regular Article

Is-ClusterMPP: clustering algorithm through point processes and influence space towards high-dimensional data

Authors: Khadidja Henni, Pierre-Yves Louis, Brigitte Vannier, Ahmed Moussa

Published in: Advances in Data Analysis and Classification | Issue 3/2020

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

Clustering via marked point processes and influence space, Is-ClusterMPP, is a new unsupervised clustering algorithm through adaptive MCMC sampling of a marked point processes of interacting balls. The designed Gibbs energy cost function makes use of k-influence space information. It detects clusters of different shapes, sizes and unbalanced local densities. It aims at dealing also with high-dimensional datasets. By using the k-influence space, Is-ClusterMPP solves the problem of local heterogeneity in densities and prevents the impact of the global density in the detection of unbalanced classes. This concept reduces also the input values amount. The curse of dimensionality is handled by using a local subspace clustering principal embedded in a weighted similarity metric. Balls covering data points are constituting a configuration sampled from a marked point process (MPP). Due to the choice of the energy function, they tends to cover neighboring data, which share the same cluster. The statistical model of random balls is sampled through a Monte Carlo Markovian dynamical approach. The energy is balancing different goals. (1) The data driven objective function is provided according to k-influence space. Data in a high-dense region are favored to be covered by a ball. (2) An interaction part in the energy prevents the balls full overlap phenomenon and favors connected groups of balls. The algorithm through Markov dynamics, does converge towards configurations sampled from the MPP model. This algorithm has been applied in real benchmarks through gene expression data set of various sizes. Different experiments have been done to compare Is-ClusterMPP against the most well-known clustering algorithms and its efficiency is claimed.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Appendix
Available only for authorised users
Literature
go back to reference Adovanovic M, Nanopoulos A, Ivanovic M (2010) Hubs in space: popular nearest neighbors in high-dimensional data. J Mach Learn Res 11:2487–2531MathSciNetMATH Adovanovic M, Nanopoulos A, Ivanovic M (2010) Hubs in space: popular nearest neighbors in high-dimensional data. J Mach Learn Res 11:2487–2531MathSciNetMATH
go back to reference Alata O, Burg S, Dupas A (2011) Grouping/degrouping point process, a point process driven by geometrical and topological properties of a partition in regions. Comput Vis Image Underst 115(9):1324–1339 Alata O, Burg S, Dupas A (2011) Grouping/degrouping point process, a point process driven by geometrical and topological properties of a partition in regions. Comput Vis Image Underst 115(9):1324–1339
go back to reference Baddeley A, Rubak E, Turner R (2015) Spatial point patterns: methodology and applications with R. Chapman and Hall/CRC. ISBN 9781482210200 Baddeley A, Rubak E, Turner R (2015) Spatial point patterns: methodology and applications with R. Chapman and Hall/CRC. ISBN 9781482210200
go back to reference Bar-Hen A, Emily M, Picard N (2015) Spatial cluster detection using nearest neighbor distance. Spat Stat 14(Part C):400–411MathSciNet Bar-Hen A, Emily M, Picard N (2015) Spatial cluster detection using nearest neighbor distance. Spat Stat 14(Part C):400–411MathSciNet
go back to reference Bhagat PM, Halgaonkar PS, Wadhai VM (2013) Review of clustering algorithm for categorical data. Int J Eng Adv Technol 32:2249–8958 Bhagat PM, Halgaonkar PS, Wadhai VM (2013) Review of clustering algorithm for categorical data. Int J Eng Adv Technol 32:2249–8958
go back to reference Bhole P, Agrawal AJ (2014) Extractive based single document text summarization using clustering approach. IAES Int J Artif Intell 3(2):73–78 Bhole P, Agrawal AJ (2014) Extractive based single document text summarization using clustering approach. IAES Int J Artif Intell 3(2):73–78
go back to reference Bisheng Y, Wenxue X, Zhen D (2013) Automated extraction of building outlines from airborne laser scanning point clouds. Geosci Remote Sens Lett IEEE 10(6):1399–1403 Bisheng Y, Wenxue X, Zhen D (2013) Automated extraction of building outlines from airborne laser scanning point clouds. Geosci Remote Sens Lett IEEE 10(6):1399–1403
go back to reference Bohm C, Railing K, Kriegel H-P, Kroger P (2004) Density connected clustering with local subspace preferences. In: Fourth IEEE international conference on data mining (ICDM’04), Icdm 04, pP 27–34. ISBN 0-7695-2142-8 Bohm C, Railing K, Kriegel H-P, Kroger P (2004) Density connected clustering with local subspace preferences. In: Fourth IEEE international conference on data mining (ICDM’04), Icdm 04, pP 27–34. ISBN 0-7695-2142-8
go back to reference Bouveyron C, Girard S, Schmid C (2007) High-dimensional data clustering. Comput Stat Data Anal 52(1):502–519MathSciNetMATH Bouveyron C, Girard S, Schmid C (2007) High-dimensional data clustering. Comput Stat Data Anal 52(1):502–519MathSciNetMATH
go back to reference Campello RJGB, Moulavi D, Sander J (2013) Density-based clustering based on hierarchical density estimates. In: Lecture notes in computer science (including subseries Lecture notes in artificial intelligence and lecture notes in bioinformatics), vol 7819 LNAI, pp 160–172. ISBN 9783642374555 Campello RJGB, Moulavi D, Sander J (2013) Density-based clustering based on hierarchical density estimates. In: Lecture notes in computer science (including subseries Lecture notes in artificial intelligence and lecture notes in bioinformatics), vol 7819 LNAI, pp 160–172. ISBN 9783642374555
go back to reference Cassisi C, Ferro A, Giugno R, Pigola G, Pulvirenti A (2013) Enhancing density-based clustering: parameter reduction and outlier detection. Inf Syst 38(3):317–330 Cassisi C, Ferro A, Giugno R, Pigola G, Pulvirenti A (2013) Enhancing density-based clustering: parameter reduction and outlier detection. Inf Syst 38(3):317–330
go back to reference Chin YC, Baddeley AJ (1999) On connected component Markov point processes. Adv Appl Probab 31(2):279–282MathSciNetMATH Chin YC, Baddeley AJ (1999) On connected component Markov point processes. Adv Appl Probab 31(2):279–282MathSciNetMATH
go back to reference Chiu SN, Stoyan D, Kendall WS, Mecke J (2013) Stochastic geometry and its applications. Wiley, Lodon. ISBN 978-0-470-66481-0 Chiu SN, Stoyan D, Kendall WS, Mecke J (2013) Stochastic geometry and its applications. Wiley, Lodon. ISBN 978-0-470-66481-0
go back to reference Datar M, Immorlica N, Indyk P (2004) Locality-sensitive hashing scheme based on p-stable distributions. In: SCG’04, June 9–11, 2004, Brooklyn, New York, USA., pp 253–262. ISBN 1581138857 Datar M, Immorlica N, Indyk P (2004) Locality-sensitive hashing scheme based on p-stable distributions. In: SCG’04, June 9–11, 2004, Brooklyn, New York, USA., pp 253–262. ISBN 1581138857
go back to reference Descombes X, Zerubia J (2002) Marked point processes in image analysis. IEEE Signal Process Mag 19(5):77–84 Descombes X, Zerubia J (2002) Marked point processes in image analysis. IEEE Signal Process Mag 19(5):77–84
go back to reference Descombes X, Zhizhina E (2004) Applications of Gibbs fields methods to image processing problems. Probl Inf Transm 40(3):279–295MathSciNetMATH Descombes X, Zhizhina E (2004) Applications of Gibbs fields methods to image processing problems. Probl Inf Transm 40(3):279–295MathSciNetMATH
go back to reference Descombes X, Stoica R, Garcin L, Zerubia J (2001) A RJMCMC algorithm for object processes in image processing. Monte Carlo Methods Appl 7(1–2):149–156MathSciNetMATH Descombes X, Stoica R, Garcin L, Zerubia J (2001) A RJMCMC algorithm for object processes in image processing. Monte Carlo Methods Appl 7(1–2):149–156MathSciNetMATH
go back to reference Descombes X, Minlos R, Zhizhina E (2009) Object extraction using a Stochastic birth-and-death dynamics in continuum. J Math Imaging Vis 33(3):347–359MathSciNet Descombes X, Minlos R, Zhizhina E (2009) Object extraction using a Stochastic birth-and-death dynamics in continuum. J Math Imaging Vis 33(3):347–359MathSciNet
go back to reference El Sonbaty Y, Ismail MA, Farouk M (2004) An efficient density based clustering algorithm for large databases. In: 16th IEEE international conference on tools with artificial intelligence, Ictai, pp 637–677. ISBN 0-7695-2236-X El Sonbaty Y, Ismail MA, Farouk M (2004) An efficient density based clustering algorithm for large databases. In: 16th IEEE international conference on tools with artificial intelligence, Ictai, pp 637–677. ISBN 0-7695-2236-X
go back to reference Elavarasi SA, Akilandeswari J, Sathiyabhama B (2011) A survey on partition clustering algorithms. Learning 1(1):1–14 Elavarasi SA, Akilandeswari J, Sathiyabhama B (2011) A survey on partition clustering algorithms. Learning 1(1):1–14
go back to reference Elbatta MTH, Ashour WM (2013) A dynamic method for discovering density varied clusters. Int J Signal Process Image Process Pattern Recognit 6(1):123–134 Elbatta MTH, Ashour WM (2013) A dynamic method for discovering density varied clusters. Int J Signal Process Image Process Pattern Recognit 6(1):123–134
go back to reference Elgendy N, Elragal A (2014) Big data analytics: a literature review paper. Adv Data Min Appl Theor Asp 8557:214–227 Elgendy N, Elragal A (2014) Big data analytics: a literature review paper. Adv Data Min Appl Theor Asp 8557:214–227
go back to reference Everitt BS, Landau S, Leese M, Stahl D (2011) Cluster Analysis, 5th edn. Wiley, LondonMATH Everitt BS, Landau S, Leese M, Stahl D (2011) Cluster Analysis, 5th edn. Wiley, LondonMATH
go back to reference Fahad A, Alshatri N, Tari Z, Alamri A, Khalil I, Zomaya A, Foufou S, Bouras A (2014) A survey of clustering algorithms for big data: taxonomy and empirical analysis. IEEE Trans Emerg Top Comput 2(3):267–279 Fahad A, Alshatri N, Tari Z, Alamri A, Khalil I, Zomaya A, Foufou S, Bouras A (2014) A survey of clustering algorithms for big data: taxonomy and empirical analysis. IEEE Trans Emerg Top Comput 2(3):267–279
go back to reference Fahim A, Salem A (2006) Density clustering based on radius of data (DCBRD). Informatika 16:344–349 Fahim A, Salem A (2006) Density clustering based on radius of data (DCBRD). Informatika 16:344–349
go back to reference Flexer A, Schnitzer D (2015) Choosing \(L^p\) norms in high-dimensional spaces based on hub analysis. Neurocomputing 169:281–287 Flexer A, Schnitzer D (2015) Choosing \(L^p\) norms in high-dimensional spaces based on hub analysis. Neurocomputing 169:281–287
go back to reference Fraley C, Raftery AE (2002) Model-based clustering, discriminant analysis, and density estimation. J Am Stat Assoc 97(458):611–631MathSciNetMATH Fraley C, Raftery AE (2002) Model-based clustering, discriminant analysis, and density estimation. J Am Stat Assoc 97(458):611–631MathSciNetMATH
go back to reference Gaonkar MN, Sawant K (2013) Auto Eps DBSCAN: DBSCAN with Eps automatic for large dataset. Int J Adv Comput Theory Eng 2(2):2319–2526 Gaonkar MN, Sawant K (2013) Auto Eps DBSCAN: DBSCAN with Eps automatic for large dataset. Int J Adv Comput Theory Eng 2(2):2319–2526
go back to reference Green PJ (1995) Reversible jump Markov chain monte carlo computation and Bayesian model determination. Biometrika 82(4):711–732MathSciNetMATH Green PJ (1995) Reversible jump Markov chain monte carlo computation and Bayesian model determination. Biometrika 82(4):711–732MathSciNetMATH
go back to reference Gul A, Perperoglou A, Khan Z, Mahmoud O, Miftahuddin M, Adler W, Lausen B (2018) Ensemble of a subset of knn classifiers. Adv Data Anal Classif 12(4):827–840 MathSciNetMATH Gul A, Perperoglou A, Khan Z, Mahmoud O, Miftahuddin M, Adler W, Lausen B (2018) Ensemble of a subset of knn classifiers. Adv Data Anal Classif 12(4):827–840 MathSciNetMATH
go back to reference Henni K, Alata O, Alidrissi A, Vannier B, Zaoui L, Moussa A (2017a) Marked point processes for MicroArray data clustering. In: Palumbo F, Montanari A, Vichi M (eds) Data science—innovative developments in data analysis and clustering: studies in classification, data analysis and knowledge organization, pp 125–137. Springer, Berlin. ISBN 978-3-319-55723-6 Henni K, Alata O, Alidrissi A, Vannier B, Zaoui L, Moussa A (2017a) Marked point processes for MicroArray data clustering. In: Palumbo F, Montanari A, Vichi M (eds) Data science—innovative developments in data analysis and clustering: studies in classification, data analysis and knowledge organization, pp 125–137. Springer, Berlin. ISBN 978-3-319-55723-6
go back to reference Henni K, Alata O, Zaoui L, Vannier B, Alidrissi A, Moussa A (2017b) ClusterMPP: an unsupervised density-based clustering algorithm via marked point process. Intell Data Anal 21(4):827–847 Henni K, Alata O, Zaoui L, Vannier B, Alidrissi A, Moussa A (2017b) ClusterMPP: an unsupervised density-based clustering algorithm via marked point process. Intell Data Anal 21(4):827–847
go back to reference Ilango M, Mohan V (2010) A survey of grid based clustering algorithms. Int J Eng Sci Technol 2(8):3441–3446 Ilango M, Mohan V (2010) A survey of grid based clustering algorithms. Int J Eng Sci Technol 2(8):3441–3446
go back to reference Lv Y, Ma T, Tang M, Cao J, Tian Y (2016) An efficient and scalable density-based clustering algorithm for datasets with complex structures. Neurocomputing 171:9–22 Lv Y, Ma T, Tang M, Cao J, Tian Y (2016) An efficient and scalable density-based clustering algorithm for datasets with complex structures. Neurocomputing 171:9–22
go back to reference Martin E, Hans-Peter K, Jörg S, Xiaowei X (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. In: Second international conference on knowledge discovery and data mining, pp 226–231. AAAI Press Martin E, Hans-Peter K, Jörg S, Xiaowei X (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. In: Second international conference on knowledge discovery and data mining, pp 226–231. AAAI Press
go back to reference Møller J, Waagepetersen RP (2004) Statistical inference and simulation for spatial point processes. Chapman and Hall/CRC, London. ISBN 9781584882657 Møller J, Waagepetersen RP (2004) Statistical inference and simulation for spatial point processes. Chapman and Hall/CRC, London. ISBN 9781584882657
go back to reference Montanari A, Calò DG (2013) Model-based clustering of probability density functions. Adv Data Anal Classif 7(3):301–319MathSciNetMATH Montanari A, Calò DG (2013) Model-based clustering of probability density functions. Adv Data Anal Classif 7(3):301–319MathSciNetMATH
go back to reference Moussa A, Sbihi A, Postaire JG (2008) A Markov random field model for mode detection in cluster analysis. Pattern Recognit Lett 29(9):1197–1207 Moussa A, Sbihi A, Postaire JG (2008) A Markov random field model for mode detection in cluster analysis. Pattern Recognit Lett 29(9):1197–1207
go back to reference Murtagh F, Contreras P (2012) Algorithms for hierarchical clustering: an overview. Wiley Interdiscip Rev Data Min Knowl Discov 2(1):86–97 Murtagh F, Contreras P (2012) Algorithms for hierarchical clustering: an overview. Wiley Interdiscip Rev Data Min Knowl Discov 2(1):86–97
go back to reference Ng RT, Han J (2002) CLARANS: a method for clustering objects for spatial data mining. IEEE Trans Knowl Data Eng 14(5):1003–1016 Ng RT, Han J (2002) CLARANS: a method for clustering objects for spatial data mining. IEEE Trans Knowl Data Eng 14(5):1003–1016
go back to reference Ortner M, Descombes X, Zerubia J (2007) Building outline extraction from Digital Elevation Models using marked point processes. Int J Comput Vis 72:107–132 Ortner M, Descombes X, Zerubia J (2007) Building outline extraction from Digital Elevation Models using marked point processes. Int J Comput Vis 72:107–132
go back to reference Peng L, Dong Z, Naijun W (2007) VDBSCAN: varied density based spatial clustering of applications with noise. In: Proceedings—ICSSSM’07: 2007 international conference on service systems and service management. ISBN 1424408857 Peng L, Dong Z, Naijun W (2007) VDBSCAN: varied density based spatial clustering of applications with noise. In: Proceedings—ICSSSM’07: 2007 international conference on service systems and service management. ISBN 1424408857
go back to reference Ram A, Jalal S, Jalal AS, Kumar M (2010) DVBSCAN: a density based algorithm for discovering density varied clusters in large spatial databases. Int J Comput Appl 3(6):1–4 Ram A, Jalal S, Jalal AS, Kumar M (2010) DVBSCAN: a density based algorithm for discovering density varied clusters in large spatial databases. Int J Comput Appl 3(6):1–4
go back to reference Rehman SU, Asghar S, Fong S, Sarasvady S (2014) DBSCAN: past, present and future. In: The fifth international conference on the applications of digital information and web technologies (ICADIWT 2014), pp 232–238 Rehman SU, Asghar S, Fong S, Sarasvady S (2014) DBSCAN: past, present and future. In: The fifth international conference on the applications of digital information and web technologies (ICADIWT 2014), pp 232–238
go back to reference Rodriguez A, Laio A (2014) Clustering by fast search and find of density peaks—PPT. Science 344(6191):1492–1496 Rodriguez A, Laio A (2014) Clustering by fast search and find of density peaks—PPT. Science 344(6191):1492–1496
go back to reference Samé A, Ambroise C, Govaert G (2007) An online classification EM algorithm based on the mixture model. Stat Comput 17(3):209–218MathSciNet Samé A, Ambroise C, Govaert G (2007) An online classification EM algorithm based on the mixture model. Stat Comput 17(3):209–218MathSciNet
go back to reference Stoica RS (2014) Modélisation probabiliste et inférence statistique pour l’analyse des données spatialisées. Research Habilitation Thesis, Université Lille 1 Stoica RS (2014) Modélisation probabiliste et inférence statistique pour l’analyse des données spatialisées. Research Habilitation Thesis, Université Lille 1
go back to reference Stoica RS, Descombes X, Zerubia J (2004) A Gibbs point process for road extraction from remotely sensed images. Int J Comput Vis 57(2):121–136 Stoica RS, Descombes X, Zerubia J (2004) A Gibbs point process for road extraction from remotely sensed images. Int J Comput Vis 57(2):121–136
go back to reference Stoica RS, Gay E, Kretzschmar A (2007) Cluster pattern detection in spatial data based on Monte Carlo inference. Biom J 49(4):505–519MathSciNet Stoica RS, Gay E, Kretzschmar A (2007) Cluster pattern detection in spatial data based on Monte Carlo inference. Biom J 49(4):505–519MathSciNet
go back to reference Stoica RS, Martinez VJ, Saar E (2010) Filaments in observed and mock galaxy catalogues. Astron Astrophys 510:1–12 Stoica RS, Martinez VJ, Saar E (2010) Filaments in observed and mock galaxy catalogues. Astron Astrophys 510:1–12
go back to reference Sun S, Huang R (2010) An adaptive k-nearest neighbor algorithm. In: Proceedings—2010 7th international conference on fuzzy systems and knowledge discovery, FSKD 2010, vol 1, pp 91–94. ISBN 9781424459346 Sun S, Huang R (2010) An adaptive k-nearest neighbor algorithm. In: Proceedings—2010 7th international conference on fuzzy systems and knowledge discovery, FSKD 2010, vol 1, pp 91–94. ISBN 9781424459346
go back to reference Uncu O, Gruver WA, Kotak DB, Sabaz D, Alibhai Z, Ng C (2007) GRIDBSCAN: GRId density-based spatial clustering of applications with noise. In: Conference Proceedings—IEEE International conference on systems, man and cybernetics, vol 4, pp 2976–2981. ISBN 1424401003 Uncu O, Gruver WA, Kotak DB, Sabaz D, Alibhai Z, Ng C (2007) GRIDBSCAN: GRId density-based spatial clustering of applications with noise. In: Conference Proceedings—IEEE International conference on systems, man and cybernetics, vol 4, pp 2976–2981. ISBN 1424401003
go back to reference van Lieshout MNM (2000) Markov point processes and their applications. Imperial College Press, LondonMATH van Lieshout MNM (2000) Markov point processes and their applications. Imperial College Press, LondonMATH
go back to reference van Lieshout MNM, Stoica RS (2006) Perfect simulation for marked point processes. Comput Stat Data Anal 51:679–698MathSciNetMATH van Lieshout MNM, Stoica RS (2006) Perfect simulation for marked point processes. Comput Stat Data Anal 51:679–698MathSciNetMATH
go back to reference Xiaopeng Y, Deyi Z, Yan Z (2005) A new clustering algorithm based on distance and density. In: International conference on services systems and services management, vol 2 Xiaopeng Y, Deyi Z, Yan Z (2005) A new clustering algorithm based on distance and density. In: International conference on services systems and services management, vol 2
go back to reference Zhang Q, Yang LT, Chen Z, Xia F (2015) A high-order possibilistic-means algorithm for clustering incomplete multimedia data. IEEE Syst J PP(99):1–10 Zhang Q, Yang LT, Chen Z, Xia F (2015) A high-order possibilistic-means algorithm for clustering incomplete multimedia data. IEEE Syst J PP(99):1–10
go back to reference Zhu X, Melnykov V (2015) Probabilistic assessment of model-based clustering. Adv Data Anal Classif 9(4):395–422MathSciNetMATH Zhu X, Melnykov V (2015) Probabilistic assessment of model-based clustering. Adv Data Anal Classif 9(4):395–422MathSciNetMATH
Metadata
Title
Is-ClusterMPP: clustering algorithm through point processes and influence space towards high-dimensional data
Authors
Khadidja Henni
Pierre-Yves Louis
Brigitte Vannier
Ahmed Moussa
Publication date
27-11-2019
Publisher
Springer Berlin Heidelberg
Published in
Advances in Data Analysis and Classification / Issue 3/2020
Print ISSN: 1862-5347
Electronic ISSN: 1862-5355
DOI
https://doi.org/10.1007/s11634-019-00379-2

Other articles of this Issue 3/2020

Advances in Data Analysis and Classification 3/2020 Go to the issue

Premium Partner