Skip to main content
Top
Published in: The Journal of Supercomputing 4/2021

14-09-2020

Ensemble-based clustering of large probabilistic graphs using neighborhood and distance metric learning

Authors: Malihe Danesh, Morteza Dorrigiv, Farzin Yaghmaee

Published in: The Journal of Supercomputing | Issue 4/2021

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

Graphs are commonly used to express the communication of various data. Faced with uncertain data, we have probabilistic graphs. As a fundamental problem of such graphs, clustering has many applications in analyzing uncertain data. In this paper, we propose a novel method based on ensemble clustering for large probabilistic graphs. To generate ensemble clusters, we develop a set of probable possible worlds of the initial probabilistic graph. Then, we present a probabilistic co-association matrix as a consensus function to integrate base clustering results. It relies on co-occurrences of node pairs based on the probability of the corresponding common cluster graphs. Also, we apply two improvements in the steps before and after of ensembles generation. In the before step, we append neighborhood information based on node features to the initial graph to achieve a more accurate estimation of the probability between the nodes. In the after step, we use supervised metric learning-based Mahalanobis distance to automatically learn a metric from ensemble clusters. It aims to gain crucial features of the base clustering results. We evaluate our work using five real-world datasets and three clustering evaluation metrics, namely the Dunn index, Davies–Bouldin index, and Silhouette coefficient. The results show the impressive performance of clustering large probabilistic graphs.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Literature
1.
go back to reference Zou Z, Li J, Gao H et al (2010) Mining frequent subgraph patterns from uncertain graph data. IEEE Trans Knowl Data Eng 22:1203–1218CrossRef Zou Z, Li J, Gao H et al (2010) Mining frequent subgraph patterns from uncertain graph data. IEEE Trans Knowl Data Eng 22:1203–1218CrossRef
2.
go back to reference Papapetrou O, Ioannou E, Skoutas D (2011) Efficient discovery of frequent subgraph patterns in uncertain graph databases. In: EDBT/ICDT’11, pp 355–366 Papapetrou O, Ioannou E, Skoutas D (2011) Efficient discovery of frequent subgraph patterns in uncertain graph databases. In: EDBT/ICDT’11, pp 355–366
3.
go back to reference Potamias M, Bonchi F, Gionis A et al (2010) k-nearest neighbors in uncertain graphs. Proc VLDB Endow 3(1):997–1008CrossRef Potamias M, Bonchi F, Gionis A et al (2010) k-nearest neighbors in uncertain graphs. Proc VLDB Endow 3(1):997–1008CrossRef
4.
go back to reference Strehl A, Ghosh J (2003) Cluster ensembles—A knowledge reuse framework for combining partitions. J Mach Learn Res 3:583–617MathSciNetMATH Strehl A, Ghosh J (2003) Cluster ensembles—A knowledge reuse framework for combining partitions. J Mach Learn Res 3:583–617MathSciNetMATH
5.
go back to reference Topchy A, Jain AK, Punch W (2005) Clustering ensembles: models of consensus and weak partitions. IEEE Trans Pattern Anal Mach Intell 27(12):1866–1881CrossRef Topchy A, Jain AK, Punch W (2005) Clustering ensembles: models of consensus and weak partitions. IEEE Trans Pattern Anal Mach Intell 27(12):1866–1881CrossRef
6.
7.
go back to reference Boongoen T, Iam-On N (2018) Cluster ensembles: a survey of approaches with recent extensions and applications. Comput Sci Rev 28:1–25MathSciNetCrossRef Boongoen T, Iam-On N (2018) Cluster ensembles: a survey of approaches with recent extensions and applications. Comput Sci Rev 28:1–25MathSciNetCrossRef
8.
go back to reference Alqurashi T, Wang W (2019) Clustering ensemble method. Int J Mach Learn Cyb 10:1227–1246CrossRef Alqurashi T, Wang W (2019) Clustering ensemble method. Int J Mach Learn Cyb 10:1227–1246CrossRef
9.
go back to reference Vega-Pons S, Ruiz-Shulcloper J (2011) A survey of clustering ensemble algorithms. Int J Pattern Recogn Artif Intell 25(03):337–372MathSciNetCrossRef Vega-Pons S, Ruiz-Shulcloper J (2011) A survey of clustering ensemble algorithms. Int J Pattern Recogn Artif Intell 25(03):337–372MathSciNetCrossRef
10.
go back to reference Kollios G, Potamias M, Terzi E (2013) Clustering large probabilistic graphs. IEEE Trans Knowl Data Eng 25(2):325–336CrossRef Kollios G, Potamias M, Terzi E (2013) Clustering large probabilistic graphs. IEEE Trans Knowl Data Eng 25(2):325–336CrossRef
11.
go back to reference Ailon N, Charikar M, Newman A (2005) Aggregating Inconsistent Information: Ranking and Clustering. In: Proceedings of the ACM Symposium on Theory of Computing (STOC), pp 684–693 Ailon N, Charikar M, Newman A (2005) Aggregating Inconsistent Information: Ranking and Clustering. In: Proceedings of the ACM Symposium on Theory of Computing (STOC), pp 684–693
12.
go back to reference Halim Z, Waqas M, Hussain SF (2015) Clustering large probabilistic graphs using multi-population evolutionary algorithm. Inf Sci 317:78–95CrossRef Halim Z, Waqas M, Hussain SF (2015) Clustering large probabilistic graphs using multi-population evolutionary algorithm. Inf Sci 317:78–95CrossRef
13.
go back to reference Gu Y, Gao C, Cong G et al (2014) Effective and efficient clustering methods for correlated probabilistic graphs. IEEE Trans Knowl Data Eng 26(5):1117–1130CrossRef Gu Y, Gao C, Cong G et al (2014) Effective and efficient clustering methods for correlated probabilistic graphs. IEEE Trans Knowl Data Eng 26(5):1117–1130CrossRef
14.
go back to reference Ceccarello M, Fantozzi C, Pietracaprina A et al (2017) Clustering uncertain graphs. Proc VLDB Endowment 11(4):472–544CrossRef Ceccarello M, Fantozzi C, Pietracaprina A et al (2017) Clustering uncertain graphs. Proc VLDB Endowment 11(4):472–544CrossRef
15.
go back to reference Halim Z, Khattak JH (2019) Density-based clustering of big probabilistic graphs. Evolv Syst 10(3):333–350CrossRef Halim Z, Khattak JH (2019) Density-based clustering of big probabilistic graphs. Evolv Syst 10(3):333–350CrossRef
16.
go back to reference Qiu YX, Li RH, Li J, Qiao S et al (2018) Efficient structural clustering on probabilistic graphs. IEEE Trans Knowl Data Eng 31(10):1954–1968CrossRef Qiu YX, Li RH, Li J, Qiao S et al (2018) Efficient structural clustering on probabilistic graphs. IEEE Trans Knowl Data Eng 31(10):1954–1968CrossRef
17.
go back to reference Iam-On N, Boongoen T (2015) Comparative study of matrix refinement approaches for ensemble clustering. Mach Learn 98(1–2):269–300MathSciNetCrossRef Iam-On N, Boongoen T (2015) Comparative study of matrix refinement approaches for ensemble clustering. Mach Learn 98(1–2):269–300MathSciNetCrossRef
18.
go back to reference Huang D, Wang CD, Wu JS et al (2019) Ultra-scalable spectral clustering and ensemble clustering. IEEE TKDE 32(6):1212–1226 Huang D, Wang CD, Wu JS et al (2019) Ultra-scalable spectral clustering and ensemble clustering. IEEE TKDE 32(6):1212–1226
19.
go back to reference Iam-On N, Boongoen T, Garrett S et al (2011) A link-based approach to the cluster ensemble problem. IEEE Trans Pattern Anal Mach Intell 33(12):2396–2409CrossRef Iam-On N, Boongoen T, Garrett S et al (2011) A link-based approach to the cluster ensemble problem. IEEE Trans Pattern Anal Mach Intell 33(12):2396–2409CrossRef
20.
go back to reference Yi J, Yang T, Jin R et al (2012) Robust ensemble clustering by matrix completion. In: Proceedings of IEEE International Conference on Data Mining (ICDM) Yi J, Yang T, Jin R et al (2012) Robust ensemble clustering by matrix completion. In: Proceedings of IEEE International Conference on Data Mining (ICDM)
21.
go back to reference Fred AN, Jain AK (2005) Combining multiple clusterings using evidence accumulation. IEEE Trans Pattern Anal Mach Intell 27(6):835–850CrossRef Fred AN, Jain AK (2005) Combining multiple clusterings using evidence accumulation. IEEE Trans Pattern Anal Mach Intell 27(6):835–850CrossRef
22.
go back to reference Lourenço A, Bulò SR, Rebagliati N et al (2015) Probabilistic consensus clustering using evidence accumulation. Mach Learn 98(1–2):331–357MathSciNetCrossRef Lourenço A, Bulò SR, Rebagliati N et al (2015) Probabilistic consensus clustering using evidence accumulation. Mach Learn 98(1–2):331–357MathSciNetCrossRef
23.
go back to reference Fern XZ, Brodley CE (2004) Solving cluster ensemble problems by bipartite graph partitioning. In: Proceedings of International Conference on Machine Learning (ICML) Fern XZ, Brodley CE (2004) Solving cluster ensemble problems by bipartite graph partitioning. In: Proceedings of International Conference on Machine Learning (ICML)
24.
go back to reference Huang D, Lai JH, Wang CD (2016) Robust ensemble clustering using probability trajectories. IEEE Trans Knowl Data Eng 28(5):1312–1326CrossRef Huang D, Lai JH, Wang CD (2016) Robust ensemble clustering using probability trajectories. IEEE Trans Knowl Data Eng 28(5):1312–1326CrossRef
25.
go back to reference Huang D, Wang CD, Lai JH (2018) Locally weighted ensemble clustering. IEEE Trans Cybern 48(5):1460–1473CrossRef Huang D, Wang CD, Lai JH (2018) Locally weighted ensemble clustering. IEEE Trans Cybern 48(5):1460–1473CrossRef
26.
go back to reference Karypis G, Kumar V (1998) A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J Sci Comput 20(1):359–392MathSciNetCrossRef Karypis G, Kumar V (1998) A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J Sci Comput 20(1):359–392MathSciNetCrossRef
27.
go back to reference Huang D, Lai J, Wang CD (2016) Ensemble clustering using factor graph. Pattern Recogn 50:131–142CrossRef Huang D, Lai J, Wang CD (2016) Ensemble clustering using factor graph. Pattern Recogn 50:131–142CrossRef
28.
go back to reference Franek L, Jiang X (2014) Ensemble clustering by means of clustering embedding in vector spaces. Pattern Recogn 47(2):833–842CrossRef Franek L, Jiang X (2014) Ensemble clustering by means of clustering embedding in vector spaces. Pattern Recogn 47(2):833–842CrossRef
29.
go back to reference Weiszfeld E, Plastria F (2009) On the point for which the sum of the distances to n given points is minimum. Ann Oper Res 167(1):7–41MathSciNetCrossRef Weiszfeld E, Plastria F (2009) On the point for which the sum of the distances to n given points is minimum. Ann Oper Res 167(1):7–41MathSciNetCrossRef
30.
go back to reference Benjelloun O, Sarma AD, Halevy A et al (2006) ULDBs: databases with uncertainty and lineage. In Proceedings of the 32nd International Conference on Very Large Data Bases (VLDB), pp 953–964 Benjelloun O, Sarma AD, Halevy A et al (2006) ULDBs: databases with uncertainty and lineage. In Proceedings of the 32nd International Conference on Very Large Data Bases (VLDB), pp 953–964
31.
go back to reference Dalvi NN, Suciu D (2004) Efficient Query Evaluation on Probabilistic Databases. In: Proceedings of the 30th International Conference on Very Large Databases, Toronto, Canada. Dalvi NN, Suciu D (2004) Efficient Query Evaluation on Probabilistic Databases. In: Proceedings of the 30th International Conference on Very Large Databases, Toronto, Canada.
32.
go back to reference Gionis A, Mannila H, Tsaparas P (2007) Clustering aggregation. ACM Trans Knowl Discov Data (TKDD) 1(1):1–30CrossRef Gionis A, Mannila H, Tsaparas P (2007) Clustering aggregation. ACM Trans Knowl Discov Data (TKDD) 1(1):1–30CrossRef
33.
go back to reference Han K, Gui F, Xiao X et al (2019) Efficient and effective algorithms for clustering uncertain graphs. Proc VLDB Endow 12(6):667–680CrossRef Han K, Gui F, Xiao X et al (2019) Efficient and effective algorithms for clustering uncertain graphs. Proc VLDB Endow 12(6):667–680CrossRef
34.
35.
go back to reference Bian W, Tao D (2011) Learning a distance metric by empirical loss minimization. In: Proceedings of the 22nd International Joint Conference on Artificial Intelligence, pp 1186–1191, Barcelona, Spain Bian W, Tao D (2011) Learning a distance metric by empirical loss minimization. In: Proceedings of the 22nd International Joint Conference on Artificial Intelligence, pp 1186–1191, Barcelona, Spain
36.
go back to reference Luo Y, Wen Y, Tao D (2016) On combining side information and unlabeled data for heterogeneous multi-task metric learning. In: The 25th International Joint Conference on Artificial Intelligence, pp 1809–1815, New York Luo Y, Wen Y, Tao D (2016) On combining side information and unlabeled data for heterogeneous multi-task metric learning. In: The 25th International Joint Conference on Artificial Intelligence, pp 1809–1815, New York
37.
go back to reference Xiang S, Nie F, Zhang C (2008) Learning a mahalanobis distance metric for data clustering and classification. Pattern Recogn 41(12):3600–3612CrossRef Xiang S, Nie F, Zhang C (2008) Learning a mahalanobis distance metric for data clustering and classification. Pattern Recogn 41(12):3600–3612CrossRef
38.
go back to reference Xing EP, Ng AY, Jordan MI et al (2003) Distance metric learning with application to clustering with side-information. In: Proceedings of the 15th International Conference on Neural Information Processing Systems, pp 521–528. Cambridge Xing EP, Ng AY, Jordan MI et al (2003) Distance metric learning with application to clustering with side-information. In: Proceedings of the 15th International Conference on Neural Information Processing Systems, pp 521–528. Cambridge
39.
go back to reference Law MT, Yu Y, Cord M et al (2016) Closed-form training of mahalanobis distance for supervised clustering. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 3909–3917, Las Vegas, NV Law MT, Yu Y, Cord M et al (2016) Closed-form training of mahalanobis distance for supervised clustering. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 3909–3917, Las Vegas, NV
40.
go back to reference McFee B, Lanckriet GR (2010) Metric learning to rank. In: Proceedings of the 27th International Conference on Machine Learning, pp 775–782, Haifa, Israel. McFee B, Lanckriet GR (2010) Metric learning to rank. In: Proceedings of the 27th International Conference on Machine Learning, pp 775–782, Haifa, Israel.
41.
go back to reference Mahalanobis PC (1936) On the generalized distance in statistics. In: Proceedings of the National Institute of Science, Calcutta, India Mahalanobis PC (1936) On the generalized distance in statistics. In: Proceedings of the National Institute of Science, Calcutta, India
42.
go back to reference Bellet A, Habrard A, Sebban M (2015) Metric learning. Synthesis lectures on artificial intelligence and machine learning. Morgan & Claypool Publishers, San RafaelMATH Bellet A, Habrard A, Sebban M (2015) Metric learning. Synthesis lectures on artificial intelligence and machine learning. Morgan & Claypool Publishers, San RafaelMATH
44.
go back to reference Krogan NJ et al (2006) Global landscape of protein complexes in the yeast saccharomyces cerevisiae. Nature 440(7084):637–643CrossRef Krogan NJ et al (2006) Global landscape of protein complexes in the yeast saccharomyces cerevisiae. Nature 440(7084):637–643CrossRef
45.
go back to reference Wu X, Ma T, Cao J et al (2018) A comparative study of clustering ensemble algorithms. Comput & Electr Eng 68:603–615CrossRef Wu X, Ma T, Cao J et al (2018) A comparative study of clustering ensemble algorithms. Comput & Electr Eng 68:603–615CrossRef
46.
go back to reference Leutbecher M (2018) Ensemble size: how suboptimal is less than infinity? Q J R Meteorol Soc 145:107–128CrossRef Leutbecher M (2018) Ensemble size: how suboptimal is less than infinity? Q J R Meteorol Soc 145:107–128CrossRef
47.
go back to reference Buizza R, Palmer TN (1998) Impact of ensemble size on ensemble prediction. Mon Weather Rev 126:2503–2518CrossRef Buizza R, Palmer TN (1998) Impact of ensemble size on ensemble prediction. Mon Weather Rev 126:2503–2518CrossRef
Metadata
Title
Ensemble-based clustering of large probabilistic graphs using neighborhood and distance metric learning
Authors
Malihe Danesh
Morteza Dorrigiv
Farzin Yaghmaee
Publication date
14-09-2020
Publisher
Springer US
Published in
The Journal of Supercomputing / Issue 4/2021
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-020-03429-1

Other articles of this Issue 4/2021

The Journal of Supercomputing 4/2021 Go to the issue

Premium Partner