Skip to main content
Erschienen in: The Journal of Supercomputing 4/2021

14.09.2020

Ensemble-based clustering of large probabilistic graphs using neighborhood and distance metric learning

verfasst von: Malihe Danesh, Morteza Dorrigiv, Farzin Yaghmaee

Erschienen in: The Journal of Supercomputing | Ausgabe 4/2021

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Graphs are commonly used to express the communication of various data. Faced with uncertain data, we have probabilistic graphs. As a fundamental problem of such graphs, clustering has many applications in analyzing uncertain data. In this paper, we propose a novel method based on ensemble clustering for large probabilistic graphs. To generate ensemble clusters, we develop a set of probable possible worlds of the initial probabilistic graph. Then, we present a probabilistic co-association matrix as a consensus function to integrate base clustering results. It relies on co-occurrences of node pairs based on the probability of the corresponding common cluster graphs. Also, we apply two improvements in the steps before and after of ensembles generation. In the before step, we append neighborhood information based on node features to the initial graph to achieve a more accurate estimation of the probability between the nodes. In the after step, we use supervised metric learning-based Mahalanobis distance to automatically learn a metric from ensemble clusters. It aims to gain crucial features of the base clustering results. We evaluate our work using five real-world datasets and three clustering evaluation metrics, namely the Dunn index, Davies–Bouldin index, and Silhouette coefficient. The results show the impressive performance of clustering large probabilistic graphs.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat Zou Z, Li J, Gao H et al (2010) Mining frequent subgraph patterns from uncertain graph data. IEEE Trans Knowl Data Eng 22:1203–1218CrossRef Zou Z, Li J, Gao H et al (2010) Mining frequent subgraph patterns from uncertain graph data. IEEE Trans Knowl Data Eng 22:1203–1218CrossRef
2.
Zurück zum Zitat Papapetrou O, Ioannou E, Skoutas D (2011) Efficient discovery of frequent subgraph patterns in uncertain graph databases. In: EDBT/ICDT’11, pp 355–366 Papapetrou O, Ioannou E, Skoutas D (2011) Efficient discovery of frequent subgraph patterns in uncertain graph databases. In: EDBT/ICDT’11, pp 355–366
3.
Zurück zum Zitat Potamias M, Bonchi F, Gionis A et al (2010) k-nearest neighbors in uncertain graphs. Proc VLDB Endow 3(1):997–1008CrossRef Potamias M, Bonchi F, Gionis A et al (2010) k-nearest neighbors in uncertain graphs. Proc VLDB Endow 3(1):997–1008CrossRef
4.
Zurück zum Zitat Strehl A, Ghosh J (2003) Cluster ensembles—A knowledge reuse framework for combining partitions. J Mach Learn Res 3:583–617MathSciNetMATH Strehl A, Ghosh J (2003) Cluster ensembles—A knowledge reuse framework for combining partitions. J Mach Learn Res 3:583–617MathSciNetMATH
5.
Zurück zum Zitat Topchy A, Jain AK, Punch W (2005) Clustering ensembles: models of consensus and weak partitions. IEEE Trans Pattern Anal Mach Intell 27(12):1866–1881CrossRef Topchy A, Jain AK, Punch W (2005) Clustering ensembles: models of consensus and weak partitions. IEEE Trans Pattern Anal Mach Intell 27(12):1866–1881CrossRef
6.
Zurück zum Zitat Li F, Qian Y, Wang J et al (2019) Clustering ensemble based on sample’s stability. Artif Intell 273:37–55MathSciNetCrossRef Li F, Qian Y, Wang J et al (2019) Clustering ensemble based on sample’s stability. Artif Intell 273:37–55MathSciNetCrossRef
7.
Zurück zum Zitat Boongoen T, Iam-On N (2018) Cluster ensembles: a survey of approaches with recent extensions and applications. Comput Sci Rev 28:1–25MathSciNetCrossRef Boongoen T, Iam-On N (2018) Cluster ensembles: a survey of approaches with recent extensions and applications. Comput Sci Rev 28:1–25MathSciNetCrossRef
8.
Zurück zum Zitat Alqurashi T, Wang W (2019) Clustering ensemble method. Int J Mach Learn Cyb 10:1227–1246CrossRef Alqurashi T, Wang W (2019) Clustering ensemble method. Int J Mach Learn Cyb 10:1227–1246CrossRef
9.
Zurück zum Zitat Vega-Pons S, Ruiz-Shulcloper J (2011) A survey of clustering ensemble algorithms. Int J Pattern Recogn Artif Intell 25(03):337–372MathSciNetCrossRef Vega-Pons S, Ruiz-Shulcloper J (2011) A survey of clustering ensemble algorithms. Int J Pattern Recogn Artif Intell 25(03):337–372MathSciNetCrossRef
10.
Zurück zum Zitat Kollios G, Potamias M, Terzi E (2013) Clustering large probabilistic graphs. IEEE Trans Knowl Data Eng 25(2):325–336CrossRef Kollios G, Potamias M, Terzi E (2013) Clustering large probabilistic graphs. IEEE Trans Knowl Data Eng 25(2):325–336CrossRef
11.
Zurück zum Zitat Ailon N, Charikar M, Newman A (2005) Aggregating Inconsistent Information: Ranking and Clustering. In: Proceedings of the ACM Symposium on Theory of Computing (STOC), pp 684–693 Ailon N, Charikar M, Newman A (2005) Aggregating Inconsistent Information: Ranking and Clustering. In: Proceedings of the ACM Symposium on Theory of Computing (STOC), pp 684–693
12.
Zurück zum Zitat Halim Z, Waqas M, Hussain SF (2015) Clustering large probabilistic graphs using multi-population evolutionary algorithm. Inf Sci 317:78–95CrossRef Halim Z, Waqas M, Hussain SF (2015) Clustering large probabilistic graphs using multi-population evolutionary algorithm. Inf Sci 317:78–95CrossRef
13.
Zurück zum Zitat Gu Y, Gao C, Cong G et al (2014) Effective and efficient clustering methods for correlated probabilistic graphs. IEEE Trans Knowl Data Eng 26(5):1117–1130CrossRef Gu Y, Gao C, Cong G et al (2014) Effective and efficient clustering methods for correlated probabilistic graphs. IEEE Trans Knowl Data Eng 26(5):1117–1130CrossRef
14.
Zurück zum Zitat Ceccarello M, Fantozzi C, Pietracaprina A et al (2017) Clustering uncertain graphs. Proc VLDB Endowment 11(4):472–544CrossRef Ceccarello M, Fantozzi C, Pietracaprina A et al (2017) Clustering uncertain graphs. Proc VLDB Endowment 11(4):472–544CrossRef
15.
Zurück zum Zitat Halim Z, Khattak JH (2019) Density-based clustering of big probabilistic graphs. Evolv Syst 10(3):333–350CrossRef Halim Z, Khattak JH (2019) Density-based clustering of big probabilistic graphs. Evolv Syst 10(3):333–350CrossRef
16.
Zurück zum Zitat Qiu YX, Li RH, Li J, Qiao S et al (2018) Efficient structural clustering on probabilistic graphs. IEEE Trans Knowl Data Eng 31(10):1954–1968CrossRef Qiu YX, Li RH, Li J, Qiao S et al (2018) Efficient structural clustering on probabilistic graphs. IEEE Trans Knowl Data Eng 31(10):1954–1968CrossRef
17.
Zurück zum Zitat Iam-On N, Boongoen T (2015) Comparative study of matrix refinement approaches for ensemble clustering. Mach Learn 98(1–2):269–300MathSciNetCrossRef Iam-On N, Boongoen T (2015) Comparative study of matrix refinement approaches for ensemble clustering. Mach Learn 98(1–2):269–300MathSciNetCrossRef
18.
Zurück zum Zitat Huang D, Wang CD, Wu JS et al (2019) Ultra-scalable spectral clustering and ensemble clustering. IEEE TKDE 32(6):1212–1226 Huang D, Wang CD, Wu JS et al (2019) Ultra-scalable spectral clustering and ensemble clustering. IEEE TKDE 32(6):1212–1226
19.
Zurück zum Zitat Iam-On N, Boongoen T, Garrett S et al (2011) A link-based approach to the cluster ensemble problem. IEEE Trans Pattern Anal Mach Intell 33(12):2396–2409CrossRef Iam-On N, Boongoen T, Garrett S et al (2011) A link-based approach to the cluster ensemble problem. IEEE Trans Pattern Anal Mach Intell 33(12):2396–2409CrossRef
20.
Zurück zum Zitat Yi J, Yang T, Jin R et al (2012) Robust ensemble clustering by matrix completion. In: Proceedings of IEEE International Conference on Data Mining (ICDM) Yi J, Yang T, Jin R et al (2012) Robust ensemble clustering by matrix completion. In: Proceedings of IEEE International Conference on Data Mining (ICDM)
21.
Zurück zum Zitat Fred AN, Jain AK (2005) Combining multiple clusterings using evidence accumulation. IEEE Trans Pattern Anal Mach Intell 27(6):835–850CrossRef Fred AN, Jain AK (2005) Combining multiple clusterings using evidence accumulation. IEEE Trans Pattern Anal Mach Intell 27(6):835–850CrossRef
22.
Zurück zum Zitat Lourenço A, Bulò SR, Rebagliati N et al (2015) Probabilistic consensus clustering using evidence accumulation. Mach Learn 98(1–2):331–357MathSciNetCrossRef Lourenço A, Bulò SR, Rebagliati N et al (2015) Probabilistic consensus clustering using evidence accumulation. Mach Learn 98(1–2):331–357MathSciNetCrossRef
23.
Zurück zum Zitat Fern XZ, Brodley CE (2004) Solving cluster ensemble problems by bipartite graph partitioning. In: Proceedings of International Conference on Machine Learning (ICML) Fern XZ, Brodley CE (2004) Solving cluster ensemble problems by bipartite graph partitioning. In: Proceedings of International Conference on Machine Learning (ICML)
24.
Zurück zum Zitat Huang D, Lai JH, Wang CD (2016) Robust ensemble clustering using probability trajectories. IEEE Trans Knowl Data Eng 28(5):1312–1326CrossRef Huang D, Lai JH, Wang CD (2016) Robust ensemble clustering using probability trajectories. IEEE Trans Knowl Data Eng 28(5):1312–1326CrossRef
25.
Zurück zum Zitat Huang D, Wang CD, Lai JH (2018) Locally weighted ensemble clustering. IEEE Trans Cybern 48(5):1460–1473CrossRef Huang D, Wang CD, Lai JH (2018) Locally weighted ensemble clustering. IEEE Trans Cybern 48(5):1460–1473CrossRef
26.
Zurück zum Zitat Karypis G, Kumar V (1998) A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J Sci Comput 20(1):359–392MathSciNetCrossRef Karypis G, Kumar V (1998) A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J Sci Comput 20(1):359–392MathSciNetCrossRef
27.
Zurück zum Zitat Huang D, Lai J, Wang CD (2016) Ensemble clustering using factor graph. Pattern Recogn 50:131–142CrossRef Huang D, Lai J, Wang CD (2016) Ensemble clustering using factor graph. Pattern Recogn 50:131–142CrossRef
28.
Zurück zum Zitat Franek L, Jiang X (2014) Ensemble clustering by means of clustering embedding in vector spaces. Pattern Recogn 47(2):833–842CrossRef Franek L, Jiang X (2014) Ensemble clustering by means of clustering embedding in vector spaces. Pattern Recogn 47(2):833–842CrossRef
29.
Zurück zum Zitat Weiszfeld E, Plastria F (2009) On the point for which the sum of the distances to n given points is minimum. Ann Oper Res 167(1):7–41MathSciNetCrossRef Weiszfeld E, Plastria F (2009) On the point for which the sum of the distances to n given points is minimum. Ann Oper Res 167(1):7–41MathSciNetCrossRef
30.
Zurück zum Zitat Benjelloun O, Sarma AD, Halevy A et al (2006) ULDBs: databases with uncertainty and lineage. In Proceedings of the 32nd International Conference on Very Large Data Bases (VLDB), pp 953–964 Benjelloun O, Sarma AD, Halevy A et al (2006) ULDBs: databases with uncertainty and lineage. In Proceedings of the 32nd International Conference on Very Large Data Bases (VLDB), pp 953–964
31.
Zurück zum Zitat Dalvi NN, Suciu D (2004) Efficient Query Evaluation on Probabilistic Databases. In: Proceedings of the 30th International Conference on Very Large Databases, Toronto, Canada. Dalvi NN, Suciu D (2004) Efficient Query Evaluation on Probabilistic Databases. In: Proceedings of the 30th International Conference on Very Large Databases, Toronto, Canada.
32.
Zurück zum Zitat Gionis A, Mannila H, Tsaparas P (2007) Clustering aggregation. ACM Trans Knowl Discov Data (TKDD) 1(1):1–30CrossRef Gionis A, Mannila H, Tsaparas P (2007) Clustering aggregation. ACM Trans Knowl Discov Data (TKDD) 1(1):1–30CrossRef
33.
Zurück zum Zitat Han K, Gui F, Xiao X et al (2019) Efficient and effective algorithms for clustering uncertain graphs. Proc VLDB Endow 12(6):667–680CrossRef Han K, Gui F, Xiao X et al (2019) Efficient and effective algorithms for clustering uncertain graphs. Proc VLDB Endow 12(6):667–680CrossRef
34.
Zurück zum Zitat Shamir R, Sharan R, Tsur D (2004) Cluster graph modification problems. Discrete Appl Math 144(1–2):173–182MathSciNetCrossRef Shamir R, Sharan R, Tsur D (2004) Cluster graph modification problems. Discrete Appl Math 144(1–2):173–182MathSciNetCrossRef
35.
Zurück zum Zitat Bian W, Tao D (2011) Learning a distance metric by empirical loss minimization. In: Proceedings of the 22nd International Joint Conference on Artificial Intelligence, pp 1186–1191, Barcelona, Spain Bian W, Tao D (2011) Learning a distance metric by empirical loss minimization. In: Proceedings of the 22nd International Joint Conference on Artificial Intelligence, pp 1186–1191, Barcelona, Spain
36.
Zurück zum Zitat Luo Y, Wen Y, Tao D (2016) On combining side information and unlabeled data for heterogeneous multi-task metric learning. In: The 25th International Joint Conference on Artificial Intelligence, pp 1809–1815, New York Luo Y, Wen Y, Tao D (2016) On combining side information and unlabeled data for heterogeneous multi-task metric learning. In: The 25th International Joint Conference on Artificial Intelligence, pp 1809–1815, New York
37.
Zurück zum Zitat Xiang S, Nie F, Zhang C (2008) Learning a mahalanobis distance metric for data clustering and classification. Pattern Recogn 41(12):3600–3612CrossRef Xiang S, Nie F, Zhang C (2008) Learning a mahalanobis distance metric for data clustering and classification. Pattern Recogn 41(12):3600–3612CrossRef
38.
Zurück zum Zitat Xing EP, Ng AY, Jordan MI et al (2003) Distance metric learning with application to clustering with side-information. In: Proceedings of the 15th International Conference on Neural Information Processing Systems, pp 521–528. Cambridge Xing EP, Ng AY, Jordan MI et al (2003) Distance metric learning with application to clustering with side-information. In: Proceedings of the 15th International Conference on Neural Information Processing Systems, pp 521–528. Cambridge
39.
Zurück zum Zitat Law MT, Yu Y, Cord M et al (2016) Closed-form training of mahalanobis distance for supervised clustering. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 3909–3917, Las Vegas, NV Law MT, Yu Y, Cord M et al (2016) Closed-form training of mahalanobis distance for supervised clustering. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 3909–3917, Las Vegas, NV
40.
Zurück zum Zitat McFee B, Lanckriet GR (2010) Metric learning to rank. In: Proceedings of the 27th International Conference on Machine Learning, pp 775–782, Haifa, Israel. McFee B, Lanckriet GR (2010) Metric learning to rank. In: Proceedings of the 27th International Conference on Machine Learning, pp 775–782, Haifa, Israel.
41.
Zurück zum Zitat Mahalanobis PC (1936) On the generalized distance in statistics. In: Proceedings of the National Institute of Science, Calcutta, India Mahalanobis PC (1936) On the generalized distance in statistics. In: Proceedings of the National Institute of Science, Calcutta, India
42.
Zurück zum Zitat Bellet A, Habrard A, Sebban M (2015) Metric learning. Synthesis lectures on artificial intelligence and machine learning. Morgan & Claypool Publishers, San RafaelMATH Bellet A, Habrard A, Sebban M (2015) Metric learning. Synthesis lectures on artificial intelligence and machine learning. Morgan & Claypool Publishers, San RafaelMATH
44.
Zurück zum Zitat Krogan NJ et al (2006) Global landscape of protein complexes in the yeast saccharomyces cerevisiae. Nature 440(7084):637–643CrossRef Krogan NJ et al (2006) Global landscape of protein complexes in the yeast saccharomyces cerevisiae. Nature 440(7084):637–643CrossRef
45.
Zurück zum Zitat Wu X, Ma T, Cao J et al (2018) A comparative study of clustering ensemble algorithms. Comput & Electr Eng 68:603–615CrossRef Wu X, Ma T, Cao J et al (2018) A comparative study of clustering ensemble algorithms. Comput & Electr Eng 68:603–615CrossRef
46.
Zurück zum Zitat Leutbecher M (2018) Ensemble size: how suboptimal is less than infinity? Q J R Meteorol Soc 145:107–128CrossRef Leutbecher M (2018) Ensemble size: how suboptimal is less than infinity? Q J R Meteorol Soc 145:107–128CrossRef
47.
Zurück zum Zitat Buizza R, Palmer TN (1998) Impact of ensemble size on ensemble prediction. Mon Weather Rev 126:2503–2518CrossRef Buizza R, Palmer TN (1998) Impact of ensemble size on ensemble prediction. Mon Weather Rev 126:2503–2518CrossRef
Metadaten
Titel
Ensemble-based clustering of large probabilistic graphs using neighborhood and distance metric learning
verfasst von
Malihe Danesh
Morteza Dorrigiv
Farzin Yaghmaee
Publikationsdatum
14.09.2020
Verlag
Springer US
Erschienen in
The Journal of Supercomputing / Ausgabe 4/2021
Print ISSN: 0920-8542
Elektronische ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-020-03429-1

Weitere Artikel der Ausgabe 4/2021

The Journal of Supercomputing 4/2021 Zur Ausgabe