Two-stage semi-supervised clustering ensemble framework based on constraint weight | springerprofessional.de Skip to main content
Top

Hint

Swipe to navigate through the articles of this issue

Published in: International Journal of Machine Learning and Cybernetics 2/2023

01-11-2022 | Original Article

Two-stage semi-supervised clustering ensemble framework based on constraint weight

Authors: Ding Zhang, Youlong Yang, Haiquan Qiu

Published in: International Journal of Machine Learning and Cybernetics | Issue 2/2023

Login to get access
share
SHARE

Abstract

Semi-supervised clustering ensemble introduces partial supervised information, usually pairwise constraints, to achieve better performance than clustering ensemble. Although it has been successful in many aspects, there are still several limitations that need to be further improved. Firstly, supervised information is only utilized in ensemble generation, but not in the consensus process. Secondly, all clustering solutions participate in getting a final partition without considering redundancy among clustering solutions. Thirdly, each cluster in the same clustering solution is treated equally, which neglects the influence of different clusters to the final clustering result. To address these issues, we propose a two-stage semi-supervised clustering ensemble framework which considers both ensemble member selection and the weighting of clusters. Especially, we define the weight of each pairwise constraint to assist ensemble members selection and the weighting of clusters. In the first stage, a subset of clustering solutions is obtained based on the quality and diversity of clustering solutions in consideration of supervised information. In the second stage, the quality of each cluster is determined by the consistency of unsupervised and supervised information. For the unsupervised information consistency of a cluster, we consider evaluating it by the consistency of a cluster relative to all clustering solutions. For the supervised information consistency of a cluster, it depends on how satisfied a cluster is with the supervised information. In the end, the final partition is achieved by a weighted co-association matrix as consensus function. Experimental results on various datasets show that the proposed framework outperforms most of state-of-the-art clustering algorithms.

To get access to this content you need the following product:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 69.000 Bücher
  • über 500 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt 90 Tage mit der neuen Mini-Lizenz testen!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 50.000 Bücher
  • über 380 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe



 


Jetzt 90 Tage mit der neuen Mini-Lizenz testen!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 58.000 Bücher
  • über 300 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko





Jetzt 90 Tage mit der neuen Mini-Lizenz testen!

Show more products
Literature
1.
go back to reference Jain AK (2010) Data clustering: 50 years beyond K-means. Pattern Recogn Lett 31(8):651–666 CrossRef Jain AK (2010) Data clustering: 50 years beyond K-means. Pattern Recogn Lett 31(8):651–666 CrossRef
2.
go back to reference Ding Y, Zhao Y, Shen X, Musuvathi M, Mytkowicz T (2015) Yinyang k-means: a drop-in replacement of the classic k-means with consistent speedup. In International conference on machine learning, pp 579-587 Ding Y, Zhao Y, Shen X, Musuvathi M, Mytkowicz T (2015) Yinyang k-means: a drop-in replacement of the classic k-means with consistent speedup. In International conference on machine learning, pp 579-587
3.
go back to reference Zhang Z, Liu L, Shen F, Shen H, Shao L (2018) Binary multi-view clustering. IEEE Trans Pattern Anal Mach Intell 41(7):1774–1782 CrossRef Zhang Z, Liu L, Shen F, Shen H, Shao L (2018) Binary multi-view clustering. IEEE Trans Pattern Anal Mach Intell 41(7):1774–1782 CrossRef
4.
go back to reference Liu X, Li M, Tang C, Xia J, Xiong J, Liu L, Zhu E (2020) Efficient and effective regularized incomplete multi-view clustering. IEEE Trans Pattern Anal Mach Intell 43(8):2634–2646 Liu X, Li M, Tang C, Xia J, Xiong J, Liu L, Zhu E (2020) Efficient and effective regularized incomplete multi-view clustering. IEEE Trans Pattern Anal Mach Intell 43(8):2634–2646
5.
go back to reference Xia S, Peng D, Meng D, Zhang C, Wang G, Giem E, Chen Z (2020) A fast adaptive k-means with no bounds. IEEE Trans Pattern Anal Mach Intell Xia S, Peng D, Meng D, Zhang C, Wang G, Giem E, Chen Z (2020) A fast adaptive k-means with no bounds. IEEE Trans Pattern Anal Mach Intell
6.
go back to reference Zhou J, Zheng H, Pan L (2019) Ensemble clustering based on dense representation. Neurocomputing 357:66–76 CrossRef Zhou J, Zheng H, Pan L (2019) Ensemble clustering based on dense representation. Neurocomputing 357:66–76 CrossRef
7.
go back to reference Li F, Qian Y, Wang J, Dang C, Jing L (2019) Clustering ensemble based on sample’s stability. Artif Intell 273:37–55 MATHCrossRef Li F, Qian Y, Wang J, Dang C, Jing L (2019) Clustering ensemble based on sample’s stability. Artif Intell 273:37–55 MATHCrossRef
8.
go back to reference Yu L, Cao F, Zhao X, Yang X, Liang J (2020) Combining attribute content and label information for categorical data ensemble clustering. Appl Math Comput 381:125280 MATH Yu L, Cao F, Zhao X, Yang X, Liang J (2020) Combining attribute content and label information for categorical data ensemble clustering. Appl Math Comput 381:125280 MATH
10.
go back to reference Yu Z, Chen H, You J, Wong HS, Liu J, Han G (2014) Double selection based semi-supervised clustering ensemble for tumor clustering from gene expression profiles. IEEE/ACM Trans Comput Biol Bioinf 11(4):727–740 CrossRef Yu Z, Chen H, You J, Wong HS, Liu J, Han G (2014) Double selection based semi-supervised clustering ensemble for tumor clustering from gene expression profiles. IEEE/ACM Trans Comput Biol Bioinf 11(4):727–740 CrossRef
11.
go back to reference Yang F, Li T, Zhou Q, Xiao H (2017) Cluster ensemble selection with constraints. Neurocomputing 235:59–70 CrossRef Yang F, Li T, Zhou Q, Xiao H (2017) Cluster ensemble selection with constraints. Neurocomputing 235:59–70 CrossRef
12.
go back to reference Xiao W, Yang Y, Wang H, Li T, Xing H (2016) Semi-supervised hierarchical clustering ensemble and its application. Neurocomputing 173:1362–1376 CrossRef Xiao W, Yang Y, Wang H, Li T, Xing H (2016) Semi-supervised hierarchical clustering ensemble and its application. Neurocomputing 173:1362–1376 CrossRef
13.
go back to reference Topchy A, Jain AK, Punch W (2003) Combining multiple weak clusterings. In: Third IEEE international conference on data mining, pp 331–338 Topchy A, Jain AK, Punch W (2003) Combining multiple weak clusterings. In: Third IEEE international conference on data mining, pp 331–338
14.
go back to reference Fred AL, Jain AK (2002) Data clustering using evidence accumulation. In: Object recognition supported by user interaction for service robots 4, pp 276–280 Fred AL, Jain AK (2002) Data clustering using evidence accumulation. In: Object recognition supported by user interaction for service robots 4, pp 276–280
15.
go back to reference Yu Z, Luo P, You J, Wong HS, Leung H, Wu S, Han G (2015) Incremental semi-supervised clustering ensemble for high dimensional data clustering. IEEE Trans Knowl Data Eng 28(3):701–714 CrossRef Yu Z, Luo P, You J, Wong HS, Leung H, Wu S, Han G (2015) Incremental semi-supervised clustering ensemble for high dimensional data clustering. IEEE Trans Knowl Data Eng 28(3):701–714 CrossRef
16.
go back to reference Fern XZ, Brodley CE (2003) Random projection for high dimensional data clustering: a cluster ensemble approach. In: Proceedings of the 20th international conference on machine learning, pp 186–193 Fern XZ, Brodley CE (2003) Random projection for high dimensional data clustering: a cluster ensemble approach. In: Proceedings of the 20th international conference on machine learning, pp 186–193
17.
go back to reference Fred AL, Jain AK (2005) Combining multiple clusterings using evidence accumulation. IEEE Trans Pattern Anal Mach Intell 27(6):835–850 CrossRef Fred AL, Jain AK (2005) Combining multiple clusterings using evidence accumulation. IEEE Trans Pattern Anal Mach Intell 27(6):835–850 CrossRef
18.
go back to reference Iam-On N, Boongoen T, Garrett S, Price C (2011) A link-based approach to the cluster ensemble problem. IEEE Trans Pattern Anal Mach Intell 33(12):2396–2409 CrossRef Iam-On N, Boongoen T, Garrett S, Price C (2011) A link-based approach to the cluster ensemble problem. IEEE Trans Pattern Anal Mach Intell 33(12):2396–2409 CrossRef
19.
go back to reference Liu H, Wu J, Liu T, Tao D, Fu Y (2017) Spectral ensemble clustering via weighted k-means: theoretical and practical evidence. IEEE Trans Knowl Data Eng 29(5):1129–1143 CrossRef Liu H, Wu J, Liu T, Tao D, Fu Y (2017) Spectral ensemble clustering via weighted k-means: theoretical and practical evidence. IEEE Trans Knowl Data Eng 29(5):1129–1143 CrossRef
20.
go back to reference Huang D, Wang C-D, Lai J-H (2017) Locally weighted ensemble clustering. IEEE Trans Cybernet 48(5):1460–1473 CrossRef Huang D, Wang C-D, Lai J-H (2017) Locally weighted ensemble clustering. IEEE Trans Cybernet 48(5):1460–1473 CrossRef
21.
go back to reference Bai L, Liang J, Du H, Guo Y (2018) An information-theoretical framework for cluster ensemble. IEEE Trans Knowl Data Eng 31(8):1464–1477 Bai L, Liang J, Du H, Guo Y (2018) An information-theoretical framework for cluster ensemble. IEEE Trans Knowl Data Eng 31(8):1464–1477
22.
go back to reference Strehl A, Ghosh J (2002) Cluster ensembles—a knowledge reuse framework for combining multiple partitions. J Mach Learn Res 3(12):583–617 MATH Strehl A, Ghosh J (2002) Cluster ensembles—a knowledge reuse framework for combining multiple partitions. J Mach Learn Res 3(12):583–617 MATH
23.
go back to reference Fern XZ, Brodley CE (2004) Solving cluster ensemble problems by bipartite graph partitioning. In: Proceedings of the twenty-first international conference on machine learning, p 36 Fern XZ, Brodley CE (2004) Solving cluster ensemble problems by bipartite graph partitioning. In: Proceedings of the twenty-first international conference on machine learning, p 36
24.
go back to reference Huang D, Lai JH, Wang CD (2015) Robust ensemble clustering using probability trajectories. IEEE Trans Knowl Data Eng 28(5):1312–1326 CrossRef Huang D, Lai JH, Wang CD (2015) Robust ensemble clustering using probability trajectories. IEEE Trans Knowl Data Eng 28(5):1312–1326 CrossRef
25.
go back to reference Křvánek M, Morávek J (1986) Np-hard problems in hierarchical-tree clustering. Acta Inform 23(3):311–323 MATHCrossRef Křvánek M, Morávek J (1986) Np-hard problems in hierarchical-tree clustering. Acta Inform 23(3):311–323 MATHCrossRef
26.
go back to reference Li T, Ding C, Jordan MI (2007) Solving consensus and semi-supervised clustering problems using nonnegative matrix factorization. In: Seventh IEEE international conference on data mining, pp 577–582 Li T, Ding C, Jordan MI (2007) Solving consensus and semi-supervised clustering problems using nonnegative matrix factorization. In: Seventh IEEE international conference on data mining, pp 577–582
27.
go back to reference Vega-Pons S, Correa-Morris J, Ruiz-Shulcloper J (2010) Weighted partition consensus via kernels. Pattern Recogn 43(8):2712–2724 MATHCrossRef Vega-Pons S, Correa-Morris J, Ruiz-Shulcloper J (2010) Weighted partition consensus via kernels. Pattern Recogn 43(8):2712–2724 MATHCrossRef
28.
go back to reference Franek L, Jiang X (2014) Ensemble clustering by means of clustering embedding in vector spaces. Pattern Recogn 47(2):833–842 MATHCrossRef Franek L, Jiang X (2014) Ensemble clustering by means of clustering embedding in vector spaces. Pattern Recogn 47(2):833–842 MATHCrossRef
29.
go back to reference Yu Z, Li L, Gao Y, You J, Liu J, Wong HS, Han G (2014) Hybrid clustering solution selection strategy. Pattern Recogn 47(10):3362–3375 CrossRef Yu Z, Li L, Gao Y, You J, Liu J, Wong HS, Han G (2014) Hybrid clustering solution selection strategy. Pattern Recogn 47(10):3362–3375 CrossRef
30.
go back to reference Jia J, Xiao X, Liu B, Jiao L (2011) Bagging-based spectral clustering ensemble selection. Pattern Recogn Lett 32(10):1456–1467 CrossRef Jia J, Xiao X, Liu B, Jiao L (2011) Bagging-based spectral clustering ensemble selection. Pattern Recogn Lett 32(10):1456–1467 CrossRef
31.
go back to reference Ma T, Yu T, Wu X, Cao J, Al-Abdulkarim A, Al-Dhelaan A, Al-Dhelaan M (2020) Multiple clustering and selecting algorithms with combining strategy for selective clustering ensemble. Soft Comput 24(20):15129–15141 CrossRef Ma T, Yu T, Wu X, Cao J, Al-Abdulkarim A, Al-Dhelaan A, Al-Dhelaan M (2020) Multiple clustering and selecting algorithms with combining strategy for selective clustering ensemble. Soft Comput 24(20):15129–15141 CrossRef
32.
go back to reference Wagstaff K, Cardie C, Rogers S, Schrodl S (2001) Constrained k-means clustering with background knowledge. Icml 1:577–584 Wagstaff K, Cardie C, Rogers S, Schrodl S (2001) Constrained k-means clustering with background knowledge. Icml 1:577–584
33.
go back to reference Zeng H, Cheung YM (2011) Semi-supervised maximum margin clustering with pairwise constraints. IEEE Trans Knowl Data Eng 24(5):926–939 CrossRef Zeng H, Cheung YM (2011) Semi-supervised maximum margin clustering with pairwise constraints. IEEE Trans Knowl Data Eng 24(5):926–939 CrossRef
34.
go back to reference Anand S, Mittal S, Tuzel O, Meer P (2013) Semi-supervised kernel mean shift clustering. IEEE Trans Pattern Anal Mach Intell 36(6):1201–1215 CrossRef Anand S, Mittal S, Tuzel O, Meer P (2013) Semi-supervised kernel mean shift clustering. IEEE Trans Pattern Anal Mach Intell 36(6):1201–1215 CrossRef
35.
go back to reference Liu CL, Hsaio WH, Lee CH, Gou FS (2013) Semi-supervised linear discriminant clustering. IEEE Trans Cybernet 44(7):989–1000 CrossRef Liu CL, Hsaio WH, Lee CH, Gou FS (2013) Semi-supervised linear discriminant clustering. IEEE Trans Cybernet 44(7):989–1000 CrossRef
36.
go back to reference Lu Z, Peng Y (2013) Exhaustive and efficient constraint propagation: a graph-based learning approach and its applications. Int J Comput Vis 103(3):306–325 MATHCrossRef Lu Z, Peng Y (2013) Exhaustive and efficient constraint propagation: a graph-based learning approach and its applications. Int J Comput Vis 103(3):306–325 MATHCrossRef
37.
go back to reference Xiong S, Azimi J, Fern XZ (2013) Active learning of constraints for semi-supervised clustering. IEEE Trans Knowl Data Eng 26(1):43–54 CrossRef Xiong S, Azimi J, Fern XZ (2013) Active learning of constraints for semi-supervised clustering. IEEE Trans Knowl Data Eng 26(1):43–54 CrossRef
38.
go back to reference Zhang D, Chen S, Zhou ZH, Yang Q (2008) Constraint projections for ensemble learning. In AAAI, pp 758–763 Zhang D, Chen S, Zhou ZH, Yang Q (2008) Constraint projections for ensemble learning. In AAAI, pp 758–763
39.
go back to reference Yu Z, Kuang Z, Liu J, Chen H, Zhang J, You J, Han G (2017) Adaptive ensembling of semi-supervised clustering solutions. IEEE Trans Knowl Data Eng 29(8):1577–1590 CrossRef Yu Z, Kuang Z, Liu J, Chen H, Zhang J, You J, Han G (2017) Adaptive ensembling of semi-supervised clustering solutions. IEEE Trans Knowl Data Eng 29(8):1577–1590 CrossRef
40.
go back to reference Yu Z, Luo P, Liu J, Wong HS, You J, Han G, Zhang J (2018) Semi-supervised ensemble clustering based on selected constraint projection. IEEE Trans Knowl Data Eng 30(12):2394–2407 CrossRef Yu Z, Luo P, Liu J, Wong HS, You J, Han G, Zhang J (2018) Semi-supervised ensemble clustering based on selected constraint projection. IEEE Trans Knowl Data Eng 30(12):2394–2407 CrossRef
41.
go back to reference Lai Y, He S, Lin Z, Yang F, Zhou QF, Zhou X (2019) An adaptive robust semi-supervised clustering framework using weighted consensus of random k-means ensemble. IEEE Trans Knowl Data Eng Lai Y, He S, Lin Z, Yang F, Zhou QF, Zhou X (2019) An adaptive robust semi-supervised clustering framework using weighted consensus of random k-means ensemble. IEEE Trans Knowl Data Eng
42.
go back to reference Yang F, Li X, Li Q, Li T (2014) Exploring the diversity in cluster ensemble generation: random sampling and random projection. Expert Syst Appl 41(10):4844–4866 CrossRef Yang F, Li X, Li Q, Li T (2014) Exploring the diversity in cluster ensemble generation: random sampling and random projection. Expert Syst Appl 41(10):4844–4866 CrossRef
43.
go back to reference Li F, Qian Y, Wang J, Dang C, Liu B (2018) Cluster’s quality evaluation and selective clustering ensemble. ACM Trans Knowl Discov Data (TKDD) 12(5):1–27 Li F, Qian Y, Wang J, Dang C, Liu B (2018) Cluster’s quality evaluation and selective clustering ensemble. ACM Trans Knowl Discov Data (TKDD) 12(5):1–27
44.
go back to reference Law MH, Topchy AP, Jain AK (2004) Multiobjective data clustering. In: Proceedings of the 2004 IEEE computer society conference on computer vision and pattern recognition, 2004. CVPR 2004, Vol 2, pp II–II Law MH, Topchy AP, Jain AK (2004) Multiobjective data clustering. In: Proceedings of the 2004 IEEE computer society conference on computer vision and pattern recognition, 2004. CVPR 2004, Vol 2, pp II–II
45.
go back to reference Alizadeh H, Minaei-Bidgoli B, Parvin H (2014) Cluster ensemble selection based on a new cluster stability measure. Intell Data Anal 18(3):389–408 CrossRef Alizadeh H, Minaei-Bidgoli B, Parvin H (2014) Cluster ensemble selection based on a new cluster stability measure. Intell Data Anal 18(3):389–408 CrossRef
46.
go back to reference Asuncion A, Newman D (2007) UCI machine learning repository Asuncion A, Newman D (2007) UCI machine learning repository
47.
go back to reference Cai D, He X, Han J, Huang TS (2010) Graph regularized nonnegative matrix factorization for data representation. IEEE Trans Pattern Anal Mach Intell 33(8):1548–1560 Cai D, He X, Han J, Huang TS (2010) Graph regularized nonnegative matrix factorization for data representation. IEEE Trans Pattern Anal Mach Intell 33(8):1548–1560
48.
go back to reference Statnikov A, Tsamardinos I, Dosbayev Y, Aliferis CF (2005) GEMS: a system for automated cancer diagnosis and biomarker discovery from microarray gene expression data. Int J Med Informat 74(7–8):491–503 CrossRef Statnikov A, Tsamardinos I, Dosbayev Y, Aliferis CF (2005) GEMS: a system for automated cancer diagnosis and biomarker discovery from microarray gene expression data. Int J Med Informat 74(7–8):491–503 CrossRef
49.
go back to reference Vinh NX, Epps J, Bailey J (2010) Information theoretic measures for clusterings comparison: variants, properties, normalization and correction for chance. J Mach Learn Res 11:2837–2854 MATH Vinh NX, Epps J, Bailey J (2010) Information theoretic measures for clusterings comparison: variants, properties, normalization and correction for chance. J Mach Learn Res 11:2837–2854 MATH
50.
go back to reference Rand WM (1971) Objective criteria for the evaluation of clustering methods. J Am Stat Assoc 66(336):846–850 CrossRef Rand WM (1971) Objective criteria for the evaluation of clustering methods. J Am Stat Assoc 66(336):846–850 CrossRef
51.
go back to reference Wang H, Li T, Li T, Yang Y (2014) Constraint neighborhood projections for semi-supervised clustering. IEEE Trans Cybernet 44(5):636–643 CrossRef Wang H, Li T, Li T, Yang Y (2014) Constraint neighborhood projections for semi-supervised clustering. IEEE Trans Cybernet 44(5):636–643 CrossRef
52.
go back to reference Huang D, Wang CD, Wu JS, Lai JH, Kwoh CK (2019) Ultra-scalable spectral clustering and ensemble clustering. IEEE Trans Knowl Data Eng 32(6):1212–1226 CrossRef Huang D, Wang CD, Wu JS, Lai JH, Kwoh CK (2019) Ultra-scalable spectral clustering and ensemble clustering. IEEE Trans Knowl Data Eng 32(6):1212–1226 CrossRef
53.
go back to reference Karypis G, Kumar V (1998) A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J Sci Comput 20(1):359–392 MATHCrossRef Karypis G, Kumar V (1998) A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J Sci Comput 20(1):359–392 MATHCrossRef
54.
go back to reference Huang R, Lam W, Zhang Z (2007) Active learning of constraints for semi-supervised text clustering. In: Proceedings of the 2007 SIAM international conference on data mining. Society for Industrial and Applied Mathematics, pp 113–124 Huang R, Lam W, Zhang Z (2007) Active learning of constraints for semi-supervised text clustering. In: Proceedings of the 2007 SIAM international conference on data mining. Society for Industrial and Applied Mathematics, pp 113–124
55.
go back to reference Xiong C, Johnson DM, Corso JJ (2016) Active clustering with model-based uncertainty reduction. IEEE Trans Pattern Anal Mach Intell 39(1):5–17 CrossRef Xiong C, Johnson DM, Corso JJ (2016) Active clustering with model-based uncertainty reduction. IEEE Trans Pattern Anal Mach Intell 39(1):5–17 CrossRef
Metadata
Title
Two-stage semi-supervised clustering ensemble framework based on constraint weight
Authors
Ding Zhang
Youlong Yang
Haiquan Qiu
Publication date
01-11-2022
Publisher
Springer Berlin Heidelberg
Published in
International Journal of Machine Learning and Cybernetics / Issue 2/2023
Print ISSN: 1868-8071
Electronic ISSN: 1868-808X
DOI
https://doi.org/10.1007/s13042-022-01651-2

Other articles of this Issue 2/2023

International Journal of Machine Learning and Cybernetics 2/2023 Go to the issue