13-08-2024

Combining Semi-supervised Clustering and Classification Under a Generalized Framework

Authors: Zhen Jiang, Lingyun Zhao, Yu Lu

Published in: Journal of Classification

Abstract

Most machine learning algorithms need a sufficient amount of labeled data to train a reliable classifier. However, labeling data is often costly and time-consuming, whereas unlabeled data is usually readily available. Learning from both labeled and unlabeled data has therefore become an active research topic. Inspired by the co-training algorithm, we present a learning framework called CSCC, which combines semi-supervised clustering and classification to learn from both labeled and unlabeled data. Unlike existing co-training style methods, which construct diverse classifiers that learn from each other, CSCC leverages the diversity between semi-supervised clustering and classification models to achieve mutual enhancement. Existing classification algorithms can be easily adapted to CSCC, allowing them to generalize from only a few labeled examples. In particular, to bridge the gap between class information and clustering, we propose a semi-supervised hierarchical clustering algorithm that uses labeled data to guide the cluster-splitting process. Within the CSCC framework, we introduce two loss functions that supervise the iterative updating of the semi-supervised clustering and classification models, respectively. Extensive experiments on a variety of benchmark datasets validate the superiority of CSCC over other state-of-the-art methods.
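
The general idea of letting a clustering model and a classifier reinforce each other can be illustrated with a small toy sketch. It is not the CSCC algorithm: the abstract does not specify the hierarchical cluster-splitting procedure or the two loss functions, so this sketch swaps in a seeded k-means-style clustering and a nearest-class-mean classifier, and accepts a pseudo-label only when the two models agree. All function names and parameters (`seeded_clusters`, `co_train`, `rounds`, `per_round`) are hypothetical, and the points are assumed to be distinct tuples.

```python
# Illustrative sketch only: NOT the paper's CSCC algorithm.
# Stand-ins (assumptions): seeded k-means for the semi-supervised clustering,
# nearest class mean for the classifier, agreement as the selection rule.
import math


def centroid(points):
    """Mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def nearest(point, centers):
    """Return (label, distance) of the center closest to point."""
    return min(((lab, math.dist(point, c)) for lab, c in centers.items()),
               key=lambda t: t[1])


def seeded_clusters(labeled, unlabeled, iters=10):
    """Cluster the unlabeled points, starting from the labeled class means."""
    centers = {lab: centroid(pts) for lab, pts in labeled.items()}
    assign = {}
    for _ in range(iters):
        assign = {p: nearest(p, centers)[0] for p in unlabeled}
        for lab in centers:
            members = labeled[lab] + [p for p, a in assign.items() if a == lab]
            centers[lab] = centroid(members)
    return assign


def co_train(labeled, unlabeled, rounds=5, per_round=2):
    """Move agreed-upon unlabeled points into the labeled set, round by round."""
    labeled = {lab: list(pts) for lab, pts in labeled.items()}  # copy the input
    pool = list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        cluster_of = seeded_clusters(labeled, pool)
        # Classifier view: class means of the currently labeled data only.
        class_means = {lab: centroid(pts) for lab, pts in labeled.items()}
        agreed = []
        for p in pool:
            lab, dist = nearest(p, class_means)
            if lab == cluster_of[p]:  # both models propose the same label
                agreed.append((dist, p, lab))
        if not agreed:
            break
        agreed.sort(key=lambda t: t[0])  # closest to a class mean first
        for _, p, lab in agreed[:per_round]:
            labeled[lab].append(p)
            pool.remove(p)
    return labeled, pool


seeds = {"a": [(0.0, 0.0)], "b": [(10.0, 10.0)]}
points = [(1.0, 0.5), (0.5, 1.0), (9.0, 9.5), (9.5, 9.0), (2.0, 2.0), (8.0, 8.0)]
final, rest = co_train(seeds, points)
```

In this toy setup the two models differ in what they see: the classifier uses only the labeled points, while the clustering step also uses the structure of the unlabeled pool. That difference is a rough analogue of the model diversity the abstract refers to.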

Metadata
Title
Combining Semi-supervised Clustering and Classification Under a Generalized Framework
Authors
Zhen Jiang
Lingyun Zhao
Yu Lu
Publication date
13-08-2024
Publisher
Springer US
Published in
Journal of Classification
Print ISSN: 0176-4268
Electronic ISSN: 1432-1343
DOI
https://doi.org/10.1007/s00357-024-09489-9