Published in: International Journal of Machine Learning and Cybernetics 8/2023

01 March 2023 | Original Article

A co-training method based on parameter-free and single-step unlabeled data selection strategy with natural neighbors

Authors: Yanlu Gong, Quanwang Wu, Dongdong Cheng

Abstract

As an effective semi-supervised learning algorithm, the co-training method trains two classifiers on two views independently. The strategy used to select unlabeled samples during the self-labeling process is crucial for co-training. However, most existing strategies depend strongly on parameter settings and require the confidence of unlabeled samples to be recalculated in each iteration. Inspired by the recently introduced concept of natural neighbors, this paper proposes a co-training method based on a parameter-free, single-step unlabeled data selection strategy with natural neighbors (CT-NaN). In CT-NaN, the confidence value of each unlabeled sample is calculated in a parameter-free manner by analyzing the training data with natural neighbors before the co-training iterations begin, so it needs to be calculated only once in the whole co-training process. In addition, CT-NaN can mitigate the negative effect of outliers because training stops automatically when only outliers remain. Four groups of experiments on 22 data sets are conducted, and the results verify the effectiveness of CT-NaN in comparison with 8 state-of-the-art co-training methods.
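
The abstract does not spell out how natural neighbors are found, so the following is only a minimal sketch of one common formulation of natural neighbor search: the neighborhood size r grows until the number of points that nobody has chosen as a neighbor stops changing, and two points are then natural neighbors if each lies among the other's r nearest neighbors. The function name `natural_neighbor_search`, the stopping rule as written here, and the toy data are assumptions for illustration. This is not the CT-NaN confidence calculation, which the abstract does not describe.

```python
import math


def natural_neighbor_search(points):
    """Return (r, neighbors) for a list of coordinate tuples.

    neighbors[i] lists the indices j such that i and j are among each
    other's r nearest neighbors (mutual neighbors). Hypothetical helper,
    written for illustration only.
    """
    n = len(points)
    # order[i][k] is the index of the (k+1)-th nearest neighbor of point i,
    # with the point itself excluded. sorted() is stable, so ties are
    # broken by index order.
    order = [
        sorted((j for j in range(n) if j != i),
               key=lambda j: math.dist(points[i], points[j]))
        for i in range(n)
    ]
    reverse_count = [0] * n  # how often each point was picked as a neighbor
    prev_orphans = None
    r = 0
    while r < n - 1:
        r += 1
        # Every point picks its r-th nearest neighbor in this round.
        for i in range(n):
            reverse_count[order[i][r - 1]] += 1
        # Orphans are points that no other point has picked so far.
        orphans = sum(1 for c in reverse_count if c == 0)
        if orphans == prev_orphans:
            break  # the orphan count is stable: stop growing r
        prev_orphans = orphans
    knn = [set(order[i][:r]) for i in range(n)]
    neighbors = [[j for j in sorted(knn[i]) if i in knn[j]] for i in range(n)]
    return r, neighbors


# Toy data (assumed): two tight clusters and one distant point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (100, 100)]
r, neighbors = natural_neighbor_search(points)
```

In this toy example the distant point ends up with an empty neighbor list, which matches the intuition in the abstract that outliers can be identified without tuning a neighborhood parameter.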

Metadata
Title
A co-training method based on parameter-free and single-step unlabeled data selection strategy with natural neighbors
Authors
Yanlu Gong
Quanwang Wu
Dongdong Cheng
Publication date
01 March 2023
Publisher
Springer Berlin Heidelberg
Published in
International Journal of Machine Learning and Cybernetics / Issue 8/2023
Print ISSN: 1868-8071
Electronic ISSN: 1868-808X
DOI
https://doi.org/10.1007/s13042-023-01805-w
