Published in: International Journal of Machine Learning and Cybernetics 1/2023

27-06-2022 | Original Article

Merits of Bayesian networks in overcoming small data challenges: a meta-model for handling missing data

Authors: Hanen Ameur, Hasna Njah, Salma Jamoussi

Abstract

The abundant availability of data in the Big Data era has helped achieve significant advances in machine learning. However, many datasets are incomplete in various respects, such as values, labels, annotations and records. Once the records that introduce ambiguity are discarded, the exploitable data shrinks to a small, sometimes ineffective, portion. Making the most of this small portion is burdensome because it usually yields overfitted models. In this paper, we propose a new taxonomy of data missingness in the machine learning context, along with a new meta-model for addressing the missing data problem in real and open data. Our methodology relies on an H2S kernel whose ultimate goal is the effective learning of a generalized Bayesian network from small input datasets. Our contributions are motivated by the strong probabilistic foundation of Bayesian networks, on the one hand, and by the effectiveness of ensemble learning, on the other. The highlights of our kernel are a new strategy for learning multiple Bayesian network structures and a novel technique for the weighted fusion of Bayesian network structures. To harness the richness of the merged network in terms of knowledge, we propose four H2S-derived systems that address the impacts of missing values and records through annotation, balancing, missing-value imputation and data over-sampling. We combine these systems into a meta-model and perform a step-by-step experimental study. The obtained results showcase the efficiency of our contributions in dealing with multi-class problems and extremely small datasets.
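The weighted fusion of Bayesian network structures described in the abstract can be illustrated at a high level as weighted edge voting over an ensemble of candidate structures. The sketch below is an assumption-laden illustration of that general idea, not the authors' actual H2S kernel: the function name, the voting threshold, and the tie-breaking rule are all hypothetical choices for the example.

```python
# Illustrative sketch (not the authors' H2S kernel): fuse several candidate
# Bayesian network structures, each represented as a set of directed edges,
# into one consensus structure by weighted edge voting. The weights would
# typically come from each candidate network's score on the data.

def fuse_structures(structures, weights, threshold=0.5):
    """Fuse candidate DAGs (edge sets) by weighted edge voting.

    structures: list of sets of directed edges (u, v)
    weights:    per-structure confidence weights (assumed: network scores)
    threshold:  keep an edge whose normalized weighted vote exceeds this
    """
    total = sum(weights)
    vote = {}
    for edges, w in zip(structures, weights):
        for e in edges:
            vote[e] = vote.get(e, 0.0) + w / total
    # Keep well-supported edges; when both orientations (u, v) and (v, u)
    # pass the threshold, keep only the better-supported direction so the
    # fused graph avoids trivial two-node cycles.
    fused = set()
    for (u, v), s in vote.items():
        if s > threshold and s >= vote.get((v, u), 0.0):
            fused.add((u, v))
    return fused

# Three candidate structures over variables A, B, C, with assumed scores
s1 = {("A", "B"), ("B", "C")}
s2 = {("A", "B"), ("A", "C")}
s3 = {("A", "B"), ("B", "C")}
fused = fuse_structures([s1, s2, s3], weights=[0.9, 0.6, 0.8])
print(sorted(fused))  # [('A', 'B'), ('B', 'C')] — ('A', 'C') fails the vote
```

In this toy run, the edge (A, C) is supported only by the weakest candidate (normalized vote ≈ 0.26), so it is pruned, while (A, B) and (B, C) survive. A full system would also need to check the fused graph for larger cycles before using it as a Bayesian network structure.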

Footnotes
2
The Survey network can be downloaded in several formats from the BNLEARN repository: https://www.bnlearn.com/bnrepository/discrete-small.html#survey.
 
3
These datasets are available online via the UCI Machine Learning Repository: https://archive.ics.uci.edu/ml/datasets.php.
 
Metadata
Title
Merits of Bayesian networks in overcoming small data challenges: a meta-model for handling missing data
Authors
Hanen Ameur
Hasna Njah
Salma Jamoussi
Publication date
27-06-2022
Publisher
Springer Berlin Heidelberg
Published in
International Journal of Machine Learning and Cybernetics / Issue 1/2023
Print ISSN: 1868-8071
Electronic ISSN: 1868-808X
DOI
https://doi.org/10.1007/s13042-022-01577-9
