2021 | OriginalPaper | Chapter

BL_SMOTE Ensemble Method for Prediction of Thyroid Disease on Imbalanced Classification Problem

Authors: Rajshree Srivastava, Pardeep Kumar

Published in: Proceedings of Second International Conference on Computing, Communications, and Cyber-Security

Publisher: Springer Singapore

Abstract

Imbalanced classification is one of the most challenging problems in domains such as machine learning and data mining. In an imbalanced dataset, the classes are distributed unevenly; this arises when the positive class is smaller than the negative class. Oversampling and undersampling techniques are used to overcome this problem, but undersampling leads to information loss. In this paper, a borderline synthetic minority oversampling technique (BL_SMOTE) ensemble method is used to predict thyroid disease, addressing the imbalanced classification problem through oversampling. The ensemble uses decision tree and random forest classifiers. The proposed method achieved 98.88% accuracy, 99.12% specificity, 98.93% F-measure, and 98.66% sensitivity on the thyroid dataset from the UCI repository. It is competitive with other methods proposed in the literature for predicting thyroid disease on an imbalanced classification problem.
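To make the oversampling step concrete, the following is a minimal sketch of the borderline-SMOTE idea as originally described by Han, Wang and Mao (2005), written with the Python standard library only. It is not the authors' implementation, and it leaves out the decision tree and random forest ensemble. The function names `find_danger` and `borderline_smote`, the parameters `m`, `k`, `n_new`, and `seed`, and the toy points are all assumptions made for illustration.

```python
import math
import random


def find_danger(minority, majority, m=5):
    """Return indices of minority points that lie near the class border.

    A point is in "danger" when at least half, but not all, of its m nearest
    neighbours in the whole training set belong to the majority class.
    """
    # Minority points come first, so index i here matches minority[i].
    labelled = [(p, 1) for p in minority] + [(p, 0) for p in majority]
    danger = []
    for i, p in enumerate(minority):
        others = [item for j, item in enumerate(labelled) if j != i]
        nearest = sorted(others, key=lambda item: math.dist(p, item[0]))[:m]
        n_major = sum(1 for _, label in nearest if label == 0)
        if m / 2 <= n_major < m:  # all-majority neighbourhoods count as noise
            danger.append(i)
    return danger


def borderline_smote(minority, majority, n_new, m=5, k=5, seed=0):
    """Generate n_new synthetic minority points from the danger set (sketch)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority points to interpolate")
    rng = random.Random(seed)
    danger = find_danger(minority, majority, m)
    if not danger:
        return []  # no borderline points, nothing to oversample
    synthetic = []
    for _ in range(n_new):
        i = rng.choice(danger)
        p = minority[i]
        # k nearest neighbours of p among the other minority points
        mates = sorted(
            (q for j, q in enumerate(minority) if j != i),
            key=lambda q: math.dist(p, q),
        )[:k]
        q = rng.choice(mates)
        gap = rng.random()  # position along the segment from p to q
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(p, q)))
    return synthetic


# Hypothetical toy data: two minority points sit close to the majority cluster.
minority_pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (4.0, 4.0), (4.2, 4.1)]
majority_pts = [(5.0, 5.0), (5.0, 6.0), (6.0, 5.0), (6.0, 6.0),
                (5.5, 5.5), (4.5, 5.0), (5.0, 4.5), (6.0, 5.5)]
new_pts = borderline_smote(minority_pts, majority_pts, n_new=6, m=3, k=2, seed=1)
print(len(new_pts), "synthetic minority points generated")
```

Only the borderline points are used as seeds, so the synthetic samples concentrate where the two classes meet rather than deep inside the minority region. In practice a maintained implementation, such as the `BorderlineSMOTE` class in the imbalanced-learn package, would likely be preferred over a hand-written version like this one.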

Metadata
DOI
https://doi.org/10.1007/978-981-16-0733-2_52