Published in: Soft Computing 13/2019

3 April 2018 | Methodologies and Application

Diagnosis system for imbalanced multi-minority medical dataset

Authors: Swati Shilaskar, Ashok Ghatol

Abstract

Medical datasets inherently suffer from class imbalance: some sub-pathologies occur far less often than others. This work develops a disease diagnosis system for multiclass classification. A hybrid synthetic sampling technique is used for extremely imbalanced datasets, and a cluster-based self-class algorithm is proposed. Compared with the near-miss algorithm, it delivers equivalent performance with reduced sampling time. Classification results are compared against baseline approaches that use neither clustering nor synthetic sampling. A new technique based on a confidence measure is also proposed for evaluating test samples with one-versus-one (OVO) classifiers. Combined with hybrid sampling, this technique suggests an improvement over the classical approaches currently used in disease diagnosis systems.
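
To make the idea of hybrid sampling concrete, the following is a minimal Python sketch, not the authors' algorithm. It assumes a generic scheme: classes larger than a chosen target size are undersampled at random (a plain stand-in for the cluster-based self-class and near-miss methods named in the abstract, whose details are not given here), and smaller classes are oversampled with a SMOTE-style interpolation between a sample and its nearest same-class neighbour. The function names `smote_like` and `hybrid_resample`, the `target` parameter, the class labels, and the toy data are all hypothetical.

```python
import math
import random
from collections import defaultdict


def smote_like(samples, n_new, rng):
    """Create n_new synthetic points by interpolating between a sample
    and its nearest same-class neighbour (a SMOTE-style step)."""
    if len(samples) < 2:
        # Nothing to interpolate with: fall back to plain duplication.
        return [list(samples[0]) for _ in range(n_new)]
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(samples)
        neighbour = min((s for s in samples if s is not base),
                        key=lambda s: math.dist(base, s))
        t = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + t * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def hybrid_resample(X, y, target, seed=0):
    """Bring every class to `target` samples: undersample larger classes
    at random (assumed stand-in), oversample smaller ones with smote_like."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for features, label in zip(X, y):
        by_class[label].append(features)
    X_out, y_out = [], []
    for label, samples in by_class.items():
        if len(samples) > target:
            kept = rng.sample(samples, target)
        else:
            kept = samples + smote_like(samples, target - len(samples), rng)
        X_out.extend(kept)
        y_out.extend([label] * len(kept))
    return X_out, y_out


# Toy data with one majority and two minority classes (made-up values).
X = ([[float(i), float(i % 3)] for i in range(20)]
     + [[50.0, 50.0], [52.0, 51.0], [51.0, 53.0]]
     + [[-10.0, 0.0], [-12.0, 1.0]])
y = ["normal"] * 20 + ["pathology_a"] * 3 + ["pathology_b"] * 2

X_bal, y_bal = hybrid_resample(X, y, target=6)
```

After the call, each of the three classes holds six samples. Because the synthetic points are convex combinations of existing same-class points, they stay inside the region already occupied by that class, which is the usual motivation for interpolation over simple duplication.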

Metadata
Title
Diagnosis system for imbalanced multi-minority medical dataset
Authors
Swati Shilaskar
Ashok Ghatol
Publication date
3 April 2018
Publisher
Springer Berlin Heidelberg
Published in
Soft Computing / Issue 13/2019
Print ISSN: 1432-7643
Electronic ISSN: 1433-7479
DOI
https://doi.org/10.1007/s00500-018-3133-x
