2016 | Original Paper | Book Chapter

Weak Classifiers Performance Measure in Handling Noisy Clinical Trial Data

Authors: Ezzatul Akmal Kamaru-Zaman, Andrew Brass, James Weatherall, Shuzlina Abdul Rahman

Published in: Soft Computing in Data Science

Publisher: Springer Singapore

Abstract

Most prior research has concluded that machine learning classifiers perform better on cleaned datasets than on dirty ones. In this paper, we evaluate three weak (base) classifiers, Decision Table, Naive Bayes and k-Nearest Neighbor, on a real-world, noisy and messy clinical trial dataset rather than on a carefully curated one. A clinical trial data scientist was involved to guide the data exploration and to strengthen the evaluation of the results. Classifier performance was analyzed using accuracy and the Receiver Operating Characteristic (ROC) measure, supported by sensitivity, specificity and precision values; the results contradict the conclusion of previous research. We applied pre-processing techniques, namely the interquartile range technique to remove outliers and mean imputation to handle missing values. All three classifiers performed better on the dirty dataset than on the imputed and cleaned datasets, showing the highest accuracy and ROC measure there. Decision Table turned out to be the best classifier for real-world noisy clinical trial data.
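The pre-processing steps and supporting measures named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the fence multiplier `k=1.5` (Tukey's common default), the inclusive quartile method, the use of `None` for missing values, and the sample numbers are all assumptions made for illustration.

```python
from statistics import mean, quantiles


def iqr_bounds(values, k=1.5):
    """Return (lower, upper) fences from the observed (non-missing) values.

    k=1.5 is an assumed multiplier; the abstract does not state one.
    """
    observed = [v for v in values if v is not None]
    q1, _, q3 = quantiles(observed, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def remove_outliers(values, k=1.5):
    """Drop observed values outside the fences; missing entries are kept."""
    lower, upper = iqr_bounds(values, k)
    return [v for v in values if v is None or lower <= v <= upper]


def mean_impute(values):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def binary_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, specificity and precision from confusion counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "precision": tp / (tp + fp),
    }


if __name__ == "__main__":
    # Hypothetical measurements with one missing value and one extreme value.
    raw = [10, 12, 11, None, 13, 12, 95]
    cleaned = remove_outliers(raw)
    imputed = mean_impute(cleaned)
    print(cleaned)
    print(imputed)
    print(binary_metrics(tp=40, fp=10, tn=45, fn=5))
```

In this sketch the outlier fences are computed before imputation, so the extreme value does not distort the imputed mean; the abstract does not say in which order the two steps were applied.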

Metadata
Title
Weak Classifiers Performance Measure in Handling Noisy Clinical Trial Data
Authors
Ezzatul Akmal Kamaru-Zaman
Andrew Brass
James Weatherall
Shuzlina Abdul Rahman
Copyright year
2016
Publisher
Springer Singapore
DOI
https://doi.org/10.1007/978-981-10-2777-2_13
