Published in: International Journal of Computer Assisted Radiology and Surgery 1/2014

01.01.2014 | Original Article

ROC operating point selection for classification of imbalanced data with application to computer-aided polyp detection in CT colonography

Authors: Bowen Song, Guopeng Zhang, Wei Zhu, Zhengrong Liang

Abstract

Purpose

   Computer-aided detection and diagnosis (CAD) of colonic polyps routinely faces the challenge of classifying imbalanced data. In this paper, three new operating point selection strategies based on the receiver operating characteristic (ROC) curve are proposed to address the problem.

Methods

   A major reason why classifiers perform poorly on imbalanced data is that the best decision threshold shifts with the degree of data imbalance. To address this threshold-shifting issue, three operating point selection strategies, i.e., shortest distance, harmonic mean and anti-harmonic mean, are proposed and their performance is investigated.
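
   The abstract names the strategies but gives no formulas, so the following is only a minimal sketch under stated assumptions. It assumes that "shortest distance" means the ROC point closest to the ideal corner (sensitivity = 1, specificity = 1) and that "harmonic mean" means the point maximizing the harmonic mean of sensitivity and specificity; both are common conventions and may differ from the authors' definitions. The anti-harmonic mean is left out because its definition cannot be recovered from the abstract. All function names and the toy scores are hypothetical.

```python
import math

def sens_spec(scores, labels, threshold):
    """Sensitivity and specificity when score >= threshold is called positive."""
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present")
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    return tp / pos, tn / neg

def roc_points(scores, labels):
    """One (threshold, sensitivity, specificity) triple per distinct score."""
    return [(t, *sens_spec(scores, labels, t)) for t in sorted(set(scores))]

def shortest_distance(points):
    # Assumed definition: minimize Euclidean distance to (sens, spec) = (1, 1).
    return min(points, key=lambda p: math.hypot(1 - p[1], 1 - p[2]))

def harmonic_mean_choice(points):
    # Assumed definition: maximize 2 * sens * spec / (sens + spec).
    def hm(p):
        _, se, sp = p
        return 0.0 if se + sp == 0 else 2 * se * sp / (se + sp)
    return max(points, key=hm)

# Made-up classifier scores: 10 negatives, 3 positives (not data from the paper).
scores = [0.05, 0.10, 0.15, 0.20, 0.22, 0.25, 0.28, 0.35, 0.40, 0.55,
          0.30, 0.45, 0.70]
labels = [0] * 10 + [1] * 3

default = sens_spec(scores, labels, 0.5)   # conventional fixed cut-off
points = roc_points(scores, labels)
sd = shortest_distance(points)
hm = harmonic_mean_choice(points)

print("default 0.5      :", default)
print("shortest distance:", sd)
print("harmonic mean    :", hm)
```

   On this toy data both criteria move the cut-off from 0.5 down to 0.30, raising sensitivity from 1/3 to 1.0 at the cost of specificity dropping from 0.9 to 0.7, which is the kind of rebalancing the Methods paragraph describes.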

Results

   Experiments were conducted on a class-imbalanced database containing 64 polyps among 786 polyp candidates. Support vector machine (SVM) and random forests (RFs) were employed as base classifiers. Two techniques for correcting data imbalance, i.e., cost-sensitive learning and down-sampling of the training data, were applied to SVM and RFs, and their performance was compared with that of the proposed strategies. Compared with the original thresholding method, i.e., 0.488 sensitivity and 0.986 specificity for RFs and 0.526 sensitivity and 0.977 specificity for SVM, our strategies achieved more balanced results: around 0.89 sensitivity and 0.92 specificity for RFs and 0.88 sensitivity and 0.90 specificity for SVM. Moreover, their performance remained at the same level regardless of whether the other correcting methods were used.
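
   To put the reported figures on a single scale, the short calculation below derives the class prevalence from the candidate counts and summarizes each sensitivity/specificity pair quoted in this paragraph by its arithmetic mean (balanced accuracy) and harmonic mean. These summary scores are an illustrative choice made here, not metrics reported in the abstract, and the dictionary keys are hypothetical labels.

```python
def harmonic_mean(sensitivity, specificity):
    """Harmonic mean of two rates; low if either rate is low."""
    return 2 * sensitivity * specificity / (sensitivity + specificity)

# Counts quoted in the Results paragraph.
n_polyps, n_candidates = 64, 786
prevalence = n_polyps / n_candidates                 # share of true polyps
ratio = (n_candidates - n_polyps) / n_polyps         # non-polyps per polyp

# (sensitivity, specificity) pairs quoted in the Results paragraph.
reported = {
    "RFs, original threshold": (0.488, 0.986),
    "RFs, proposed strategies": (0.89, 0.92),
    "SVM, original threshold": (0.526, 0.977),
    "SVM, proposed strategies": (0.88, 0.90),
}

# name -> (balanced accuracy, harmonic mean)
summary = {
    name: ((se + sp) / 2, harmonic_mean(se, sp))
    for name, (se, sp) in reported.items()
}

print(f"prevalence = {prevalence:.3f}, imbalance = {ratio:.1f} : 1")
for name, (bal_acc, hm) in summary.items():
    print(f"{name:26s} balanced acc = {bal_acc:.3f}  harmonic mean = {hm:.3f}")
```

   Roughly 8% of the candidates are true polyps, and under both summary scores the operating points chosen by the proposed strategies score higher than the original thresholds, because the harmonic mean in particular penalizes the low sensitivity of the original settings.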

Conclusions

   The experiments show a noticeable gain from the proposed strategies: sensitivity improved from about 0.5 to around 0.88 for RFs and 0.89 for SVM while retaining relatively high specificity, i.e., 0.92 for RFs and 0.90 for SVM. The performance of the proposed strategies was adaptive and robust across different levels of data imbalance. This indicates a feasible solution to the threshold-shifting problem, yielding favorable sensitivity and specificity in CAD of polyps from imbalanced data.

Metadata
Title
ROC operating point selection for classification of imbalanced data with application to computer-aided polyp detection in CT colonography
Authors
Bowen Song
Guopeng Zhang
Wei Zhu
Zhengrong Liang
Publication date
01.01.2014
Publisher
Springer Berlin Heidelberg
Published in
International Journal of Computer Assisted Radiology and Surgery / Issue 1/2014
Print ISSN: 1861-6410
Electronic ISSN: 1861-6429
DOI
https://doi.org/10.1007/s11548-013-0913-8
