Published in: International Journal of Machine Learning and Cybernetics 1/2013

01.02.2013 | Original Article

Comparative study on classification performance between support vector machine and logistic regression

Author: Abdallah Bashir Musa

Abstract

Support vector machine (SVM) is a comparatively new machine learning algorithm for classification, while logistic regression (LR) is a long-established statistical classification method. Many comprehensive studies have compared SVM and LR, but since they were made, both methods have gained improvements such as bagging and ensemble learning. Bagging and ensemble learning have recently become active research topics and are widely used to improve the generalization performance of a single learning algorithm. Comparing the classification performance of SVM and LR under bagging and ensemble learning is therefore of interest. In this paper, classifiers were combined by averaging their estimated probabilities. Different evaluation metrics assess different characteristics of a machine learning algorithm, and a learning method can perform well on one metric yet be suboptimal on others. This study therefore evaluates the classification performance of the learning methods with a variety of criteria: accuracy, sensitivity, specificity, precision, F-score and the area under the receiver operating characteristic curve. The last of these was not included in previous studies of SVM, because SVM did not support estimated probabilities at that time. Other metrics used in medical diagnosis, such as Youden’s index (γ), the positive and negative likelihoods (ρ+, ρ−) and the diagnostic odds ratio, were also evaluated to convey and compare the qualities of the two algorithms. This study is distinguished by its comprehensive statistical analysis of the results of the SVM and LR algorithms on various data sets.
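As an illustration of the quantities named in the abstract, the following minimal sketch averages the estimated probabilities of several classifiers and computes the threshold-based metrics from confusion-matrix counts. It is not the paper's code. The function names `average_probabilities` and `diagnostic_metrics` are hypothetical, and the formulas are the standard textbook definitions, which are assumed (not confirmed by the abstract) to match those used in the study.

```python
from typing import Dict, List, Sequence


def average_probabilities(per_model_probs: Sequence[Sequence[float]]) -> List[float]:
    """Combine classifiers by averaging their positive-class probabilities.

    per_model_probs[m][i] is model m's estimated probability for sample i.
    (Hypothetical helper; the paper's exact procedure is not shown here.)
    """
    n_models = len(per_model_probs)
    return [sum(column) / n_models for column in zip(*per_model_probs)]


def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard metrics from confusion-matrix counts.

    Assumes every denominator is non-zero (no guard against division by zero).
    """
    sensitivity = tp / (tp + fn)            # true positive rate (recall)
    specificity = tn / (tn + fp)            # true negative rate
    precision = tp / (tp + fp)              # positive predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f_score = 2 * precision * sensitivity / (precision + sensitivity)
    youden = sensitivity + specificity - 1  # Youden's index (gamma)
    lr_pos = sensitivity / (1 - specificity)  # positive likelihood (rho+)
    lr_neg = (1 - sensitivity) / specificity  # negative likelihood (rho-)
    dor = lr_pos / lr_neg                     # diagnostic odds ratio
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f_score": f_score,
        "youden": youden,
        "lr_pos": lr_pos,
        "lr_neg": lr_neg,
        "dor": dor,
    }


# Illustrative numbers only; they are not results from the study.
combined = average_probabilities([[0.9, 0.2], [0.7, 0.4], [0.8, 0.3]])
metrics = diagnostic_metrics(tp=40, fp=10, tn=45, fn=5)
```

The area under the ROC curve is left out of the sketch because it depends on the full ranking of estimated probabilities, not on a single confusion matrix.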

Metadata
Title
Comparative study on classification performance between support vector machine and logistic regression
Author
Abdallah Bashir Musa
Publication date
01.02.2013
Publisher
Springer-Verlag
Published in
International Journal of Machine Learning and Cybernetics / Issue 1/2013
Print ISSN: 1868-8071
Electronic ISSN: 1868-808X
DOI
https://doi.org/10.1007/s13042-012-0068-x