Published in: Knowledge and Information Systems 3/2024

8 November 2023 | Regular paper

Supervised feature selection using principal component analysis

Authors: Fariq Rahmat, Zed Zulkafli, Asnor Juraiza Ishak, Ribhan Zafira Abdul Rahman, Simon De Stercke, Wouter Buytaert, Wardah Tahir, Jamalludin Ab Rahman, Salwa Ibrahim, Muhamad Ismail

Abstract

Principal component analysis (PCA) is widely used in computational fields such as computer science, pattern recognition, and machine learning because it effectively reduces the dimensionality of high-dimensional data. In particular, it is a popular transformation method for feature extraction. In this study, we explore PCA's ability to perform feature selection in regression applications. We introduce a new PCA-based approach, called Targeted PCA, that analyzes a multivariate dataset including the dependent variable: it identifies the principal component in which the dependent variable is highly represented and then examines that component to capture and rank the contributions of the other (non-dependent) variables. The study also compares the selected features with those obtained from Least Absolute Shrinkage and Selection Operator (LASSO) regression. Finally, the selected features are tested in two regression models: multiple linear regression (MLR) and an artificial neural network (ANN). Results are presented for three datasets, drawn from the socioeconomic, environmental, and computer image processing domains. For two of the three datasets, the features selected by the PCA and LASSO regression methods are more than 50% similar. In the regression predictions, the PCA-selected features differed little from the LASSO-selected features in terms of MLR prediction accuracy. However, the ANN regression demonstrated faster convergence and a greater reduction in error.
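
The Targeted PCA idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the helper name `targeted_pca_rank`, the use of squared loadings as the measure of "representation", and the synthetic data are all assumptions made for illustration. The sketch runs PCA on the predictors together with the dependent variable, picks the component on which the dependent variable loads most strongly, and ranks the predictors by their squared loading on that component.

```python
import numpy as np


def targeted_pca_rank(X, y):
    """Rank the columns of X by their loading on the principal component
    that best represents y (hypothetical helper, not the authors' code)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).reshape(-1, 1)

    # Analyze predictors and the dependent variable together (y is the last column).
    Z = np.hstack([X, y])
    Z = (Z - Z.mean(axis=0)) / Z.std(axis=0, ddof=1)  # standardize each column

    # PCA via eigendecomposition of the correlation matrix.
    corr = (Z.T @ Z) / (Z.shape[0] - 1)
    eigvals, eigvecs = np.linalg.eigh(corr)
    eigvals = np.clip(eigvals, 0.0, None)  # guard against tiny negative values

    # Loadings: correlation between each variable (row) and each component (column).
    loadings = eigvecs * np.sqrt(eigvals)

    # Assumption: the "representation" of y on a component is its squared loading.
    target_pc = int(np.argmax(loadings[-1] ** 2))

    # Rank predictors by squared loading on the chosen component (highest first).
    scores = loadings[:-1, target_pc] ** 2
    order = np.argsort(-scores)
    return order, scores, target_pc


# Synthetic illustration (assumed data): y depends strongly on column 0 only.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = 2.0 * X[:, 0] + 0.5 * rng.normal(size=500)

order, scores, target_pc = targeted_pca_rank(X, y)
print("feature ranking (column indices):", order)
print("squared loadings on target component:", np.round(scores, 3))
```

In this sketch a feature subset would be taken from the top of `order`. How many features to keep, and whether more than one component should be examined, are choices the abstract does not specify.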

Metadata
Title
Supervised feature selection using principal component analysis
Authors
Fariq Rahmat
Zed Zulkafli
Asnor Juraiza Ishak
Ribhan Zafira Abdul Rahman
Simon De Stercke
Wouter Buytaert
Wardah Tahir
Jamalludin Ab Rahman
Salwa Ibrahim
Muhamad Ismail
Publication date
8 November 2023
Publisher
Springer London
Published in
Knowledge and Information Systems / Issue 3/2024
Print ISSN: 0219-1377
Electronic ISSN: 0219-3116
DOI
https://doi.org/10.1007/s10115-023-01993-5
