2019 | Original Paper | Book Chapter

Performance Evaluation and Analysis of Feature Selection Algorithms

Authors: Tanuja Pattanshetti, Vahida Attar

Published in: Data Management, Analytics and Innovation

Publisher: Springer Singapore

Abstract

The wide application of modern technologies generates enormous volumes of high-dimensional data. Using this data for decision-making is hampered by the curse of dimensionality: selecting all features leads to over-fitting, while ignoring relevant ones leads to information loss. Feature selection algorithms address this problem by identifying a subset of the original features that retains the relevant features and removes the redundant ones. This paper evaluates and analyzes some of the most popular feature selection algorithms on different benchmark datasets. K-means clustering, Relief, Relief-F, and Random Forest (RF) algorithms are evaluated and analyzed as combinations of different rankers and classifiers. It is observed empirically that the accuracy of a given ranker and classifier combination varies from dataset to dataset. The paper also proposes applying multivariate correlation analysis (MCA) to feature selection, and the results show improved performance over legacy feature selection algorithms.
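
As a rough illustration of what a feature ranker of the kind named in the abstract does, the following is a minimal sketch of a basic two-class Relief weight update. It is not the authors' implementation. It assumes the common formulation in which each feature's weight grows with its difference to the nearest instance of the other class (the "miss") and shrinks with its difference to the nearest instance of the same class (the "hit"). For determinism it visits every instance instead of drawing a random sample. The function names `relief_weights` and `rank_features` and the toy data are hypothetical.

```python
def relief_weights(X, y):
    """Score each feature with one basic two-class Relief pass over all instances."""
    n, d = len(X), len(X[0])

    # Per-feature value ranges, used to scale differences into [0, 1].
    spans = []
    for f in range(d):
        col = [row[f] for row in X]
        spans.append((max(col) - min(col)) or 1.0)  # guard against constant features

    def diff(f, a, b):
        return abs(a[f] - b[f]) / spans[f]

    def distance(a, b):
        return sum(diff(f, a, b) for f in range(d))

    weights = [0.0] * d
    for i in range(n):
        # Nearest neighbour of the same class (hit) and of the other class (miss).
        hits = [j for j in range(n) if j != i and y[j] == y[i]]
        misses = [j for j in range(n) if y[j] != y[i]]
        hit = min(hits, key=lambda j: distance(X[i], X[j]))
        miss = min(misses, key=lambda j: distance(X[i], X[j]))
        for f in range(d):
            # Reward separation from the miss, penalise separation from the hit.
            weights[f] += (diff(f, X[i], X[miss]) - diff(f, X[i], X[hit])) / n
    return weights


def rank_features(weights):
    """Return feature indices ordered from highest to lowest weight."""
    return sorted(range(len(weights)), key=lambda f: weights[f], reverse=True)


# Toy data (hypothetical): feature 0 separates the classes, feature 1 is noise.
X = [[0.0, 0.3], [0.1, 0.9], [0.2, 0.1], [0.8, 0.8], [0.9, 0.2], [1.0, 0.5]]
y = [0, 0, 0, 1, 1, 1]
weights = relief_weights(X, y)
ranking = rank_features(weights)
```

On this toy data the class-separating feature receives a positive weight and the noise feature a negative one. In a ranker and classifier combination of the kind the abstract describes, the top-ranked features would then be passed to a classifier, and accuracy would be compared across datasets.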


Metadata
Title
Performance Evaluation and Analysis of Feature Selection Algorithms
Authors
Tanuja Pattanshetti
Vahida Attar
Copyright year
2019
Publisher
Springer Singapore
DOI
https://doi.org/10.1007/978-981-13-1402-5_4