
2016 | OriginalPaper | Chapter

A New Fuzzy-Rough Hybrid Merit to Feature Selection

Authors: Javad Rahimipour Anaraki, Saeed Samet, Wolfgang Banzhaf, Mahdi Eftekhari

Published in: Transactions on Rough Sets XX

Publisher: Springer Berlin Heidelberg


Abstract

Feature selection is considered one of the most important preprocessing methods in machine learning, data mining and bioinformatics. By applying preprocessing techniques, we can defy the curse of dimensionality by reducing computational and storage costs, facilitate data understanding and visualization, and diminish training and testing times, leading to overall performance improvement, especially when dealing with large datasets. The correlation feature selection method uses a conventional merit to evaluate different feature subsets. In this paper, we propose a new merit by adapting correlation feature selection and employing it in conjunction with fuzzy-rough feature selection, to improve the effectiveness and quality of the conventional methods. The proposed method also outperforms the newly introduced gradient boosted feature selection by selecting more relevant and less redundant features. The two-step experimental results show the applicability and efficiency of our proposed method on some well-known and widely used datasets, as well as newly introduced ones, especially from the UCI collection, with sizes ranging from small to large numbers of features and samples.
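As an illustration of the conventional merit mentioned in the abstract, the following is a minimal sketch of the standard correlation feature selection merit, k * r_cf / sqrt(k + k(k - 1) * r_ff), where k is the subset size, r_cf the mean feature-class correlation and r_ff the mean feature-feature correlation. It is not the new fuzzy-rough hybrid merit proposed in the chapter. The function names `pearson` and `cfs_merit` are hypothetical, and the use of absolute Pearson correlation as the correlation measure is an assumption made only for this example.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no correlation information
    return cov / math.sqrt(var_x * var_y)


def cfs_merit(features, target):
    """Conventional CFS merit of a feature subset (assumed Pearson-based).

    features: list of feature columns, each a list of numbers
    target:   list of numeric class values, same length as each column
    """
    k = len(features)
    if k == 0:
        return 0.0
    # Mean absolute feature-class correlation (relevance).
    r_cf = sum(abs(pearson(f, target)) for f in features) / k
    # Mean absolute feature-feature correlation (redundancy).
    pairs = list(combinations(features, 2))
    r_ff = sum(abs(pearson(a, b)) for a, b in pairs) / len(pairs) if pairs else 0.0
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


target = [1, 2, 3, 4]
relevant = [1, 2, 3, 4]      # perfectly correlated with the target
irrelevant = [1, -1, -1, 1]  # uncorrelated with the target
print(cfs_merit([relevant], target))
print(cfs_merit([relevant, irrelevant], target))
```

In this toy example, adding the irrelevant column lowers the merit of the subset, which is the behaviour the conventional merit is designed to reward: high relevance to the class and low redundancy among the selected features.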


DOI: https://doi.org/10.1007/978-3-662-53611-7_1
