2016 | OriginalPaper | Book chapter

Feature Selection Methods Based on Decision Rule and Tree Models

Author: Wiesław Paja

Published in: Intelligent Decision Technologies 2016

Publisher: Springer International Publishing

Abstract

Feature selection, as a preprocessing step for machine learning, is effective in reducing dimensionality, removing irrelevant data, increasing learning accuracy, and improving the comprehensibility of results. However, the recent growth in data dimensionality poses a severe challenge to the efficiency and effectiveness of many existing feature selection methods. In this work, novel concepts of relevant feature selection based on information gathered from decision rule and decision tree models were introduced. Two new measures, DRQualityImp and DTLevelImp, were also defined. The first is based on how frequently a feature appears in the rules and on the quality of those rules. The second is based on the levels at which a feature appears inside a decision tree. The efficiency and effectiveness of these methods are demonstrated on five real-world datasets. Promising initial classification results were obtained together with a substantial reduction in problem dimensionality.
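The abstract names the ingredients of the two measures but does not give their formulas. The Python sketch below shows one possible way scores of this kind could be computed. It is an illustration only and not the chapter's definition.

Everything in the sketch is a hypothetical assumption:

- A rule is represented by the set of features in its premise plus a quality score.
- The DRQualityImp-style score sums the quality of every rule that uses a feature.
- The DTLevelImp-style score adds 1/(depth + 1) for every split on a feature, so splits closer to the root count more.
- Features are selected by keeping the top k scores.

The names `Rule`, `Node`, `dr_quality_imp`, `dt_level_imp` and `top_k` are introduced here purely for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional


@dataclass
class Rule:
    # Hypothetical rule: features used in the premise plus a quality score
    # (e.g. rule accuracy on training data; the exact measure is assumed).
    features: FrozenSet[str]
    quality: float


@dataclass
class Node:
    # Hypothetical binary tree node; a leaf has feature=None and no children.
    feature: Optional[str] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def dr_quality_imp(rules: List[Rule]) -> Dict[str, float]:
    """Rule-based score (illustrative): sum the quality of each rule that
    mentions the feature, so frequent use in good rules scores highest."""
    scores: Dict[str, float] = defaultdict(float)
    for rule in rules:
        for feature in rule.features:
            scores[feature] += rule.quality
    return dict(scores)


def dt_level_imp(root: Node) -> Dict[str, float]:
    """Tree-based score (illustrative): each split adds 1 / (depth + 1),
    so a split at the root (depth 0) counts 1, at depth 1 counts 0.5, etc."""
    scores: Dict[str, float] = defaultdict(float)
    stack = [(root, 0)]
    while stack:
        node, depth = stack.pop()
        if node is None or node.feature is None:
            continue  # leaves carry no split, so they add nothing
        scores[node.feature] += 1.0 / (depth + 1)
        stack.append((node.left, depth + 1))
        stack.append((node.right, depth + 1))
    return dict(scores)


def top_k(scores: Dict[str, float], k: int) -> List[str]:
    """Keep the k highest-scoring features (ties broken by name)."""
    return sorted(scores, key=lambda f: (-scores[f], f))[:k]


if __name__ == "__main__":
    rules = [
        Rule(frozenset({"a", "b"}), 0.9),
        Rule(frozenset({"a"}), 0.6),
        Rule(frozenset({"c"}), 0.3),
    ]
    # Root splits on "a"; its children split on "b" and again on "a".
    tree = Node("a", Node("b", Node(), Node()), Node("a", Node(), Node()))
    print(dr_quality_imp(rules))  # scores: a 1.5, b 0.9, c 0.3
    print(dt_level_imp(tree))     # scores: a 1.5, b 0.5
    print(top_k(dr_quality_imp(rules), 2))  # ['a', 'b']
```

With real models, the rule list and tree would come from a rule inducer and a tree learner. The 1/(depth + 1) weighting is an arbitrary choice; any weight that decreases with depth would express the same idea. Likewise, the top-k cut only stands in for whatever relevance criterion the chapter actually applies.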

Metadata

Copyright year: 2016

DOI: https://doi.org/10.1007/978-3-319-39627-9_6