Skip to main content
Top

2017 | OriginalPaper | Chapter

Feature Ranking for Multi-target Regression with Tree Ensemble Methods

Authors : Matej Petković, Sašo Džeroski, Dragi Kocev

Published in: Discovery Science

Publisher: Springer International Publishing

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

In this work, we address the task of feature ranking for multi-target regression (MTR). The task of MTR concerns problems where there are multiple continuous dependent variables and the goal is to learn a model for predicting all of the targets simultaneously. This task is receiving an increasing attention from the research community. However, performing feature ranking in the context of MTR has not been studied. Here, we propose three feature ranking methods for MTR: Symbolic, Genie3 and Random Forest. These methods are then coupled with three types of ensemble methods: Bagging, Random Forest, and Extremely Randomized Trees. All of the ensemble methods use predictive clustering trees (PCTs) as base predictive models. PCTs are a generalization of decision trees capable of MTR. In total, we consider eight different ensemble-ranking pairs. We extensively evaluate these pairs on 26 benchmark MTR datasets. The results reveal that all of the methods produce relevant feature rankings and that the best performing method is Genie3 ranking used with Random Forests of PCTs.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literature
3.
go back to reference Benjamini, Y., Hochberg, Y.: Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B (Methodol.) 57(1), 289–300 (1995)MathSciNetMATH Benjamini, Y., Hochberg, Y.: Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B (Methodol.) 57(1), 289–300 (1995)MathSciNetMATH
4.
go back to reference Blockeel, H.: Top-down induction of first order logical decision trees. Ph.D. thesis, Katholieke Universiteit Leuven, Leuven, Belgium (1998) Blockeel, H.: Top-down induction of first order logical decision trees. Ph.D. thesis, Katholieke Universiteit Leuven, Leuven, Belgium (1998)
5.
go back to reference Borchani, H., Varando, G., Bielza, C., Larrañaga, P.: A survey on multi-output regression. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 5(5), 216–233 (2015)CrossRef Borchani, H., Varando, G., Bielza, C., Larrañaga, P.: A survey on multi-output regression. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 5(5), 216–233 (2015)CrossRef
6.
go back to reference Breiman, L.: Bagging predictors. Mach. Learn. 24(2), 123–140 (1996)MATH Breiman, L.: Bagging predictors. Mach. Learn. 24(2), 123–140 (1996)MATH
8.
go back to reference Breiman, L., Friedman, J., Olshen, R., Stone, C.J.: Classification and Regression Trees. Chapman & Hall/CRC, Boca Raton (1984)MATH Breiman, L., Friedman, J., Olshen, R., Stone, C.J.: Classification and Regression Trees. Chapman & Hall/CRC, Boca Raton (1984)MATH
9.
go back to reference Brobbey, A.: Variable Selection in Multivariate Multiple Regression. Master’s thesis, Department of Mathematics and Statistics, Memorial University, Newfoundland and Labrador, Canada (2015) Brobbey, A.: Variable Selection in Multivariate Multiple Regression. Master’s thesis, Department of Mathematics and Statistics, Memorial University, Newfoundland and Labrador, Canada (2015)
10.
go back to reference Cunningham, P., Delany, S.J.: k-Nearest Neighbour Classifiers. Technical report 2, University College Dublin (2007) Cunningham, P., Delany, S.J.: k-Nearest Neighbour Classifiers. Technical report 2, University College Dublin (2007)
11.
go back to reference Demšar, D., Debeljak, M., Džeroski, S., Lavigne, C.: Modelling pollen dispersal of genetically modified oilseed rape within the field. In: Proceedings of the 9th Annual Meeting of the Ecological Society of America. p. 152 (2005) Demšar, D., Debeljak, M., Džeroski, S., Lavigne, C.: Modelling pollen dispersal of genetically modified oilseed rape within the field. In: Proceedings of the 9th Annual Meeting of the Ecological Society of America. p. 152 (2005)
12.
go back to reference Demšar, J.: Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res. 7, 1–30 (2006)MathSciNetMATH Demšar, J.: Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res. 7, 1–30 (2006)MathSciNetMATH
13.
go back to reference Džeroski, S., Demšar, D., Grbović, J.: Predicting chemical parameters of river water quality from bioindicator data. Appl. Intell. 1(13), 7–17 (2000)CrossRef Džeroski, S., Demšar, D., Grbović, J.: Predicting chemical parameters of river water quality from bioindicator data. Appl. Intell. 1(13), 7–17 (2000)CrossRef
14.
go back to reference Geurts, P., Erns, D., Wehenkel, L.: Extremely randomized trees. Mach. Learn. 36(1), 3–42 (2006)CrossRefMATH Geurts, P., Erns, D., Wehenkel, L.: Extremely randomized trees. Mach. Learn. 36(1), 3–42 (2006)CrossRefMATH
15.
go back to reference Goovaerts, P.: Geostatistics for Natural Resources Evaluation. Oxford University Press, New York (1997) Goovaerts, P.: Geostatistics for Natural Resources Evaluation. Oxford University Press, New York (1997)
16.
go back to reference Hansen, L.K., Salamon, P.: Neural network ensembles. IEEE Trans. Pattern Anal. Mach. Intell. 12(10), 993–1001 (1990)CrossRef Hansen, L.K., Salamon, P.: Neural network ensembles. IEEE Trans. Pattern Anal. Mach. Intell. 12(10), 993–1001 (1990)CrossRef
17.
go back to reference Hatzikos, E.V., Tsoumakas, G., Tzanis, G., Nick, B., Vlahavas, I.P.: An empirical study on sea water quality prediction. Knowl. Based Syst. 21(6), 471–478 (2008)CrossRef Hatzikos, E.V., Tsoumakas, G., Tzanis, G., Nick, B., Vlahavas, I.P.: An empirical study on sea water quality prediction. Knowl. Based Syst. 21(6), 471–478 (2008)CrossRef
18.
go back to reference Huynh-Thu, V.A., Irrthum, A., Wehenkel, L., Geurts, P.: Inferring regulatory networks from expression data using tree-based methods. PLoS One 5(9), 1–10 (2010)CrossRef Huynh-Thu, V.A., Irrthum, A., Wehenkel, L., Geurts, P.: Inferring regulatory networks from expression data using tree-based methods. PLoS One 5(9), 1–10 (2010)CrossRef
19.
go back to reference Kampichler, C., Džeroski, S., Wieland, R.: Application of machine learning techniques to the analysis of soil ecological data bases: relationships between habitat features and Collembolan community characteristics. Soil Biol. Biochem. 32(2), 197–209 (2000)CrossRef Kampichler, C., Džeroski, S., Wieland, R.: Application of machine learning techniques to the analysis of soil ecological data bases: relationships between habitat features and Collembolan community characteristics. Soil Biol. Biochem. 32(2), 197–209 (2000)CrossRef
20.
21.
go back to reference Kocev, D., Džeroski, S., White, M., Newell, G., Griffioen, P.: Using single- and multi-target regression trees and ensembles to model a compound index of vegetation condition. Ecol. Model. 220(8), 1159–1168 (2009)CrossRef Kocev, D., Džeroski, S., White, M., Newell, G., Griffioen, P.: Using single- and multi-target regression trees and ensembles to model a compound index of vegetation condition. Ecol. Model. 220(8), 1159–1168 (2009)CrossRef
22.
go back to reference Kocev, D., Vens, C., Struyf, J., Džeroski, S.: Tree ensembles for predicting structured outputs. Pattern Recognit. 46(3), 817–833 (2013)CrossRef Kocev, D., Vens, C., Struyf, J., Džeroski, S.: Tree ensembles for predicting structured outputs. Pattern Recognit. 46(3), 817–833 (2013)CrossRef
23.
go back to reference Spyromitros-Xioufis, E., Tsoumakas, G., Groves, W., Vlahavas, I.: Multi-target regression via input space expansion: treating targets as inputs. Mach. Learn. 104(1), 55–98 (2016)MathSciNetCrossRef Spyromitros-Xioufis, E., Tsoumakas, G., Groves, W., Vlahavas, I.: Multi-target regression via input space expansion: treating targets as inputs. Mach. Learn. 104(1), 55–98 (2016)MathSciNetCrossRef
24.
go back to reference Stańczyk, U., Jain, L.C. (eds.): Feature Selection for Data and Pattern Recognition. Studies in Computational Intelligence. Springer, Heidelberg (2015) Stańczyk, U., Jain, L.C. (eds.): Feature Selection for Data and Pattern Recognition. Studies in Computational Intelligence. Springer, Heidelberg (2015)
25.
go back to reference Stojanova, D.: Estimating Forest Properties from Remotely Sensed Data by using Machine Learning. Master’s thesis, Jožef Stefan International Postgraduate School, Ljubljana, Slovenia (2009) Stojanova, D.: Estimating Forest Properties from Remotely Sensed Data by using Machine Learning. Master’s thesis, Jožef Stefan International Postgraduate School, Ljubljana, Slovenia (2009)
26.
go back to reference Stojanova, D., Panov, P., Gjorgjioski, V., Kobler, A., Džeroski, S.: Estimating vegetation height and canopy cover from remotely sensed data with machine learning. Ecol. Inform. 5(4), 256–266 (2000)CrossRef Stojanova, D., Panov, P., Gjorgjioski, V., Kobler, A., Džeroski, S.: Estimating vegetation height and canopy cover from remotely sensed data with machine learning. Ecol. Inform. 5(4), 256–266 (2000)CrossRef
27.
go back to reference Todorovski, L., Blockeel, H., Dzeroski, S.: Ranking with predictive clustering trees. In: Elomaa, T., Mannila, H., Toivonen, H. (eds.) ECML 2002. LNCS (LNAI), vol. 2430, pp. 444–455. Springer, Heidelberg (2002). doi:10.1007/3-540-36755-1_37 CrossRef Todorovski, L., Blockeel, H., Dzeroski, S.: Ranking with predictive clustering trees. In: Elomaa, T., Mannila, H., Toivonen, H. (eds.) ECML 2002. LNCS (LNAI), vol. 2430, pp. 444–455. Springer, Heidelberg (2002). doi:10.​1007/​3-540-36755-1_​37 CrossRef
28.
go back to reference Tsanas, A., Xifara, A.: Accurate quantitative estimation of energy performance of residential buildings using statistical machine learning tools. Energy Build. 49, 560–567 (2012)CrossRef Tsanas, A., Xifara, A.: Accurate quantitative estimation of energy performance of residential buildings using statistical machine learning tools. Energy Build. 49, 560–567 (2012)CrossRef
29.
go back to reference Yeh, I.C.: Modeling slump flow of concrete using second-order regressions and artificial neural networks. Cem. Concr. Compos. 29, 474–480 (2007)CrossRef Yeh, I.C.: Modeling slump flow of concrete using second-order regressions and artificial neural networks. Cem. Concr. Compos. 29, 474–480 (2007)CrossRef
Metadata
Title
Feature Ranking for Multi-target Regression with Tree Ensemble Methods
Authors
Matej Petković
Sašo Džeroski
Dragi Kocev
Copyright Year
2017
DOI
https://doi.org/10.1007/978-3-319-67786-6_13

Premium Partner