2013 | OriginalPaper | Chapter

Factor Preselection and Multiple Measures of Dependence

Authors: Nina Büchel, Kay F. Hildebrand, Ulrich Müller-Funk

Published in: Algorithms from and for Nature and Life

Publisher: Springer International Publishing

Abstract

Factor selection or factor reduction is carried out to reduce the complexity of a data analysis problem (classification, regression) or to improve the fit of a model (via parameter estimation). In data mining there are special needs for a process by which relevant factors of influence are identified in order to achieve a balance between bias and noise. Insurance companies, for example, face data sets that contain hundreds of attributes or factors per object. With a large number of factors, the selection procedure requires a suitable process model. Such a process becomes compelling once data analysis is to be (semi-)automated.

We suggest an approach that proceeds in two phases. In the first, we cluster attributes that are highly correlated in order to identify factor combinations that, statistically speaking, are near duplicates. In the second phase, we choose factors from each cluster that are highly associated with a target variable. The implementation requires some form of non-linear canonical correlation analysis. We define a correlation measure for two blocks of factors that is employed as a measure of similarity within the clustering process. Such measures, in turn, are based on multiple indices of dependence. Few such indices have been introduced in the literature; cf. Wolff (Stochastica 4(3):175–188, 1980). All of them, however, are hard to interpret if the number of dimensions considerably exceeds two. For that reason we propose signed measures that can be interpreted in the usual way.
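The two-phase idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' method: it replaces their block correlation measure with the absolute pairwise Spearman rank correlation, uses a simple threshold-based single-linkage grouping instead of their clustering process, and keeps only one factor per cluster. The function names (`ranks`, `spearman`, `preselect`), the threshold of 0.8, and the toy data are all assumptions made for illustration.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation (assumed stand-in for the paper's measure)."""
    return pearson(ranks(x), ranks(y))


def preselect(factors, target, threshold=0.8):
    """factors: dict mapping factor name to a list of values.

    Returns (selected factor names, clusters), both sorted.
    """
    names = list(factors)
    parent = {n: n for n in names}

    def find(n):
        # Union-find root lookup with path halving.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    # Phase 1: group factors that are near duplicates of each other.
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if abs(spearman(factors[a], factors[b])) >= threshold:
                parent[find(a)] = find(b)

    clusters = {}
    for n in names:
        clusters.setdefault(find(n), []).append(n)

    # Phase 2: per cluster, keep the factor most associated with the target.
    selected = [
        max(members, key=lambda n: abs(spearman(factors[n], target)))
        for members in clusters.values()
    ]
    return sorted(selected), sorted(sorted(m) for m in clusters.values())


if __name__ == "__main__":
    # Hypothetical toy data: x1 and x2 are near duplicates, x3 is not.
    toy = {
        "x1": [1, 2, 3, 4, 5, 6],
        "x2": [2, 1, 3, 4, 5, 6],
        "x3": [3, 1, 6, 2, 5, 4],
    }
    print(preselect(toy, target=[1, 2, 3, 4, 6, 5]))
```

A rank-based measure is used here only because it is easy to compute without external libraries and captures monotone, not just linear, association; the signed multiple measures of dependence proposed in the chapter would take its place in a faithful implementation.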

Footnotes
1. We use Event-driven Process Chains (EPC) as a modeling language. Details can be found in Becker and Schütte (1996).

Literature
Becker, J., & Schütte, R. (1996). Handelsinformationssysteme. Landsberg/Lech: Verlag Moderne Industrie.
Hall, M. A. (1999). Correlation-based feature subset selection for machine learning. PhD thesis, Department of Computer Science, University of Waikato, Hamilton, New Zealand.
Kiesl, H. (2003). Ordinale Streuungsmasse: Theoretische Fundierung und statistische Anwendung. PhD thesis, Universität Bamberg.
Renyi, A. (1958). On measures of dependence. Acta Mathematica Hungarica, 9, 441–451.
Rüschendorf, L. (2009). On the distributional transform, Sklar’s theorem, and the empirical copula process. Journal of Statistical Planning and Inference, 139, 3921–3927.
Schmid, F., Blumentritt, T., Gaißer, S., Ruppert, M., & Schmidt, R. (2010). Copula-based measures of multivariate association. In F. Durante, W. Härdle, P. Jaworski, & T. Rychlik (Eds.), Workshop on copula theory and its applications, Warsaw. Berlin Heidelberg: Springer-Verlag.
Witting, H., & Müller-Funk, U. (1995). Mathematische Statistik II. Stuttgart: Teubner Verlag.
Metadata
Title
Factor Preselection and Multiple Measures of Dependence
Authors
Nina Büchel
Kay F. Hildebrand
Ulrich Müller-Funk
Copyright Year
2013
DOI
https://doi.org/10.1007/978-3-319-00035-0_22
