Top

Published in:

2013 | OriginalPaper | Chapter

Factor Preselection and Multiple Measures of Dependence

Authors : Nina Büchel, Kay. F. Hildebrand, Ulrich Müller-Funk

Published in: Algorithms from and for Nature and Life

Publisher: Springer International Publishing

Activate our intelligent search to find suitable subject content or patents.

search-config

AI-assisted search

Off

Abstract

Factor selection or factor reduction is carried out to reduce the complexity of a data analysis problems (classification, regression) or to improve the fit of a model (via parameter estimation). In data mining there are special needs for a process by which relevant factors of influence are identified in order to achieve a balance between bias and noise. Insurance companies, for example, face data sets that contain hundreds of attributes or factors per object. With a large number of factors, the selection procedure requires a suitable process model. A process like that becomes compelling once data analysis is to be (semi) automated.We suggest an approach that proceeds in two phases: In the first one, we cluster attributes that are highly correlated in order to identify factor combinations that—statistically speaking—are near duplicates. In the second phase, we choose factors from each cluster that are highly associated with a target variable. The implementation requires some form of non-linear canonical correlation analysis. We define a correlation measure for two blocks of factors that will be employed as a measure of similarity within the clustering process. Such measures, in turn, are based on multiple indices of dependence. Few indices have been introduced cf. Wolff (Stochastica 4(3):175–188, 1980), ‘Few indices have been introduced in the literature’. All of them, however, are hard to interpret if the number of dimensions considerably exceeds two. For that reason we come up with signed measures that can be interpreted in the usual way.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

über 102.000 Bücher
über 537 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Finance + Banking
Management + Führung
Marketing + Vertrieb
Maschinenbau + Werkstoffe
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

inform now

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 390 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Maschinenbau + Werkstoffe

Jetzt Wissensvorsprung sichern!

inform now

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 340 Zeitschriften

aus folgenden Fachgebieten:

Bauwesen + Immobilien
Business IT + Informatik
Finance + Banking
Management + Führung
Marketing + Vertrieb
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

inform now

previous chapter Regularization and Model Selection with Categorical Covariates

next chapter Intrablocks Correspondence Analysis

We use Event-driven Process Chains (EPC) as a modeling language. Details can be found in Becker and Schütte (1996).

Becker, J. & Schütte, R. (1996). Handelsinformationssysteme. Verl. Moderne Industrie, Landsberg/Lech.

Hall, M. A. (1999). Correlation-based feature subset selection for machine learning. PhD thesis, Department of Computer Science, University of Waikato, Hamilton, New Zealand.

Kiesl, H. (2003). Ordinale Streuungsmasse: Theoretische Fundierung und statistische Anwendung. PhD thesis, Universität Bamberg.

Renyi, A. (1958). On measures of dependence. Acta mathematica hungarica, 9, 441–451.

Rüschendorf, L. (1976). Asymptotic distributions of multivariate rank order statistics. The Annals of Statistics, 4, 912–923.MathSciNetMATHCrossRef

Rüschendorf, L. (2009). On the distributional transform, sklar’s theorem, and the empirical copula process. Journal of Statistical Planning and Inference, 139, 3921–3927.MathSciNetMATHCrossRef

Schmid, F., Blumentritt, T., Gaißer, S., Ruppert, M., & Schmidt, R. (2010). Copula-based measures of multivariate association. In F. Durante, W. Härdle, P. Jaworski, & T. Rychlik (Eds.), Workshop on copula theory and its applications, Warsaw. Berlin Heidelberg: Springer-Verlag.

Witting, H., & Müller-Funk, U. (1995). Mathematische Statistik II. Stuttgart: Teubner Verlag.MATHCrossRef

Wolff, E. F. (1980). N-dimensional measures of dependence. Stochastica, 4(3), 175–188.MathSciNetMATH

Title: Factor Preselection and Multiple Measures of Dependence
Authors: Nina Büchel
Kay. F. Hildebrand
Ulrich Müller-Funk
Publisher: Springer International Publishing
Book: Algorithms from and for Nature and Life
Print ISBN: 978-3-319-00034-3

Electronic ISBN: 978-3-319-00035-0

Copyright Year: 2013
DOI: https://doi.org/10.1007/978-3-319-00035-0_22

Springer Professional

Abstract

Please log in to get access to your license.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Springer Professional "Technik"

Springer Professional "Wirtschaft"

Premium Partner