2011 | Original Paper | Book Chapter
TWO-CLASS Trees for Non-Parametric Regression Analysis
Authors: Roberta Siciliano, Massimo Aria
Published in: Classification and Multivariate Analysis for Complex Data Structures
Publisher: Springer Berlin Heidelberg
This paper shows that a regression tree problem can be turned into a classification tree problem, reducing the computational cost and yielding useful aids to interpretation. A TWO-CLASS tree methodology for non-parametric regression analysis is introduced. The data are as follows: a numerical response variable and a set of predictors (categorical and/or numerical) are measured on a sample of objects, with no probability assumption; hence a non-parametric approach is proposed. The concepts of prospective and retrospective splits are considered. The main idea is to grow a binary partition of the sample of objects such that, at each node of the tree, the numerical response is recoded into a dummy, or two-class, variable (called the theoretical response) on the basis of the optimal partition of the objects into two groups within the set of retrospective splits. A two-stage splitting criterion with a fast algorithm is then applied: the best split of the objects is found among the candidate (prospective) splits of each predictor's modalities by maximizing the predictability of the two-class response. Applications to real-world cases and a simulation study demonstrate that the two-class splitting procedure is computationally less intensive than standard regression tree procedures such as CART. Furthermore, the final partitions obtained by the two-class procedure and by the standard one are very similar, in terms of the percentage of objects that fall together in the same terminal node. Some aids to interpretation make it possible to describe the distribution of the response variable in the terminal nodes.
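The two-stage idea at a single node can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not state the criterion used for the retrospective split or the predictability measure, so the sketch assumes a within-group sum-of-squares cut for the first stage and the decrease in Gini impurity for the second. The function names `retrospective_split` and `prospective_split` are hypothetical, only a numerical predictor is handled, and the search is exhaustive rather than fast.

```python
import numpy as np

def retrospective_split(y):
    """Stage 1 (assumed criterion): recode the numeric response into a 0/1
    'theoretical response' using the cut point on the sorted response that
    minimises the within-group sum of squares."""
    y = np.asarray(y, dtype=float)
    ordered = np.sort(y)
    best_cut, best_ss = None, np.inf
    for i in range(1, len(ordered)):
        if ordered[i] == ordered[i - 1]:
            continue  # no cut point between tied values
        left, right = ordered[:i], ordered[i:]
        ss = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if ss < best_ss:
            best_ss, best_cut = ss, (ordered[i - 1] + ordered[i]) / 2
    if best_cut is None:
        raise ValueError("response is constant; nothing to split")
    return (y > best_cut).astype(int), best_cut

def gini(labels):
    """Gini impurity of a 0/1 label vector."""
    p = labels.mean()
    return 2 * p * (1 - p)

def prospective_split(x, labels):
    """Stage 2 (assumed measure): choose the threshold on a numeric predictor
    that gives the largest decrease in Gini impurity of the two-class response."""
    x = np.asarray(x, dtype=float)
    parent = gini(labels)
    best_thr, best_gain = None, -np.inf
    values = np.unique(x)
    for lo, hi in zip(values[:-1], values[1:]):
        thr = (lo + hi) / 2
        mask = x <= thr
        w = mask.mean()
        gain = parent - w * gini(labels[mask]) - (1 - w) * gini(labels[~mask])
        if gain > best_gain:
            best_gain, best_thr = gain, thr
    return best_thr, best_gain

# Toy data (made up for illustration): low responses go with small x.
y = [1.0, 2.0, 1.5, 10.0, 11.0, 10.5]
x = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]
labels, cut = retrospective_split(y)
threshold, gain = prospective_split(x, labels)
print(labels, cut, threshold, gain)
```

In a full tree the two stages would be repeated at every node and over every predictor, with categorical predictors requiring a search over groupings of their modalities rather than thresholds.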