This paper shows that a regression tree problem can be turned into a classification tree problem, reducing the computational cost and providing useful aids to interpretation. A TWO-CLASS tree methodology for non-parametric regression analysis is introduced. The data are as follows: a numerical response variable and a set of predictors (categorical and/or numerical) are measured on a sample of objects, with no probabilistic assumptions; hence a non-parametric approach is proposed. The concepts of prospective and retrospective splits are used. The main idea is to grow a binary partition of the sample of objects such that, at each node of the tree, the numerical response is recoded into a dummy (two-class) variable, called the theoretical response, on the basis of the optimal partition of the objects into two groups within the set of retrospective splits. A two-stage splitting criterion with a fast algorithm is applied: the best split of the objects is found among the candidate (prospective) splits generated by the modalities of each predictor, by maximizing the predictability of the two-class response. Applications to real-world cases and a simulation study show that the two-class splitting procedure is computationally less intensive than standard regression trees such as CART. Furthermore, the final partitions obtained by the two-class procedure and by the standard one are very similar, in terms of the percentage of objects that fall together in the same terminal node. Some aids to interpretation make it possible to describe the distribution of the response variable in the terminal nodes.
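As an illustration, the two-stage step performed at a single node can be sketched in Python for numerical predictors only. This is a minimal sketch under stated assumptions, not the authors' algorithm: the retrospective split is assumed to be the cut on the sorted response that minimizes the within-group sum of squares, the predictability of the two-class response is assumed to be measured by the Goodman-Kruskal tau, and the function names `theoretical_response`, `gk_tau` and `best_prospective_split` are hypothetical. Categorical predictors, recursion and stopping rules are omitted.

```python
# Hypothetical sketch of one node of a two-class tree (numerical predictors only).
import numpy as np


def theoretical_response(y):
    """Recode a numerical response into a 0/1 'theoretical response'.

    Assumption: the retrospective split is the cut on the sorted response
    that minimises the within-group sum of squares, so the two groups are
    contiguous in y. Returns the dummy variable and the cut point.
    """
    y = np.asarray(y, dtype=float)
    ys = np.sort(y)
    best_t, best_sse = None, np.inf
    for k in range(1, len(ys)):
        if ys[k] == ys[k - 1]:
            continue  # only cut between distinct response values
        left, right = ys[:k], ys[k:]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best_sse:
            best_sse, best_t = sse, (ys[k - 1] + ys[k]) / 2
    if best_t is None:  # constant response: no retrospective split exists
        return np.zeros(len(y), dtype=int), None
    return (y > best_t).astype(int), best_t


def gk_tau(z, left):
    """Goodman-Kruskal tau of binary z given a left/right split (assumed index)."""
    n = len(z)
    p = z.mean()
    gini_parent = 1 - p**2 - (1 - p) ** 2
    if gini_parent == 0:
        return 0.0  # node already pure in z
    within = 0.0  # weighted Gini impurity of the two children
    for mask in (left, ~left):
        m = mask.sum()
        if m == 0:
            continue
        q = z[mask].mean()
        within += (m / n) * (1 - q**2 - (1 - q) ** 2)
    return (gini_parent - within) / gini_parent


def best_prospective_split(X, z):
    """Scan threshold splits x_j <= c of every numerical predictor.

    Returns (predictor index, cut point, tau) for the split that best
    predicts the two-class response z.
    """
    best = (None, None, -1.0)
    for j in range(X.shape[1]):
        values = np.unique(X[:, j])  # sorted distinct values
        for a, b in zip(values[:-1], values[1:]):
            c = (a + b) / 2
            tau = gk_tau(z, X[:, j] <= c)
            if tau > best[2]:
                best = (j, c, tau)
    return best


# Toy node: column 0 separates low and high responses, column 1 is noise.
y = np.array([1.0, 1.2, 0.9, 5.0, 5.3, 4.8])
X = np.array([[0, 3], [0, 1], [0, 2], [1, 3], [1, 1], [1, 2]], dtype=float)
z, t = theoretical_response(y)
j, c, tau = best_prospective_split(X, z)
print(z, t, j, c, tau)
```

On the toy data the theoretical response is 0 for the three low values and 1 for the three high ones (cut at 3.0), and the selected split is on predictor 0 at 0.5 with tau equal to 1. Once z is fixed, scoring each candidate split only needs the class proportions of z in the two children; the paper's own fast algorithm is not reproduced here.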
- TWO-CLASS Trees for Non-Parametric Regression Analysis
- Springer Berlin Heidelberg