Abstract
The random forests method is one of the most successful ensemble methods. However, random forests do not perform well on very-high-dimensional data in the presence of dependencies. In this case one can expect many combinations between the variables to exist, and unfortunately the usual random forests method does not exploit this situation effectively. We investigate a new approach to supervised classification with a very large number of numerical attributes. We propose a method based on random oblique decision trees: it randomly chooses a subset of predictive attributes and uses an SVM as the split function over these attributes. On 25 datasets, we compare random forests of random oblique decision trees with SVMs against random forests of C4.5, using classical measures (e.g. precision, recall, F1-measure and accuracy). Our proposal performs significantly better on very-high-dimensional datasets and slightly better on lower-dimensional datasets.
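The split described in the abstract (draw a random subset of attributes, then fit an SVM on them to obtain an oblique hyperplane) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the problem is taken to be binary with labels in {-1, +1}, the SVM is a plain linear one trained by batch subgradient descent on the regularised hinge loss (a stand-in for whatever SVM solver the chapter uses), and all names and parameters (`train_linear_svm`, `oblique_split`, `n_sub`, `lam`, `eta`, `epochs`) are hypothetical.

```python
import random

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def train_linear_svm(X, y, lam=0.01, eta=0.1, epochs=100):
    """Fit w, b by batch subgradient descent on the regularised hinge loss.

    Assumption: a simple stand-in for a real SVM solver; labels are -1 or +1.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        # Points inside the margin or misclassified drive the update.
        violators = [i for i in range(n) if y[i] * (dot(w, X[i]) + b) < 1.0]
        grad_w = [lam * w[j] - sum(y[i] * X[i][j] for i in violators) / n
                  for j in range(d)]
        grad_b = -sum(y[i] for i in violators) / n
        w = [w[j] - eta * grad_w[j] for j in range(d)]
        b -= eta * grad_b
    return w, b

def oblique_split(X, y, n_sub, rng):
    """Split one node: random attribute subset, then a linear SVM hyperplane."""
    d = len(X[0])
    features = sorted(rng.sample(range(d), n_sub))      # random attribute subset
    X_sub = [[row[j] for j in features] for row in X]   # project onto the subset
    w, b = train_linear_svm(X_sub, y)
    left, right = [], []
    for i, row in enumerate(X_sub):
        # The sign of the SVM decision value routes the sample to a child node.
        (right if dot(w, row) + b >= 0 else left).append(i)
    return features, w, b, left, right

# Toy data (an assumption for the demo): 20 attributes, two mirrored classes.
rng = random.Random(0)
pos = [[rng.uniform(1.5, 2.5) for _ in range(20)] for _ in range(10)]
neg = [[-v for v in row] for row in pos]
X = pos + neg
y = [1] * 10 + [-1] * 10

features, w, b, left, right = oblique_split(X, y, n_sub=5, rng=rng)
print("attributes used:", features)
print("left size:", len(left), "right size:", len(right))
```

A full tree would apply `oblique_split` recursively to the `left` and `right` index sets, and a forest would grow many such trees and combine their votes. Those steps are omitted here because the abstract does not specify them.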
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Do, TN., Lenca, P., Lallich, S., Pham, NK. (2010). Classifying Very-High-Dimensional Data with Random Forests of Oblique Decision Trees. In: Guillet, F., Ritschard, G., Zighed, D.A., Briand, H. (eds) Advances in Knowledge Discovery and Management. Studies in Computational Intelligence, vol 292. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00580-0_3
DOI: https://doi.org/10.1007/978-3-642-00580-0_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-00579-4
Online ISBN: 978-3-642-00580-0
eBook Packages: Engineering, Engineering (R0)