Classifying Very-High-Dimensional Data with Random Forests of Oblique Decision Trees

Do, Thanh-Nghi; Lenca, Philippe; Lallich, Stéphane; Pham, Nguyen-Khang

doi:10.1007/978-3-642-00580-0_3

Part of the book series: Studies in Computational Intelligence (SCI, volume 292)

Abstract

The random forests method is one of the most successful ensemble methods. However, random forests do not perform well on very-high-dimensional data in the presence of dependencies between variables. In this case, one can expect many informative combinations of the variables to exist, and the usual random forests method, whose trees split on one attribute at a time, does not exploit this situation effectively. Here we investigate a new approach for supervised classification with a huge number of numerical attributes. We propose a random oblique decision trees method: it randomly chooses a subset of the predictive attributes and uses an SVM as the split function on these attributes. On 25 datasets, we compare random forests of random oblique decision trees with SVMs against random forests of C4.5 trees, using classical measures (e.g., precision, recall, F1-measure and accuracy). Our proposal performs significantly better on very-high-dimensional datasets and slightly better on lower-dimensional datasets.
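
The method summarized above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it handles binary labels in {-1, +1} only, draws √d attributes per node (the usual random-forest default, assumed here), trains each tree on a bootstrap sample (also assumed), and uses a small Pegasos-style subgradient solver as a stand-in for whatever SVM solver the chapter actually uses. All function names are hypothetical.

```python
import random
from collections import Counter


def train_linear_svm(X, y, lam=0.01, epochs=30, rng=None):
    """Linear SVM via Pegasos-style subgradient descent on the hinge loss.

    Stand-in solver (assumption): X is a list of feature vectors and y holds
    labels in {-1, +1}. A constant 1.0 is appended to every vector so the
    bias is learned as an extra, lightly regularized weight w[-1].
    """
    rng = rng or random.Random(0)
    data = [(list(x) + [1.0], yi) for x, yi in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, yi in data:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing Pegasos step size
            margin = yi * sum(wj * xj for wj, xj in zip(w, x))
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularization shrink
            if margin < 1.0:  # hinge loss active: move toward this point
                w = [wj + eta * yi * xj for wj, xj in zip(w, x)]
    return w


def score(w, attrs, x):
    """Side of the oblique hyperplane defined on the attribute subset."""
    return sum(w[k] * x[a] for k, a in enumerate(attrs)) + w[-1]


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def build_tree(X, y, n_attrs, rng, depth=0, max_depth=8):
    """Grow one random oblique tree (hypothetical helper)."""
    if len(set(y)) == 1 or depth >= max_depth:
        return ("leaf", majority(y))
    # Randomly choose a subset of attributes, then fit a linear SVM on them.
    attrs = rng.sample(range(len(X[0])), n_attrs)
    w = train_linear_svm([[x[a] for a in attrs] for x in X], y, rng=rng)
    go_left = [score(w, attrs, x) >= 0 for x in X]
    if all(go_left) or not any(go_left):  # degenerate split: make a leaf
        return ("leaf", majority(y))
    left = [i for i, g in enumerate(go_left) if g]
    right = [i for i, g in enumerate(go_left) if not g]
    return ("node", attrs, w,
            build_tree([X[i] for i in left], [y[i] for i in left],
                       n_attrs, rng, depth + 1, max_depth),
            build_tree([X[i] for i in right], [y[i] for i in right],
                       n_attrs, rng, depth + 1, max_depth))


def predict_tree(tree, x):
    while tree[0] == "node":
        _, attrs, w, left, right = tree
        tree = left if score(w, attrs, x) >= 0 else right
    return tree[1]


def build_forest(X, y, n_trees=11, n_attrs=None, seed=0):
    """Forest of random oblique trees; bootstrap sampling is an assumption."""
    rng = random.Random(seed)
    n_attrs = n_attrs or max(1, int(len(X[0]) ** 0.5))  # must be <= len(X[0])
    forest = []
    for _ in range(n_trees):
        boot = [rng.randrange(len(X)) for _ in range(len(X))]
        forest.append(build_tree([X[i] for i in boot], [y[i] for i in boot],
                                 n_attrs, rng))
    return forest


def predict_forest(forest, x):
    """Majority vote over the trees (an odd n_trees avoids binary ties)."""
    return majority([predict_tree(t, x) for t in forest])
```

The design point the sketch illustrates is that each internal node tests a weighted combination of several attributes rather than a single attribute threshold, which is what lets an oblique tree capture dependencies between variables in one split.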

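The comparison in the abstract relies on precision, recall, F1-measure and accuracy. The sketch below computes them for a single target class; how the chapter aggregates them over classes is not stated in the abstract, so per-class computation is an assumption, and the function name is hypothetical.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Precision, recall, F1-measure and accuracy for one target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = correct / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}
```

For example, with y_true = [1, 1, 1, -1, -1] and y_pred = [1, 1, -1, 1, -1] there are 2 true positives, 1 false positive and 1 false negative, so precision, recall and F1 are all 2/3, while accuracy is 3/5.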
Author information

Authors and Affiliations

Institut Telecom; Telecom Bretagne, UMR CNRS 3192 Lab-STICC, Université européenne de Bretagne, France
Thanh-Nghi Do & Philippe Lenca
Can Tho University, Vietnam
Thanh-Nghi Do & Nguyen-Khang Pham
Laboratoire ERIC, Université de Lyon, Lyon 2, France
Stéphane Lallich
IRISA, Rennes, France
Nguyen-Khang Pham

Editor information

Editors and Affiliations

Polytechnic School of Nantes University, Nantes, France
Fabrice Guillet & Henri Briand
Université de Genève, Genève, Switzerland
Gilbert Ritschard
Université Lumière Lyon 2, Bron, France
Djamel Abdelkader Zighed

About this chapter

Cite this chapter

Do, TN., Lenca, P., Lallich, S., Pham, NK. (2010). Classifying Very-High-Dimensional Data with Random Forests of Oblique Decision Trees. In: Guillet, F., Ritschard, G., Zighed, D.A., Briand, H. (eds) Advances in Knowledge Discovery and Management. Studies in Computational Intelligence, vol 292. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00580-0_3

DOI: https://doi.org/10.1007/978-3-642-00580-0_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-00579-4
Online ISBN: 978-3-642-00580-0
eBook Packages: Engineering, Engineering (R0)
