
Classifying Very-High-Dimensional Data with Random Forests of Oblique Decision Trees

  • Chapter
Advances in Knowledge Discovery and Management

Part of the book series: Studies in Computational Intelligence (SCI, volume 292)

Abstract

The random forests method is one of the most successful ensemble methods. However, random forests perform poorly on very-high-dimensional data in the presence of dependencies: in this case one can expect many combinations among the variables, and the usual random forests method does not exploit this situation effectively. We investigate a new approach to supervised classification with a huge number of numerical attributes. We propose a random oblique decision trees method: at each node, it randomly chooses a subset of predictive attributes and uses an SVM over these attributes as the split function. On 25 datasets, we compare the effectiveness, with classical measures (e.g., precision, recall, F1-measure and accuracy), of random forests of random oblique decision trees against SVMs and random forests of C4.5. Our proposal performs significantly better on very-high-dimensional datasets and slightly better on lower-dimensional ones.
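
The split procedure the abstract describes (a random attribute subset at each node, with an SVM as the split function, inside a bagged forest) is concrete enough to sketch. Below is a minimal illustration in Python with NumPy and scikit-learn; it is not the authors' implementation, and all names (`ObliqueTree`, `fit_forest`, the hyperparameter defaults) are assumptions of this sketch. It handles only binary 0/1 integer labels for brevity and uses `LinearSVC` as a stand-in for the SVM split.

```python
import numpy as np
from sklearn.svm import LinearSVC


class ObliqueTree:
    """One random oblique decision tree (binary 0/1 labels for brevity)."""

    def __init__(self, max_features, max_depth=10, min_samples=5, depth=0):
        self.max_features = max_features
        self.max_depth = max_depth
        self.min_samples = min_samples
        self.depth = depth
        self.features = None    # attribute subset drawn at this node
        self.svm = None         # linear SVM whose hyperplane is the split
        self.left = self.right = None
        self.label = None       # set only at leaves

    def fit(self, X, y, rng):
        if (len(np.unique(y)) == 1 or len(y) < self.min_samples
                or self.depth >= self.max_depth):
            self.label = int(np.bincount(y).argmax())   # majority-class leaf
            return self
        # Random subspace: the node sees only a random subset of attributes.
        self.features = rng.choice(X.shape[1], self.max_features, replace=False)
        # An SVM trained on that subset defines an oblique (linear) split.
        self.svm = LinearSVC().fit(X[:, self.features], y)
        go_left = self.svm.decision_function(X[:, self.features]) <= 0
        if go_left.all() or not go_left.any():          # degenerate split
            self.label = int(np.bincount(y).argmax())
            return self
        child = lambda: ObliqueTree(self.max_features, self.max_depth,
                                    self.min_samples, self.depth + 1)
        self.left = child().fit(X[go_left], y[go_left], rng)
        self.right = child().fit(X[~go_left], y[~go_left], rng)
        return self

    def predict_one(self, x):
        if self.label is not None:
            return self.label
        score = self.svm.decision_function(x[self.features].reshape(1, -1))[0]
        return (self.left if score <= 0 else self.right).predict_one(x)


def fit_forest(X, y, n_trees=50, max_features=None, seed=0):
    """Grow one oblique tree per bootstrap sample of the training set."""
    rng = np.random.default_rng(seed)
    k = max_features or max(1, int(np.sqrt(X.shape[1])))
    boots = (rng.integers(0, len(y), len(y)) for _ in range(n_trees))
    return [ObliqueTree(k).fit(X[b], y[b], rng) for b in boots]


def predict_forest(trees, X):
    votes = np.array([[t.predict_one(x) for x in X] for t in trees])
    return (votes.mean(axis=0) >= 0.5).astype(int)      # majority vote
```

Unlike an axis-parallel C4.5 split, which thresholds one attribute at a time, each hyperplane here is a linear combination of the sampled attributes, which is what lets a node exploit dependencies among variables in very-high-dimensional data.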

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Do, T.-N., Lenca, P., Lallich, S., Pham, N.-K. (2010). Classifying Very-High-Dimensional Data with Random Forests of Oblique Decision Trees. In: Guillet, F., Ritschard, G., Zighed, D.A., Briand, H. (eds) Advances in Knowledge Discovery and Management. Studies in Computational Intelligence, vol 292. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00580-0_3

  • DOI: https://doi.org/10.1007/978-3-642-00580-0_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-00579-4

  • Online ISBN: 978-3-642-00580-0

  • eBook Packages: Engineering, Engineering (R0)
