Abstract
Random forest (RF) is a powerful ensemble classifier that is widely used in machine learning applications and has performed well on many benchmark datasets. However, the performance of an RF model depends strongly on how its parameters are calibrated. Two parameters require optimization: (i) the size of the forest and (ii) the number of features. RF is based on the principle of bagging and random selection of relevant features. This paper presents an effective method for improving the classification accuracy of RF. Principal component analysis (PCA) was used for dimension reduction of the spectral bands, whereas correlation-based feature selection (CFS) was used to identify the optimal set of features. RF was initialized with 10 random trees and grown in increments of 10 trees, with a variable number of features, until the model achieved its highest accuracy. The model was tested with variable sample sizes to observe its effectiveness. The investigation was carried out on Hyperion sensor data from the Earth Observing-1 (EO-1) satellite. With the optimized set of features and number of random trees as base classifiers, the performance of RF improved significantly in terms of both predictive ability and computational expense. Compared with other advanced classifiers such as a support vector machine (SVM), a multilayer perceptron (MLP), and a maximum likelihood classifier (MLC), the optimized RF outperformed all of them.
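The incremental search described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function name `search_forest_size`, the `evaluate` callback, the `patience` stopping rule, the `max_trees` cap, and the toy accuracy function are all hypothetical. The abstract states only that the forest starts at 10 trees and grows by 10 with a variable number of features until the highest accuracy is reached.

```python
from typing import Callable, Iterable, Tuple


def search_forest_size(
    evaluate: Callable[[int, int], float],
    feature_counts: Iterable[int],
    start_trees: int = 10,
    step: int = 10,
    max_trees: int = 200,
    patience: int = 2,
) -> Tuple[int, int, float]:
    """Grow the forest in fixed steps, trying every feature count at each size.

    Returns (n_trees, n_features, accuracy) for the best setting seen.
    Assumed stopping rule: stop once `patience` consecutive forest sizes
    fail to improve on the best accuracy, or when `max_trees` is reached.
    """
    feature_counts = list(feature_counts)
    best = (0, 0, float("-inf"))
    stale = 0  # consecutive forest sizes without improvement
    for n_trees in range(start_trees, max_trees + 1, step):
        improved = False
        for n_features in feature_counts:
            accuracy = evaluate(n_trees, n_features)
            if accuracy > best[2]:
                best = (n_trees, n_features, accuracy)
                improved = True
        stale = 0 if improved else stale + 1
        if stale >= patience:
            break
    return best


def toy_accuracy(n_trees: int, n_features: int) -> float:
    """Hypothetical stand-in for training and scoring an RF model.

    Synthetic values only: the score peaks at 50 trees and 6 features.
    """
    return 0.9 - 0.001 * abs(n_trees - 50) - 0.01 * abs(n_features - 6)


if __name__ == "__main__":
    print(search_forest_size(toy_accuracy, feature_counts=[2, 4, 6, 8]))
```

In practice, `evaluate` would train a forest with the given number of trees and features on the reduced feature set and return its accuracy on held-out samples. With scikit-learn, for example, the two arguments would map to the `n_estimators` and `max_features` parameters of `RandomForestClassifier`. The acknowledgements indicate the study itself used WEKA.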
Notes
National Aeronautical and Space Application
ODs: training datasets with all the features
DRs: training datasets with the optimal set of features
Acknowledgements
The authors would like to thank the North Eastern Space Applications Centre, Department of Space, Government of India, Umiam, Meghalaya, India, for providing the necessary guidance and support during the study. The authors also acknowledge the concerned authorities of WEKA and ImageJ software for their important role in carrying out the investigation.
About this article
Cite this article
Chutia, D., Borah, N., Baruah, D. et al. An effective approach for improving the accuracy of a random forest classifier in the classification of Hyperion data. Appl Geomat 12, 95–105 (2020). https://doi.org/10.1007/s12518-019-00281-8
DOI: https://doi.org/10.1007/s12518-019-00281-8