A Semantics Aware Random Forest for Text Classification

ABSTRACT
Random Forest (RF) classifiers are well suited to the high-dimensional, noisy data that arise in text classification. An RF model comprises a set of decision trees, each trained on a random subset of the features. Given an instance, the RF prediction is obtained by majority voting over the predictions of all the trees in the forest. However, different test instances take different values for the features used in the trees, so the trees should contribute differently to the prediction. Traditional RFs ignore this diversity of contribution, and many approaches have been proposed to model it by selecting a subset of trees for each instance. This paper follows that line of work and proposes a Semantics Aware Random Forest (SARF) classifier. SARF extracts the features each tree uses to generate its prediction and selects the subset of predictions for which those features are relevant to the predicted classes. We evaluated SARF's classification performance on 30 real-world text datasets and assessed its competitiveness against state-of-the-art ensemble selection methods. The results demonstrate the superior performance of the proposed approach in textual information retrieval and open a new research direction: utilising the interpretability of classifiers.
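The per-instance selection the abstract describes can be sketched against scikit-learn's forest API: follow each tree's decision path, collect the features it tested, score their relevance to the class that tree predicts, and vote only over the trees that pass a relevance threshold. The chi-squared relevance measure, the mean-over-path aggregation, and the `threshold` parameter below are illustrative assumptions, not SARF's published scoring.

```python
# Minimal sketch of instance-wise tree selection in a random forest.
# Assumptions (not from the paper): chi2 feature/class association as the
# relevance measure, mean relevance over path features, a fixed threshold.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import chi2


def class_feature_relevance(forest, X_train, y_train):
    """chi2 association of each feature with each class (one-vs-rest),
    rows ordered to match forest.classes_. X_train must be non-negative
    (e.g. TF-IDF), as chi2 requires."""
    scores = np.zeros((len(forest.classes_), X_train.shape[1]))
    for i, c in enumerate(forest.classes_):
        scores[i], _ = chi2(X_train, (y_train == c).astype(int))
    return np.nan_to_num(scores)


def sarf_style_predict(forest, X, relevance, threshold=1.0):
    """Vote only with trees whose decision-path features are relevant
    to the class the tree predicts; fall back to all trees if none pass."""
    preds = []
    for i in range(X.shape[0]):
        row = X[i:i + 1]
        votes = []
        for tree in forest.estimators_:
            nodes = tree.decision_path(row).indices
            # tree_.feature holds -2 at leaf nodes; keep internal splits only
            feats = [f for f in tree.tree_.feature[nodes] if f >= 0]
            c = int(tree.predict(row)[0])  # encoded index into forest.classes_
            if feats and relevance[c, feats].mean() >= threshold:
                votes.append(c)
        if not votes:  # no tree qualified for this instance: use the full forest
            votes = [int(t.predict(row)[0]) for t in forest.estimators_]
        vals, counts = np.unique(votes, return_counts=True)
        preds.append(forest.classes_[vals[counts.argmax()]])
    return np.array(preds)
```

With TF-IDF features, `relevance = class_feature_relevance(rf, X_train, y_train)` followed by `sarf_style_predict(rf, X_test, relevance)` discards, per instance, the trees whose split features say little about the class those trees predict, which is the behaviour the abstract attributes to SARF.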