ABSTRACT
The number of digital documents on the Web, each comprising a large number of features, is growing daily. Selecting the features that are relevant to classification, and discarding the irrelevant ones, is therefore essential. To this end, this paper makes two contributions to Information Retrieval:
- It proposes a new feature selection technique, Commonality-Rarity Score Computation (CRSC), for finding the important features in a large corpus.
- It shows the importance of the extended feature space of the Extreme Learning Machine (ELM) for text categorization.
Empirical results on two established datasets show that the proposed approach is more promising than standard feature selection techniques and that ELM outperforms other prominent classifiers.
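To make the idea of an extended feature space concrete, the following is a minimal sketch of a generic single-hidden-layer ELM, assuming the standard formulation: random, untrained hidden weights map the selected features into a higher-dimensional space, and the output weights are then solved in closed form. It is not the implementation evaluated in this paper. The function names `train_elm` and `predict_elm`, the tanh activation, the ridge term, and the toy data are all illustrative assumptions.

```python
import numpy as np


def train_elm(X, y, n_hidden, n_classes, ridge=1e-3, seed=0):
    """Fit a basic ELM classifier (illustrative sketch, not the paper's code).

    X: (n_samples, n_features) feature matrix, e.g. after feature selection.
    y: (n_samples,) integer class labels in [0, n_classes).
    n_hidden: hidden-layer width; choosing n_hidden > n_features gives the
              "extended" feature space.
    """
    rng = np.random.default_rng(seed)
    # Hidden weights and biases are drawn at random and never trained.
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W + b)            # hidden-layer (extended) representation
    T = np.eye(n_classes)[y]          # one-hot targets
    # Ridge-regularised least squares (assumed regulariser):
    # beta = (H^T H + ridge * I)^-1 H^T T
    beta = np.linalg.solve(H.T @ H + ridge * np.eye(n_hidden), H.T @ T)
    return W, b, beta


def predict_elm(model, X):
    """Return the predicted class index for each row of X."""
    W, b, beta = model
    return np.argmax(np.tanh(X @ W + b) @ beta, axis=1)


if __name__ == "__main__":
    # Toy data (hypothetical): 5 features expanded to 50 hidden nodes.
    rng = np.random.default_rng(1)
    X_demo = rng.normal(size=(200, 5))
    y_demo = (X_demo[:, 0] + X_demo[:, 1] > 0).astype(int)
    model = train_elm(X_demo, y_demo, n_hidden=50, n_classes=2)
    accuracy = (predict_elm(model, X_demo) == y_demo).mean()
    print(f"training accuracy: {accuracy:.2f}")
```

Only the output weights `beta` are learned, so training reduces to a single linear solve; the hidden width `n_hidden` controls how far the input features are expanded.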