short-paper

Commonality-Rarity Score Computation: A novel Feature Selection Technique using Extended Feature Space of ELM for Text Classification

Authors:
Rajendra Kumar Roul, BITS, Pilani- Goa Campus, Zuarinagar, Goa
Aditya Bhalla, BITS, Pilani- Goa Campus, Zuarinagar, Goa
Abhishek Srivastava, BITS, Pilani- Goa Campus, Zuarinagar, Goa

FIRE '16: Proceedings of the 8th Annual Meeting of the Forum for Information Retrieval Evaluation, December 2016, Pages 37–41, https://doi.org/10.1145/3015157.3015165

Published: 08 December 2016

ABSTRACT

The number of digital documents on the Web, each a collection of a huge volume of features, is increasing day by day. Hence, selecting the features relevant to the classification process, and discarding the irrelevant ones, is the need of the hour. To this end, this paper makes two contributions to Information Retrieval:

- It proposes a new feature selection technique called Commonality-Rarity Score Computation (CRSC) to find the important features in a large corpus.

- It shows the importance of the extended feature space of Extreme Learning Machine (ELM) in the field of text categorization.

Empirical results on two established datasets show that the proposed approach compares favorably with standard feature selection techniques and that ELM outperforms other prominent classifiers.
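As an illustration of the ELM classifier named in the abstract, the following is a minimal sketch of a generic single-hidden-layer ELM, not the authors' implementation. It assumes that "extended feature space" refers to a hidden layer with more nodes than input features. The function names `train_elm` and `predict_elm`, the `tanh` activation, the ridge term, and the synthetic two-class data are all assumptions made for this example. CRSC itself is not reproduced because the abstract does not define it.

```python
import numpy as np


def hidden_features(X, W, b):
    """Map inputs into the ELM hidden-layer feature space."""
    return np.tanh(X @ W + b)


def train_elm(X, y, n_hidden, n_classes, ridge=1e-3, seed=0):
    """Fit a basic ELM: random hidden layer, least-squares output weights."""
    rng = np.random.default_rng(seed)
    # Input weights and biases are drawn at random and never trained.
    W = rng.standard_normal((X.shape[1], n_hidden))
    b = rng.standard_normal(n_hidden)
    H = hidden_features(X, W, b)
    T = np.eye(n_classes)[y]  # one-hot class targets
    # Ridge-regularised least squares for the output weights (an assumption
    # made here for numerical stability).
    beta = np.linalg.solve(H.T @ H + ridge * np.eye(n_hidden), H.T @ T)
    return W, b, beta


def predict_elm(model, X):
    """Return the class with the largest output score for each row of X."""
    W, b, beta = model
    return np.argmax(hidden_features(X, W, b) @ beta, axis=1)


# Synthetic stand-in data (hypothetical, not the paper's datasets):
# two well-separated clusters with 4 input features each.
data_rng = np.random.default_rng(1)
X = np.vstack([
    data_rng.normal(-2.0, 0.5, (40, 4)),
    data_rng.normal(2.0, 0.5, (40, 4)),
])
y = np.array([0] * 40 + [1] * 40)

# 20 hidden nodes for 4 inputs: the hidden space is larger than the input space.
model = train_elm(X, y, n_hidden=20, n_classes=2)
accuracy = float(np.mean(predict_elm(model, X) == y))
print(f"training accuracy: {accuracy:.2f}")
```

Only the output weights `beta` are fitted. The random input weights stay fixed, so training reduces to a single linear solve.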

Published in
FIRE '16: Proceedings of the 8th Annual Meeting of the Forum for Information Retrieval Evaluation
December 2016
47 pages
ISBN:9781450348386
DOI:10.1145/3015157
Editors:
Prasenjit Majumder,
Mandar Mitra,
Jainisha Sankhavara,
Parth Mehta
Copyright © 2016 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Commonality
Extreme learning machine
Feature selection
Rarity
Text classification
Qualifiers
- short-paper
- Research
- Refereed limited
Acceptance Rates
FIRE '16 Paper Acceptance Rate: 7 of 22 submissions, 32%
Overall Acceptance Rate: 19 of 64 submissions, 30%

Article Metrics
- Total Citations: 11
- Total Downloads: 83
- Downloads (Last 12 months): 2
- Downloads (Last 6 weeks): 0
