2015 | OriginalPaper | Chapter

A Novel Feature Selection Method Based on Category Distribution and Phrase Attributes

Authors: Yi Zheng, Weihong Han, Chengzhang Zhu

Published in: Trustworthy Computing and Services

Publisher: Springer Berlin Heidelberg

Abstract

A novel and effective feature selection method called CDPAB-FSM (category distribution and phrase attribute based feature selection method) was proposed for automatic Chinese web page classification. The method combined a phrase's distribution among categories with its distribution within each category. In addition, the length and position of candidate phrases were taken into account to distinguish feature phrases from unimportant ones. Experiments showed that CDPAB-FSM was well suited to feature selection for Chinese web page classification and achieved better classification results than TF-IDF.
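
The abstract names the ingredients of the score but not the formula, so the following is only a hypothetical sketch of how such a score could be assembled, not the authors' CDPAB-FSM. Every name here (`score_phrases`, `select_features`, `length_weight`, `title_boost`) and the multiplicative combination itself are assumptions made for illustration: the between-category term is the share of a phrase's document frequency that falls in one category, the within-category term is the fraction of that category's documents containing the phrase, phrase length is counted in characters, and "position" is reduced to whether the phrase appears in a page title.

```python
from collections import Counter, defaultdict


def score_phrases(docs, length_weight=0.1, title_boost=2.0):
    """Score each phrase by its best per-category score (illustrative only).

    docs: list of (category, title_phrases, body_phrases) tuples,
    where both phrase collections are lists of strings.
    """
    docs_per_cat = Counter()
    df = defaultdict(Counter)    # df[phrase][category] = docs containing phrase
    in_title = defaultdict(set)  # categories in which the phrase occurs in a title

    for category, title_phrases, body_phrases in docs:
        docs_per_cat[category] += 1
        for phrase in set(title_phrases) | set(body_phrases):
            df[phrase][category] += 1
        for phrase in set(title_phrases):
            in_title[phrase].add(category)

    scores = {}
    for phrase, per_cat in df.items():
        total_df = sum(per_cat.values())
        best = 0.0
        for category, count in per_cat.items():
            # Assumed between-category term: how concentrated the phrase is in this category.
            between = count / total_df
            # Assumed within-category term: how widely the phrase is used inside it.
            within = count / docs_per_cat[category]
            # Assumed phrase attributes: longer phrases and title phrases get more weight.
            length = 1.0 + length_weight * len(phrase)
            position = title_boost if category in in_title[phrase] else 1.0
            best = max(best, between * within * length * position)
        scores[phrase] = best
    return scores


def select_features(docs, k):
    """Return the k highest-scoring phrases, ties broken alphabetically."""
    scores = score_phrases(docs)
    return sorted(scores, key=lambda p: (-scores[p], p))[:k]


# Toy corpus (invented for this example, not data from the chapter).
toy_docs = [
    ("sports", ["football"], ["football", "match", "news"]),
    ("sports", [], ["football", "news"]),
    ("finance", ["stock market"], ["stock market", "news"]),
    ("finance", [], ["stock market", "bank", "news"]),
]
top_two = select_features(toy_docs, 2)
```

On this toy corpus, "news" appears evenly in both categories and so scores below "football" and "stock market", which are each concentrated in one category and appear in a title. A real implementation would follow the formula in the full chapter and would need a Chinese word segmenter to produce the candidate phrases.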


Metadata
Title: A Novel Feature Selection Method Based on Category Distribution and Phrase Attributes
Authors: Yi Zheng, Weihong Han, Chengzhang Zhu
Copyright Year: 2015
Publisher: Springer Berlin Heidelberg
DOI: https://doi.org/10.1007/978-3-662-47401-3_4
