2017 | OriginalPaper | Chapter

A Supervised Term Weighting Scheme for Multi-class Text Categorization

Authors: Yiwei Gu, Xiaodong Gu

Published in: Intelligent Computing Methodologies

Publisher: Springer International Publishing

Abstract

Most supervised term weighting (STW) schemes can only be applied to binary text classification tasks, such as sentiment analysis (SA), and not to text classification with more than two categories. In this paper, we propose a new supervised term weighting scheme for multi-class text categorization. The proposed scheme, inverse term entropy (ite), measures how each term is distributed across all categories, based on the definition of entropy in information theory. We present experimental results on the 20NewsGroup dataset obtained with a popular classifier learning method, the support vector machine (SVM). Compared with other existing methods, ite achieved the highest classification accuracy, and it also showed the most stable performance as the number of training samples was reduced. Furthermore, ite has a built-in property that prevents over-weighting in STW. Over-weighting, a concept specific to supervised term weighting that we proposed in earlier work and re-introduce here, is caused by improper singular terms and excessively large ratios between term weights, and it can degrade the performance of text classification.
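
The abstract describes ite only at a high level (an entropy-based measure of how a term is spread across categories) and does not give its exact formula. The Python sketch below illustrates the general idea under stated assumptions: a term's category distribution is estimated from document counts, entropy uses the natural logarithm, and the final weight uses a hypothetical bounded form w(t) = 1 + (1 - H(t) / log C), where C is the number of categories. This is not the paper's ite formula, and the function names `entropy_based_weights` and `weight_document` are illustrative only.

```python
import math
from collections import Counter, defaultdict


def class_document_counts(docs, labels):
    """For each term, count how many documents of each category contain it."""
    counts = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        for term in set(tokens):  # document frequency: count a term once per document
            counts[term][label] += 1
    return counts


def term_entropy(class_counts):
    """Shannon entropy (natural log) of a term's distribution over categories."""
    total = sum(class_counts.values())
    h = 0.0
    for n in class_counts.values():
        if n > 0:
            p = n / total
            h -= p * math.log(p)
    return h


def entropy_based_weights(docs, labels):
    """HYPOTHETICAL bounded weight w(t) = 1 + (1 - H(t) / log C), in [1, 2].

    Illustrative assumption only; not the paper's exact ite formula.
    """
    num_classes = len(set(labels))
    if num_classes < 2:
        raise ValueError("need at least two categories")
    max_entropy = math.log(num_classes)  # entropy of an even spread over C categories
    return {
        term: 1.0 + (1.0 - term_entropy(class_counts) / max_entropy)
        for term, class_counts in class_document_counts(docs, labels).items()
    }


def weight_document(tokens, weights):
    """Raw term frequency times term weight; unseen terms default to 1.0 (assumption)."""
    tf = Counter(tokens)
    return {term: count * weights.get(term, 1.0) for term, count in tf.items()}
```

For example, with three categories, a term that occurs only in `sport` documents has entropy 0 and receives weight 2.0, while a term spread evenly across all three categories has entropy log 3 and receives weight 1.0. Because every weight in this sketch lies in [1, 2], the ratio between any two term weights is at most 2, which is one simple way to keep weight ratios bounded. Note that a term seen in only one document still gets the maximum weight here; the abstract does not state how ite handles such singular terms.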

Metadata
Title
A Supervised Term Weighting Scheme for Multi-class Text Categorization
Authors
Yiwei Gu
Xiaodong Gu
Copyright Year
2017
DOI
https://doi.org/10.1007/978-3-319-63315-2_38