Published in: Discover Computing 4/2008

01-08-2008

Boosting multi-label hierarchical text categorization

Authors: Andrea Esuli, Tiziano Fagni, Fabrizio Sebastiani

Abstract

Hierarchical Text Categorization (HTC) is the task of generating (usually by means of supervised learning algorithms) text classifiers that operate on hierarchically structured classification schemes. Notwithstanding the fact that most large-sized classification schemes for text have a hierarchical structure, so far the attention of text classification researchers has mostly focused on algorithms for “flat” classification, i.e. algorithms that operate on non-hierarchical classification schemes. These algorithms, once applied to a hierarchical classification problem, are not capable of taking advantage of the information inherent in the class hierarchy, and may thus be suboptimal in terms of efficiency and/or effectiveness. In this paper we propose TreeBoost.MH, a multi-label HTC algorithm consisting of a hierarchical variant of AdaBoost.MH, a very well-known member of the family of “boosting” learning algorithms. TreeBoost.MH embodies several intuitions that had arisen before within HTC: e.g. the intuitions that both feature selection and the selection of negative training examples should be performed “locally”, i.e. by paying attention to the topology of the classification scheme. It also embodies the novel intuition that the weight distribution that boosting algorithms update at every boosting round should likewise be updated “locally”. All these intuitions are embodied within TreeBoost.MH in an elegant and simple way, i.e. by defining TreeBoost.MH as a recursive algorithm that uses AdaBoost.MH as its base step, and that recurs over the tree structure. We present the results of experimenting with TreeBoost.MH on three HTC benchmarks, and discuss its computational cost analytically.
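The recursive structure described in the abstract can be sketched as follows. This is only a structural sketch under assumed data shapes (documents as (feature-set, label-set) pairs, tree nodes as nested dicts), not the authors' implementation: the function train_local, which in TreeBoost.MH would be AdaBoost.MH trained on the children of a node with locally selected features and locally chosen negative examples, is replaced here by a trivial feature-overlap classifier just to make the sketch executable.

```python
# Structural sketch of TreeBoost.MH's recursion (hypothetical code, not the
# authors' implementation). A document is a (feature_set, label_set) pair;
# a tree node is {"children": {category_name: subtree, ...}}.

def train_local(docs, child_labels):
    # Stand-in for AdaBoost.MH restricted to the children of one node.
    # Builds one "centroid" feature set per child category.
    centroids = {c: set() for c in child_labels}
    for feats, labels in docs:
        for c in labels & set(child_labels):
            centroids[c] |= feats

    def predict(feats):
        # A document is assigned every child category whose centroid it overlaps.
        return {c for c, cent in centroids.items() if feats & cent}

    return predict

def train_tree(node, docs):
    # Recursive step: train a local multi-label classifier on this node's
    # children, then recurse into each child using only the documents that
    # belong to it ("local" selection of training examples).
    children = node["children"]
    if not children:
        return None  # leaf: nothing to discriminate
    local = train_local(docs, list(children))
    sub = {}
    for name, subtree in children.items():
        sub_docs = [(f, l) for (f, l) in docs if name in l]
        sub[name] = train_tree(subtree, sub_docs)
    return {"local": local, "sub": sub}

def classify(clf, feats):
    # A document is routed down every branch whose local classifier accepts it,
    # collecting all category labels along the way (multi-label output).
    if clf is None:
        return set()
    out = set()
    for c in clf["local"](feats):
        out.add(c)
        out |= classify(clf["sub"][c], feats)
    return out
```

The point of the sketch is the control flow: feature statistics, negative examples, and (in the real algorithm) the boosting weight distribution are all computed per node, over that node's children only, rather than once globally as in flat AdaBoost.MH.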


Footnotes
4
This is a result of the fact that the very first publicly available TC benchmarks (e.g. the Reuters-21578 benchmark; see Sect. 5.1) had no hierarchical structure; the TC literature thus ended up, at least to some degree, overfitting the available benchmarks.
 
5
Consistently with most mathematical literature we use the caret symbol (\(\hat{\,}\)) to indicate estimation. In fact, a classifier \(\hat\Upphi\) can be understood as an estimation, or approximation, of an unknown “target function” Φ.
 
6
In Schapire and Singer (2000) the value for ε is chosen by 3-fold cross validation on the training set, but this procedure is reported to give only marginal improvements with respect to the default choice of \(\epsilon=\frac{1}{gm},\) which we adopt in this work.
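For concreteness, this is the smoothing of the confidence-rated predictions introduced by Schapire and Singer (1999): a weak hypothesis's real-valued output for a given block is computed from the total weights \(W_{+}\) and \(W_{-}\) of the positive and negative examples falling in that block as

\[ c = \frac{1}{2}\ln\left(\frac{W_{+}+\epsilon}{W_{-}+\epsilon}\right) \]

so that the default \(\epsilon=\frac{1}{gm}\) keeps the prediction bounded when either weight is zero.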
 
7
Note that a local policy would also implement this view, but it is not possible with AdaBoost.MH, since the latter uses the same set of features for all the categories involved in the ML classification problem. This means that we need to use the same set of features for all categories in \(\downarrow({c_{j}}).\)
 
8
Our analysis is here in terms of the S original weak hypotheses rather than in terms of the \(\Updelta\) combined weak hypotheses; this is because different internal nodes have different values of \(\Updelta,\) which would needlessly complicate the notation and the discussion; the conclusions are not affected anyway.
 
9
Reuters-21578 is freely available for experimentation purposes from http://www.daviddlewis.com/resources/testcollections/reuters21578/.
 
13
Note that many real-world classification schemes (e.g. the ACM Classification Scheme) are of this latter type, since their internal nodes usually have a special child category (called \({\mathsf{General}},\) or \({\mathsf{Other}}\)) which contains all documents belonging to the node but to none of its descendant leaves.
 
14
The reader might notice that the best performance we have obtained from AdaBoost.MH on Reuters-21578 (\(F_{1}^{\mu}=.808\)) is inferior to the one reported in Schapire and Singer (2000) for the same algorithm (\(F_{1}^{\mu}=.851\)). There are several reasons for this: (a) Schapire and Singer (2000) actually uses a different, much older version of this collection, called Reuters-21450 (Apté et al. 1994); (b) Schapire and Singer (2000) only uses the 93 categories which have at least 2 positive training examples and 1 positive test example, while we also use the categories that have just 1 positive training example and those that have no positive test example. This makes the two sets of AdaBoost.MH results difficult to compare.
 
Literature
Apté, C., Damerau, F. J., & Weiss, S. M. (1994). Automated learning of decision rules for text categorization. ACM Transactions on Information Systems, 12(3), 233–251.
Cai, L., & Hofmann, T. (2004). Hierarchical document categorization with support vector machines. In Proceedings of the 13th ACM International Conference on Information and Knowledge Management (CIKM’04) (pp. 78–87).
Ceci, M., & Malerba, D. (2007). Classifying Web documents in a hierarchy of categories: A comprehensive study. Journal of Intelligent Information Systems, 28(1), 37–78.
Chakrabarti, S., Dom, B. E., Agrawal, R., & Raghavan, P. (1998). Scalable feature selection, classification and signature generation for organizing large text databases into hierarchical topic taxonomies. Journal of Very Large Data Bases, 7(3), 163–178.
Cheng, C.-H., Tang, J., Wai-Chee, A., & King, I. (2001). Hierarchical classification of documents with error control. In Proceedings of the 5th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD’01) (pp. 433–443). Hong Kong, CN.
Dumais, S. T., & Chen, H. (2000). Hierarchical classification of web content. In Proceedings of the 23rd ACM International Conference on Research and Development in Information Retrieval (SIGIR’00) (pp. 256–263). Athens, GR.
Fagni, T., & Sebastiani, F. (2007). On the selection of negative examples for hierarchical text categorization. In Proceedings of the 3rd Language & Technology Conference (LTC’07) (pp. 24–28). Poznań, PL.
Forman, G. (2004). A pitfall and solution in multi-class feature selection for text classification. In Proceedings of the 21st International Conference on Machine Learning (ICML’04). Banff, CA.
Gaussier, É., Goutte, C., Popat, K., & Chen, F. (2002). A hierarchical model for clustering and categorising documents. In Proceedings of the 24th European Colloquium on Information Retrieval Research (ECIR’02) (pp. 229–247). Glasgow, UK.
Koller, D., & Sahami, M. (1997). Hierarchically classifying documents using very few words. In Proceedings of the 14th International Conference on Machine Learning (ICML’97) (pp. 170–178). Nashville, US.
Lewis, D. D. (1992). Representation and learning in information retrieval. PhD thesis, Department of Computer Science, University of Massachusetts, Amherst, US.
Lewis, D. D., Li, F., Rose, T., & Yang, Y. (2004). RCV1: A new benchmark collection for text categorization research. Journal of Machine Learning Research, 5, 361–397.
Liu, T. Y., Yang, Y., Wan, H., Zeng, H. J., Chen, Z., & Ma, W. Y. (2005). Support vector machines classification with a very large-scale taxonomy. SIGKDD Explorations, 7(1), 36–43.
McCallum, A. K., Rosenfeld, R., Mitchell, T. M., & Ng, A. Y. (1998). Improving text classification by shrinkage in a hierarchy of classes. In Proceedings of the 15th International Conference on Machine Learning (ICML’98) (pp. 359–367). Madison, US.
Ng, H. T., Goh, W. B., & Low, K. L. (1997). Feature selection, perceptron learning, and a usability case study for text categorization. In Proceedings of the 20th ACM International Conference on Research and Development in Information Retrieval (SIGIR’97) (pp. 67–73). Philadelphia, US.
Rose, T., Stevenson, M., & Whitehead, M. (2002). The Reuters Corpus Volume 1—from yesterday’s news to tomorrow’s language resources. In Proceedings of the 3rd International Conference on Language Resources and Evaluation (LREC’02) (pp. 827–832). Las Palmas, ES.
Ruiz, M., & Srinivasan, P. (2002). Hierarchical text classification using neural networks. Information Retrieval, 5(1), 87–118.
Schapire, R. E., & Singer, Y. (1999). Improved boosting algorithms using confidence-rated predictions. Machine Learning, 37(3), 297–336.
Schapire, R. E., & Singer, Y. (2000). BoosTexter: A boosting-based system for text categorization. Machine Learning, 39(2/3), 135–168.
Schapire, R. E., Singer, Y., & Singhal, A. (1998). Boosting and Rocchio applied to text filtering. In Proceedings of the 21st ACM International Conference on Research and Development in Information Retrieval (SIGIR’98) (pp. 215–223). Melbourne, AU.
Sebastiani, F., Sperduti, A., & Valdambrini, N. (2000). An improved boosting algorithm and its application to automated text categorization. In Proceedings of the 9th ACM International Conference on Information and Knowledge Management (CIKM’00) (pp. 78–85). McLean, US.
Spiegel, M. R., & Stephens, L. J. (1999). Statistics (3rd ed.). New York, US: McGraw-Hill.
Sun, A., & Lim, E.-P. (2001). Hierarchical text classification and evaluation. In Proceedings of the 1st IEEE International Conference on Data Mining (ICDM’01) (pp. 521–528). San Jose, US.
Toutanova, K., Chen, F., Popat, K., & Hofmann, T. (2001). Text classification in a hierarchical mixture model for small training sets. In Proceedings of the 10th ACM International Conference on Information and Knowledge Management (CIKM’01) (pp. 105–113). Atlanta, US.
Vinokourov, A., & Girolami, M. (2002). A probabilistic framework for the hierarchic organisation and classification of document collections. Journal of Intelligent Information Systems, 18(2/3), 153–172.
Weigend, A. S., Wiener, E. D., & Pedersen, J. O. (1999). Exploiting hierarchy in text categorization. Information Retrieval, 1(3), 193–216.
Wiener, E. D., Pedersen, J. O., & Weigend, A. S. (1995). A neural network approach to topic spotting. In Proceedings of the 4th Annual Symposium on Document Analysis and Information Retrieval (SDAIR’95) (pp. 317–332). Las Vegas, US.
Yang, Y., & Liu, X. (1999). A re-examination of text categorization methods. In Proceedings of the 22nd ACM International Conference on Research and Development in Information Retrieval (SIGIR’99) (pp. 42–49). Berkeley, US.
Yang, Y., Zhang, J., & Kisiel, B. (2003). A scalability analysis of classifiers in text categorization. In Proceedings of the 26th ACM International Conference on Research and Development in Information Retrieval (SIGIR’03) (pp. 96–103). Toronto, CA.
Metadata
Title
Boosting multi-label hierarchical text categorization
Authors
Andrea Esuli
Tiziano Fagni
Fabrizio Sebastiani
Publication date
01-08-2008
Publisher
Springer Netherlands
Published in
Discover Computing / Issue 4/2008
Print ISSN: 2948-2984
Electronic ISSN: 2948-2992
DOI
https://doi.org/10.1007/s10791-008-9047-y
