DOI: 10.1145/3357384.3357885
research-article

Hierarchical Multi-label Text Classification: An Attention-based Recurrent Network Approach

Published: 3 November 2019

ABSTRACT

Hierarchical multi-label text classification (HMTC) is a fundamental but challenging task in numerous applications (e.g., patent annotation), where documents are assigned to multiple categories organized in a hierarchical structure. The categories a document receives at different levels of the hierarchy tend to depend on one another. However, most prior studies on HMTC either employ classifiers that handle all categories simultaneously or decompose the original problem into a set of flat multi-label classification subproblems. Both approaches ignore the associations between texts and the hierarchical structure, as well as the dependencies among the levels of that structure. To address these limitations, we propose a novel framework called Hierarchical Attention-based Recurrent Neural Network (HARNN). HARNN classifies documents into the most relevant categories level by level by integrating the texts with the hierarchical category structure. Specifically, we first apply a documentation representing layer to obtain representations of the texts and the hierarchical structure. Then, we develop a hierarchical attention-based recurrent layer that models the dependencies among different levels of the hierarchy in a top-down fashion. Within this layer, a hierarchical attention strategy captures the associations between texts and the hierarchical structure. Finally, we design a hybrid method that predicts the categories at each level while also precisely classifying all categories across the entire hierarchy. Extensive experiments on two real-world datasets demonstrate the effectiveness and explanatory power of HARNN.
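The abstract describes three stages: representations, top-down attention-based recurrence over levels, and a hybrid of per-level and whole-hierarchy predictions. To make the data flow concrete, here is a toy sketch in plain Python. It is not the authors' HARNN implementation, and every design detail is an assumption:

* The helper names (`attend`, `harnn_sketch`) and the mixing weight `alpha` are hypothetical.
* Text-category attention is assumed to be dot-product scoring of word vectors against each level's category embeddings.
* A single tanh update stands in for whatever recurrent cell the paper uses.
* The global branch scores every category against the mean-pooled text.
* The hybrid output is assumed to be a weighted average of the global scores and the concatenated per-level (local) scores.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(xs: Vector) -> Vector:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def attend(words: List[Vector], categories: List[Vector]) -> Vector:
    """Pool word vectors into one text vector for a single level (assumed scheme).

    Each category embedding scores every word by dot product; the per-category
    softmax weights over words are averaged across the level's categories.
    """
    weights = [0.0] * len(words)
    for c in categories:
        for i, a in enumerate(softmax([dot(w, c) for w in words])):
            weights[i] += a / len(categories)
    dim = len(words[0])
    return [sum(weights[i] * words[i][d] for i in range(len(words))) for d in range(dim)]


def harnn_sketch(words: List[Vector], levels: List[List[Vector]], alpha: float = 0.5) -> Vector:
    """Toy top-down pass; `levels` holds category embeddings per level, root level first."""
    dim = len(words[0])
    state = [0.0] * dim
    local_scores: Vector = []
    for categories in levels:
        context = attend(words, categories)
        # Hypothetical tanh recurrence carrying information from the level above (not an LSTM).
        state = [math.tanh(s + x) for s, x in zip(state, context)]
        # Local prediction: one sigmoid score per category at this level.
        local_scores.extend(sigmoid(dot(state, c)) for c in categories)
    # Global prediction: score every category against the mean-pooled text (assumption).
    mean_text = [sum(w[d] for w in words) / len(words) for d in range(dim)]
    all_categories = [c for level in levels for c in level]
    global_scores = [sigmoid(dot(mean_text, c)) for c in all_categories]
    # Hybrid prediction: weighted average of global and concatenated local scores.
    return [(1 - alpha) * g + alpha * l for g, l in zip(global_scores, local_scores)]


# Hypothetical 2-d embeddings: 3 words, 2 categories at level 1, 3 at level 2.
words = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
levels = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]]
print(len(harnn_sketch(words, levels)))  # 5
```

The function returns one score per category, ordered top-down, and thresholding those scores would yield a predicted label set. Setting `alpha=0.0` reduces the output to the global branch alone. Setting `alpha=1.0` keeps only the level-by-level branch.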

        Published in

        ACM Conferences
        CIKM '19: Proceedings of the 28th ACM International Conference on Information and Knowledge Management
        November 2019
        3373 pages
        ISBN:9781450369763
        DOI:10.1145/3357384

        Copyright © 2019 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Acceptance Rates

        CIKM '19 paper acceptance rate: 202 of 1,031 submissions (20%). Overall acceptance rate: 1,861 of 8,427 submissions (22%).