ABSTRACT
Hierarchical multi-label text classification (HMTC) is a fundamental but challenging task in numerous applications (e.g., patent annotation), where documents are assigned to multiple categories organized in a hierarchical structure. The categories assigned to a document at different levels tend to depend on one another. However, the majority of prior studies on the HMTC task employ classifiers that either handle all categories simultaneously or decompose the original problem into a set of flat multi-label classification subproblems, ignoring both the associations between texts and the hierarchical structure and the dependencies among different levels of that structure. To that end, in this paper, we propose a novel framework called Hierarchical Attention-based Recurrent Neural Network (HARNN) for classifying documents into the most relevant categories level by level by integrating texts and the hierarchical category structure. Specifically, we first apply a documentation representing layer to obtain the representation of texts and the hierarchical structure. Then, we develop a hierarchical attention-based recurrent layer to model the dependencies among different levels of the hierarchical structure in a top-down fashion. Here, a hierarchical attention strategy is proposed to capture the associations between texts and the hierarchical structure. Finally, we design a hybrid method that predicts the categories of each level while also classifying all categories in the entire hierarchical structure precisely. Extensive experimental results on two real-world datasets demonstrate the effectiveness and explanatory power of HARNN.
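To make the hybrid prediction step more concrete, the following is a minimal sketch under stated assumptions, not the HARNN implementation. It assumes the per-level (local) scores and the whole-hierarchy (global) scores are already available as probabilities, and that the two are blended by a convex combination with a hypothetical weight `alpha`; the abstract does not specify the combination rule, and the names `hybrid_scores` and `predict` are illustrative only.

```python
from typing import List


def hybrid_scores(
    local_scores: List[List[float]],
    global_scores: List[float],
    alpha: float = 0.5,
) -> List[float]:
    """Blend per-level (local) scores with whole-hierarchy (global) scores.

    Assumption: a simple convex combination weighted by `alpha`.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    # Flatten the per-level scores top-down so they align with the global vector.
    flat_local = [score for level in local_scores for score in level]
    if len(flat_local) != len(global_scores):
        raise ValueError("local and global scores must cover the same categories")
    return [
        (1.0 - alpha) * local + alpha * glob
        for local, glob in zip(flat_local, global_scores)
    ]


def predict(scores: List[float], threshold: float = 0.5) -> List[int]:
    """Return the indices of categories whose blended score reaches the threshold."""
    return [index for index, score in enumerate(scores) if score >= threshold]


# Toy hierarchy: 2 categories at level 1 and 3 categories at level 2.
local = [[0.9, 0.1], [0.8, 0.2, 0.4]]
global_ = [0.7, 0.3, 0.6, 0.2, 0.8]
blended = hybrid_scores(local, global_, alpha=0.5)
selected = predict(blended, threshold=0.5)
```

In this toy setup the last category is weak locally but strong globally, so the blend lets the global view influence the final decision; setting `alpha` to 0 or 1 would fall back to purely local or purely global prediction.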
Index Terms
- Hierarchical Multi-label Text Classification: An Attention-based Recurrent Network Approach