
Deep Learning-based Text Classification: A Comprehensive Review

Published: 17 April 2021

Abstract

Deep learning-based models have surpassed classical machine learning-based approaches in various text classification tasks, including sentiment analysis, news categorization, question answering, and natural language inference. In this article, we provide a comprehensive review of more than 150 deep learning-based models for text classification developed in recent years, and we discuss their technical contributions, similarities, and strengths. We also provide a summary of more than 40 popular datasets widely used for text classification. Finally, we provide a quantitative analysis of the performance of different deep learning models on popular benchmarks, and we discuss future research directions.
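To make the subject concrete, the sketch below shows a minimal word-level convolutional text classifier, one of the deep learning model families this review surveys. It is an illustrative example only, assuming PyTorch is available; the vocabulary size, filter widths, and two-class output are arbitrary placeholder choices, not a model evaluated in the review.

```python
# Illustrative sketch only: a minimal word-level CNN text classifier,
# in the spirit of convolutional sentence-classification models.
# Assumes PyTorch; all sizes and labels below are arbitrary.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextCNN(nn.Module):
    def __init__(self, vocab_size=10000, embed_dim=128,
                 num_filters=100, kernel_sizes=(3, 4, 5), num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        # One 1-D convolution per n-gram width.
        self.convs = nn.ModuleList(
            nn.Conv1d(embed_dim, num_filters, k) for k in kernel_sizes
        )
        self.fc = nn.Linear(num_filters * len(kernel_sizes), num_classes)

    def forward(self, token_ids):                      # (batch, seq_len)
        x = self.embedding(token_ids).transpose(1, 2)  # (batch, embed_dim, seq_len)
        # Max-pool each feature map over time, then concatenate.
        pooled = [F.relu(conv(x)).max(dim=2).values for conv in self.convs]
        return self.fc(torch.cat(pooled, dim=1))       # (batch, num_classes) logits

# Usage: classify a batch of two padded token-id sequences.
model = TextCNN()
logits = model(torch.randint(1, 10000, (2, 50)))
probs = logits.softmax(dim=-1)
```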



Published in

ACM Computing Surveys, Volume 54, Issue 3 (April 2022)
ISSN: 0360-0300
EISSN: 1557-7341
DOI: 10.1145/3461619

          Copyright © 2021 ACM


          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 17 April 2021
          • Accepted: 1 November 2020
          • Revised: 1 October 2020
          • Received: 1 April 2020
