
Deep Learning-based Text Classification: A Comprehensive Review

Published: 17 April 2021

Abstract

Deep learning-based models have surpassed classical machine learning-based approaches in various text classification tasks, including sentiment analysis, news categorization, question answering, and natural language inference. In this article, we provide a comprehensive review of more than 150 deep learning-based models for text classification developed in recent years, and we discuss their technical contributions, similarities, and strengths. We also provide a summary of more than 40 popular datasets widely used for text classification. Finally, we provide a quantitative analysis of the performance of different deep learning models on popular benchmarks, and we discuss future research directions.
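To make the subject concrete, the sketch below shows a minimal word-level convolutional text classifier, one of the deep learning model families this review surveys. It is an illustrative example only, assuming PyTorch is available; the vocabulary size, filter widths, and two-class output are arbitrary placeholder choices, not a model evaluated in the review.

```python
# Illustrative sketch only: a minimal word-level CNN text classifier,
# in the spirit of convolutional sentence-classification models.
# Assumes PyTorch; all sizes and labels below are arbitrary.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextCNN(nn.Module):
    def __init__(self, vocab_size=10000, embed_dim=128,
                 num_filters=100, kernel_sizes=(3, 4, 5), num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        # One 1-D convolution per n-gram width.
        self.convs = nn.ModuleList(
            nn.Conv1d(embed_dim, num_filters, k) for k in kernel_sizes
        )
        self.fc = nn.Linear(num_filters * len(kernel_sizes), num_classes)

    def forward(self, token_ids):                      # (batch, seq_len)
        x = self.embedding(token_ids).transpose(1, 2)  # (batch, embed_dim, seq_len)
        # Max-pool each feature map over time, then concatenate.
        pooled = [F.relu(conv(x)).max(dim=2).values for conv in self.convs]
        return self.fc(torch.cat(pooled, dim=1))       # (batch, num_classes) logits

# Usage: classify a batch of two padded token-id sequences.
model = TextCNN()
logits = model(torch.randint(1, 10000, (2, 50)))
probs = logits.softmax(dim=-1)
```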



Published in

ACM Computing Surveys, Volume 54, Issue 3 (April 2022)
ISSN: 0360-0300
EISSN: 1557-7341
DOI: 10.1145/3461619

          Copyright © 2021 ACM


          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 17 April 2021
          • Accepted: 1 November 2020
          • Revised: 1 October 2020
          • Received: 1 April 2020
