DOI: 10.1145/3443279.3443304
research-article
Open Access

A Comparative Study on Word Embeddings in Deep Learning for Text Classification


ABSTRACT

Word embeddings act as an important component of deep models, providing input features for downstream language tasks such as sequence labelling and text classification. In the last decade, a substantial number of word embedding methods have been proposed for this purpose, falling mainly into the categories of classic and context-based word embeddings. In this paper, we conduct controlled experiments to systematically examine both classic and contextualised word embeddings for text classification. To encode a sequence from word representations, we apply two encoders, namely CNN and BiLSTM, in the downstream network architecture. To study the impact of word embeddings on different datasets, we select four benchmark classification datasets with varying average sample length, comprising both single-label and multi-label classification tasks. The evaluation results, reported with confidence intervals, indicate that CNN as the downstream encoder outperforms BiLSTM in most situations, especially for datasets that are insensitive to document-level context. This study therefore recommends choosing CNN over BiLSTM for document classification datasets where sequence context is not as indicative of class membership as it is in sentence-level datasets. For word embeddings, neither concatenating multiple classic embeddings nor increasing their size leads to a statistically significant difference in performance, despite slight improvements in some cases. For context-based embeddings, we study both ELMo and BERT. The results show that BERT overall outperforms ELMo, especially for long document datasets. Compared with classic embeddings, both achieve improved performance on short datasets, while no such improvement is observed on longer datasets.
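The downstream setup the abstract describes (static word vectors fed into a CNN or BiLSTM encoder, followed by a classifier) can be illustrated with a short sketch. The PyTorch snippet below is not the authors' implementation; the embedding source, filter sizes, dropout rate, and class count are illustrative assumptions. It shows the CNN variant, with max-pooled convolutional features passed to a linear output layer; a BiLSTM encoder would replace the convolution and pooling stages with a bidirectional nn.LSTM whose final hidden states serve as the sequence features.

```python
# Minimal sketch (assumptions, not the authors' code): a CNN encoder over
# pretrained static word embeddings for single-label text classification.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CNNTextClassifier(nn.Module):
    def __init__(self, pretrained_embeddings, num_classes,
                 filter_sizes=(3, 4, 5), num_filters=100, dropout=0.5):
        super().__init__()
        # Rows of `pretrained_embeddings` are classic (static) word vectors,
        # e.g. GloVe or fastText; freeze=False would allow fine-tuning them.
        self.embedding = nn.Embedding.from_pretrained(
            torch.as_tensor(pretrained_embeddings, dtype=torch.float),
            freeze=True)
        emb_dim = self.embedding.embedding_dim
        self.convs = nn.ModuleList(
            [nn.Conv1d(emb_dim, num_filters, k) for k in filter_sizes])
        self.dropout = nn.Dropout(dropout)
        self.fc = nn.Linear(num_filters * len(filter_sizes), num_classes)

    def forward(self, token_ids):                  # (batch, seq_len)
        x = self.embedding(token_ids)              # (batch, seq_len, emb_dim)
        x = x.transpose(1, 2)                      # (batch, emb_dim, seq_len)
        # Convolve with each filter width, then max-pool over time.
        pooled = [F.relu(conv(x)).max(dim=2).values for conv in self.convs]
        features = self.dropout(torch.cat(pooled, dim=1))
        return self.fc(features)                   # logits over classes
```

For the multi-label datasets mentioned in the abstract, the same encoder would typically be trained with a per-class sigmoid loss (e.g. BCEWithLogitsLoss) instead of softmax cross-entropy.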

    • Published in

      NLPIR '20: Proceedings of the 4th International Conference on Natural Language Processing and Information Retrieval
      December 2020
      217 pages
      ISBN:9781450377607
      DOI:10.1145/3443279

      Copyright © 2020 Owner/Author

This work is licensed under a Creative Commons Attribution 4.0 International License.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 1 February 2021

      Qualifiers

      • research-article
      • Research
      • Refereed limited
