ABSTRACT
Word embeddings are a key component of deep models, providing input features for downstream language tasks such as sequence labelling and text classification. Over the last decade, a substantial number of word embedding methods have been proposed, falling mainly into two categories: classic and context-based word embeddings. In this paper, we conduct controlled experiments to systematically examine both classic and contextualised word embeddings for text classification. To encode a sequence from word representations, we apply two encoders, namely CNN and BiLSTM, in the downstream network architecture. To study the impact of word embeddings across datasets, we select four benchmark classification datasets with varying average sample lengths, comprising both single-label and multi-label tasks. The evaluation results, reported with confidence intervals, indicate that CNN as the downstream encoder outperforms BiLSTM in most situations, especially on datasets where document-level context is less informative. We therefore recommend choosing CNN over BiLSTM for document classification datasets in which sequential context is less indicative of class membership than it is in sentence-level datasets. For word embeddings, concatenating multiple classic embeddings or increasing their dimensionality does not yield a statistically significant improvement, despite slight gains in some cases. For context-based embeddings, we study both ELMo and BERT. The results show that BERT overall outperforms ELMo, especially on long-document datasets. Compared with classic embeddings, both improve performance on short-text datasets, while no such improvement is observed on longer documents.
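To make the encoding step concrete, the following is a minimal numpy sketch of a max-over-time CNN encoder of the kind the abstract refers to (in the style of Kim, 2014) — not the authors' implementation, and with toy dimensions chosen for illustration. It maps a variable-length sequence of word vectors to a fixed-size feature vector suitable for a classification layer:

```python
import numpy as np

rng = np.random.default_rng(0)

def cnn_encode(embeddings, filters, window=3):
    """Max-over-time CNN encoder over a sequence of word embeddings.

    embeddings: (seq_len, emb_dim) word vectors for one sequence
    filters:    (n_filters, window * emb_dim) convolution filters
    Returns a (n_filters,) fixed-size sequence representation.
    """
    seq_len, emb_dim = embeddings.shape
    feats = []
    for start in range(seq_len - window + 1):
        # flatten one window of word vectors and convolve with every filter
        patch = embeddings[start:start + window].reshape(-1)
        feats.append(np.maximum(filters @ patch, 0.0))  # conv + ReLU
    # max-over-time pooling: one scalar per filter, regardless of seq_len
    return np.max(np.stack(feats), axis=0)

# toy example: a 10-token sentence with 50-d embeddings and 8 filters
emb = rng.normal(size=(10, 50))
w = rng.normal(size=(8, 3 * 50))
vec = cnn_encode(emb, w)
print(vec.shape)  # (8,)
```

The pooled vector would then feed a softmax (single-label) or sigmoid (multi-label) output layer; a BiLSTM encoder would instead read the same embedding sequence in both directions and use its final or pooled hidden states.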