
2018 | OriginalPaper | Chapter

Evaluation and Analysis of Word Embedding Vectors of English Text Using Deep Learning Technique

Authors: Jaspreet Singh, Gurvinder Singh, Rajinder Singh, Prithvipal Singh

Published in: Smart and Innovative Trends in Next Generation Computing Technologies

Publisher: Springer Singapore


Abstract

Word embedding is the process of mapping words to real-valued vectors. The representation maps each word uniquely to an exclusive vector in the vector space of the word corpus. Word embedding in natural language processing is gaining popularity due to its capability to support real-world tasks involving the syntactic and semantic entailment of text. Syntactic text entailment comprises tasks such as part-of-speech (POS) tagging, chunking and tokenization, whereas semantic text entailment covers tasks such as Named Entity Recognition (NER), Complex Word Identification (CWI), sentiment classification, community question answering, word analogies and Natural Language Inference (NLI). This study explores eight word embedding models used for the aforementioned real-world tasks and proposes a novel word embedding using deep learning neural networks. The experiments were performed on two freely available datasets: the English Wikipedia dump of April 2017 and the pre-processed Wikipedia text8 corpus. The performance of the proposed word embedding is validated against a baseline of four traditional word embedding techniques evaluated on the same corpora. The average result over 10 epochs shows that the proposed technique outperforms the other word embedding techniques.
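The one-to-one word-to-vector mapping described above can be sketched in a few lines of NumPy. This is a toy illustration only, not the embedding technique the chapter proposes: the corpus, the 8-dimensional size and the random initialization are all illustrative assumptions (a trained model would learn these vectors from a large corpus such as text8), but the lookup structure and the cosine-similarity comparison are the standard mechanics the abstract refers to.

```python
import numpy as np

# Hypothetical toy corpus; the chapter trains on the far larger
# Wikipedia text8 dump, but the word -> vector mapping works the same way.
corpus = ["the cat sat on the mat", "the dog sat on the log"]
tokens = sorted({w for line in corpus for w in line.split()})

# Each word gets a unique row index into an embedding matrix, so the
# mapping from words to vectors is one-to-one, as the abstract describes.
word_to_idx = {w: i for i, w in enumerate(tokens)}
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(len(tokens), 8))  # 8-dim vectors for illustration

def vector(word):
    """Look up the real-valued vector for a word."""
    return embeddings[word_to_idx[word]]

def cosine(u, v):
    """Cosine similarity, a standard way to compare embedding vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Distinct words occupy distinct rows, hence distinct vectors.
assert word_to_idx["cat"] != word_to_idx["dog"]
print(cosine(vector("cat"), vector("dog")))
```

In a trained embedding the vectors are optimized so that cosine similarity reflects semantic relatedness, which is what makes tasks such as word analogies and NER feasible.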


Footnotes
2
The cleaned, truncated and compressed version of the fil9 Wikipedia text is available at http://mattmahoney.net/dc/text8.zip.
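For readers who want to work with the corpus from the footnote: the text8 archive contains a single member named `text8`, one long line of lowercase, space-separated words. A minimal loading sketch follows; the tiny in-memory zip is a stand-in assumption so the example runs without fetching the real download from mattmahoney.net.

```python
import io
import zipfile

def load_text8(source):
    """Read the single 'text8' member of the archive and split on whitespace."""
    with zipfile.ZipFile(source) as zf:
        return zf.read("text8").decode("ascii").split()

# Tiny in-memory stand-in for text8.zip, so the sketch is self-contained.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("text8", "anarchism originated as a term of abuse")

tokens = load_text8(buf)
print(tokens[:3])
```

With the real file, `load_text8("text8.zip")` yields the full token stream used to train and evaluate embeddings.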
 
Metadata
DOI
https://doi.org/10.1007/978-981-10-8657-1_55
