
2018 | OriginalPaper | Chapter

Evaluation and Analysis of Word Embedding Vectors of English Text Using Deep Learning Technique

Authors: Jaspreet Singh, Gurvinder Singh, Rajinder Singh, Prithvipal Singh

Published in: Smart and Innovative Trends in Next Generation Computing Technologies

Publisher: Springer Singapore


Abstract

Word embedding is the process of mapping words to real-valued vectors. The representation maps each word uniquely to an exclusive vector in the vector space of the word corpus. Word embedding in natural language processing is gaining popularity due to its capability to support real-world tasks involving the syntactic and semantic entailment of text. Syntactic text entailment comprises tasks such as part-of-speech (POS) tagging, chunking and tokenization, whereas semantic text entailment covers tasks such as Named Entity Recognition (NER), Complex Word Identification (CWI), sentiment classification, community question answering, word analogies and Natural Language Inference (NLI). This study explores eight word embedding models used for the aforementioned real-world tasks and proposes a novel word embedding using deep learning neural networks. The experiments were performed on two freely available datasets: the English Wikipedia dump of April 2017 and the pre-processed Wikipedia text8 corpus. The performance of the proposed word embedding is validated against a baseline of four traditional word embedding techniques evaluated on the same corpora. The average result over 10 epochs shows that the proposed technique outperforms the other word embedding techniques.
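The one-to-one word-to-vector mapping described above can be sketched in a few lines of NumPy. This is a toy illustration only, not the embedding technique the chapter proposes: the corpus, the 8-dimensional size and the random initialization are all illustrative assumptions (a trained model would learn these vectors from a large corpus such as text8), but the lookup structure and the cosine-similarity comparison are the standard mechanics the abstract refers to.

```python
import numpy as np

# Hypothetical toy corpus; the chapter trains on the far larger
# Wikipedia text8 dump, but the word -> vector mapping works the same way.
corpus = ["the cat sat on the mat", "the dog sat on the log"]
tokens = sorted({w for line in corpus for w in line.split()})

# Each word gets a unique row index into an embedding matrix, so the
# mapping from words to vectors is one-to-one, as the abstract describes.
word_to_idx = {w: i for i, w in enumerate(tokens)}
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(len(tokens), 8))  # 8-dim vectors for illustration

def vector(word):
    """Look up the real-valued vector for a word."""
    return embeddings[word_to_idx[word]]

def cosine(u, v):
    """Cosine similarity, a standard way to compare embedding vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Distinct words occupy distinct rows, hence distinct vectors.
assert word_to_idx["cat"] != word_to_idx["dog"]
print(cosine(vector("cat"), vector("dog")))
```

In a trained embedding the vectors are optimized so that cosine similarity reflects semantic relatedness, which is what makes tasks such as word analogies and NER feasible.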


Footnotes
2
The cleaned, truncated and compressed version of the fil9 Wikipedia text is available at http://mattmahoney.net/dc/text8.zip.
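For readers who want to work with the corpus from the footnote: the text8 archive contains a single member named `text8`, one long line of lowercase, space-separated words. A minimal loading sketch follows; the tiny in-memory zip is a stand-in assumption so the example runs without fetching the real download from mattmahoney.net.

```python
import io
import zipfile

def load_text8(source):
    """Read the single 'text8' member of the archive and split on whitespace."""
    with zipfile.ZipFile(source) as zf:
        return zf.read("text8").decode("ascii").split()

# Tiny in-memory stand-in for text8.zip, so the sketch is self-contained.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("text8", "anarchism originated as a term of abuse")

tokens = load_text8(buf)
print(tokens[:3])
```

With the real file, `load_text8("text8.zip")` yields the full token stream used to train and evaluate embeddings.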
 
Metadata
DOI
https://doi.org/10.1007/978-981-10-8657-1_55
