2018 | Original Paper | Book Chapter

Evaluation and Analysis of Word Embedding Vectors of English Text Using Deep Learning Technique

Authors: Jaspreet Singh, Gurvinder Singh, Rajinder Singh, Prithvipal Singh

Published in: Smart and Innovative Trends in Next Generation Computing Technologies

Publisher: Springer Singapore


Abstract

Word embedding is the process of mapping words to real-valued vectors. This representation assigns each word a unique vector in the vector space of the word corpus. Word embedding has gained popularity in natural language processing because it supports real-world tasks involving the syntactic and semantic entailment of text. Syntactic text entailment comprises tasks such as Parts of Speech (POS) tagging, chunking, and tokenization, whereas semantic text entailment covers tasks such as Named Entity Recognition (NER), Complex Word Identification (CWI), sentiment classification, community question answering, word analogies, and Natural Language Inference (NLI). This study explores eight word embedding models used for the aforementioned real-world tasks and proposes a novel word embedding based on deep learning neural networks. The experiments were performed on two freely available datasets: the English Wikipedia dump of April 2017 and the pre-processed Wikipedia text8 corpus. The performance of the proposed word embedding is validated against a baseline of four traditional word embedding techniques evaluated on the same corpora. Averaged over 10 epochs, the proposed technique outperforms the other word embedding techniques.
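
The chapter's training code is not reproduced on this page. As a point of reference only, the following is a minimal sketch of how one of the traditional baselines (a word2vec skip-gram model) could be trained on the text8 corpus with the gensim library. The hyperparameters are illustrative assumptions, not the authors' settings, and the proposed deep-learning embedding itself is not shown here.

    # Minimal sketch (assumption): a word2vec skip-gram baseline on text8 using
    # gensim. This is not the authors' proposed embedding; it only illustrates
    # the kind of traditional baseline the study compares against.
    from gensim.models import Word2Vec
    from gensim.models.word2vec import Text8Corpus

    sentences = Text8Corpus("text8")   # the pre-processed Wikipedia text8 corpus
    model = Word2Vec(
        sentences,
        vector_size=100,   # dimensionality of the word vectors (assumed)
        window=5,          # context window size (assumed)
        min_count=5,       # discard words seen fewer than 5 times (assumed)
        sg=1,              # 1 = skip-gram, 0 = CBOW
        epochs=10,         # the study averages results over 10 epochs
    )

    # Each vocabulary word now maps to a unique real-valued vector:
    vec = model.wv["king"]                       # shape: (100,)
    print(model.wv.most_similar("king", topn=5))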


Footnotes
2
The cleaned, truncated, and compressed version of the fil9 Wikipedia text file is available at http://mattmahoney.net/dc/text8.zip.
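
For reference, the corpus in this footnote can be fetched with the Python standard library alone; a hedged sketch (file names as given in the footnote):

    # Sketch: download and read the text8 corpus referenced in footnote 2.
    import urllib.request
    import zipfile

    urllib.request.urlretrieve("http://mattmahoney.net/dc/text8.zip", "text8.zip")
    with zipfile.ZipFile("text8.zip") as zf:
        text = zf.read("text8").decode("utf-8")  # one long line of lowercase words

    tokens = text.split()
    print(len(tokens))  # roughly 17 million tokens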
 
Metadata
Title
Evaluation and Analysis of Word Embedding Vectors of English Text Using Deep Learning Technique
Authors
Jaspreet Singh
Gurvinder Singh
Rajinder Singh
Prithvipal Singh
Copyright Year
2018
Publisher
Springer Singapore
DOI
https://doi.org/10.1007/978-981-10-8657-1_55