2017 | OriginalPaper | Chapter

Combining Large-Scale Unlabeled Corpus and Lexicon for Chinese Polysemous Word Similarity Computation

Authors: Huiwei Zhou, Chen Jia, Yunlong Yang, Shixian Ning, Yingyu Lin, Degen Huang

Published in: Information Retrieval

Publisher: Springer International Publishing

Abstract

Word embeddings have achieved outstanding performance in word similarity measurement. However, most prior work builds models with one embedding per word, neglecting the fact that a word can have multiple senses. This paper proposes two sense embedding learning methods for Chinese polysemous words, one based on a large-scale unlabeled corpus and the other on a lexicon. The corpus-based method labels the senses of polysemous words by clustering their contexts with tf-idf weighting, and uses HowNet to initialize the number of senses instead of simply assuming a fixed number for every polysemous word. The lexicon-based method extends AutoExtend to Tongyici Cilin, adding related lexicon constraints for sense embedding learning. The two methods are then combined for Chinese polysemous word similarity computation. Experiments on the Chinese Polysemous Word Similarity Dataset show that the two sense embedding learning methods are effective and complementary. The combined approach achieves a Spearman rank correlation coefficient of 0.582, which outperforms the state-of-the-art performance on the evaluation dataset.
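
To make the corpus-based idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes each context of a polysemous word is represented as a tf-idf weighted average of word embeddings, and that contexts are grouped with plain k-means whose cluster count comes from a lexicon sense count. The toy embeddings, the smoothed idf formula, the farthest-point initialization, and the names `tfidf_context_vectors`, `kmeans`, and `sense_count` are all illustrative assumptions; the abstract does not specify these details.

```python
import math
from collections import Counter


def tfidf_context_vectors(contexts, embeddings):
    """Represent each context as a tf-idf weighted average of word embeddings."""
    n = len(contexts)
    # Document frequency: in how many contexts each word appears.
    df = Counter(w for ctx in contexts for w in set(ctx))
    dim = len(next(iter(embeddings.values())))
    vectors = []
    for ctx in contexts:
        tf = Counter(ctx)
        vec, total = [0.0] * dim, 0.0
        for word, count in tf.items():
            if word not in embeddings:
                continue  # skip out-of-vocabulary words
            # Smoothed idf (an assumption; other tf-idf variants would also work).
            weight = (count / len(ctx)) * (math.log((1 + n) / (1 + df[word])) + 1)
            vec = [v + weight * e for v, e in zip(vec, embeddings[word])]
            total += weight
        vectors.append([v / total for v in vec] if total else vec)
    return vectors


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20):
    """Plain k-means with deterministic farthest-point initialization."""
    centers = [list(vectors[0])]
    while len(centers) < k:
        farthest = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centers))
        centers.append(list(farthest))
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assign each context vector to its nearest center.
        labels = [min(range(k), key=lambda j: sq_dist(v, centers[j])) for v in vectors]
        # Recompute each center as the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy 2-d embeddings (hypothetical values, for illustration only).
toy_embeddings = {
    "money": [1.0, 0.1], "loan": [0.9, 0.0], "deposit": [1.0, 0.2],
    "river": [0.1, 1.0], "water": [0.0, 0.9], "shore": [0.2, 1.0],
}
# Contexts of one ambiguous target word (here English "bank", as a stand-in).
contexts = [["money", "loan"], ["loan", "deposit"],
            ["river", "water"], ["water", "shore"]]
sense_count = 2  # hypothetical: number of senses a lexicon lists for the word

vectors = tfidf_context_vectors(contexts, toy_embeddings)
labels = kmeans(vectors, sense_count)
print(labels)
```

Each resulting cluster label can then serve as a sense tag for the target word in that context, so that a standard embedding model trained on the relabeled corpus yields one vector per sense. Taking the cluster count from a lexicon rather than fixing it globally is the design point the abstract emphasizes.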

Literature
1. Turian, J., Ratinov, L., Bengio, Y.: Word representations: a simple and general method for semi-supervised learning. In: Proceedings of ACL, pp. 384–394 (2010)
2. Li, J., Jurafsky, D.: Do multi-sense embeddings improve natural language understanding? In: Proceedings of EMNLP, pp. 1722–1732 (2015)
3. Reisinger, J., Mooney, R.J.: Multi-prototype vector-space models of word meaning. In: Proceedings of NAACL-HLT, pp. 109–117 (2010)
4. Huang, E.H., Socher, R., Manning, C.D., Ng, A.Y.: Improving word representations via global context and multiple word prototypes. In: Proceedings of ACL, pp. 873–882 (2012)
5. Salton, G., Buckley, C.: Term-weighting approaches in automatic text retrieval. Inf. Process. Manage. 24(5), 513–523 (1988)
6. Dong, Z.D., Dong, Q.: HowNet and the computation of meaning. In: World Scientific, pp. 85–95 (2006)
7. Rothe, S., Schütze, H.: AutoExtend: extending word embeddings to embeddings for synsets and lexemes. In: Proceedings of ACL, pp. 1793–1803 (2015)
8. Che, W.X., Li, Z.H., Liu, T.: LTP: a Chinese language technology platform. In: Proceedings of COLING, pp. 13–16 (2010)
9. Guo, J., Che, W.X., Wang, H.F., Liu, T.: Learning sense-specific word embeddings by exploiting bilingual resources. In: Proceedings of COLING, pp. 497–507 (2014)
10. Bengio, Y., Ducharme, R., Vincent, P., Jauvin, C.: A neural probabilistic language model. J. Mach. Learn. Res. 3, 1137–1155 (2003)
11. Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. In: Workshop at ICLR (2013)
12. Mikolov, T., Sutskever, I., Chen, K., Corrado, G., Dean, J.: Distributed representations of words and phrases and their compositionality. In: NIPS, pp. 3111–3119 (2013)
13. Pennington, J., Socher, R., Manning, C.D.: GloVe: global vectors for word representation. In: Proceedings of EMNLP, pp. 1532–1543 (2014)
14. Neelakantan, A., Shankar, J., Passos, A., McCallum, A.: Efficient non-parametric estimation of multiple embeddings per word in vector space. In: Proceedings of EMNLP, pp. 1059–1069 (2014)
15. Zheng, X.Q., Feng, J.T., Chen, Y., Peng, H.Y., Zhang, W.Q.: Learning context-specific word/character embeddings. In: Proceedings of the AAAI 2017, pp. 3393–3399 (2017)
16. Chen, T., Xu, R.F., He, Y.L., Wang, X.: Improving distributed representation of word sense via WordNet gloss composition and context clustering. In: Proceedings of ACL, pp. 15–20 (2015)
17. Pei, J.H., Zhang, C., Huang, D.G., Ma, J.J.: Combining word embedding and semantic lexicon for Chinese word similarity computation. In: Proceedings of NLPCC, pp. 766–777 (2016)
Metadata
Title
Combining Large-Scale Unlabeled Corpus and Lexicon for Chinese Polysemous Word Similarity Computation
Authors
Huiwei Zhou
Chen Jia
Yunlong Yang
Shixian Ning
Yingyu Lin
Degen Huang
Copyright Year
2017
DOI
https://doi.org/10.1007/978-3-319-68699-8_16