
2015 | OriginalPaper | Book chapter

Incorporating Word Clustering into Complex Noun Phrase Identification

Abstract

Because professional technical literature contains a large number of complex noun phrases, identifying these phrases is of considerable practical value for tasks such as machine translation. Based on an analysis of such phrases in Chinese-English bilingual sentence pairs from aircraft technical publications, we present an annotation specification for labeling them, derived from an existing specification, together with a method for identifying complex noun phrases. In addition to the basic features, namely the word and its part of speech, we incorporate into the machine learning model word clustering features trained on a large unlabeled corpus with the Brown clustering model and the Word Vector Class (WVC) model. Experimental results indicate that combining different word clustering features with the basic features improves system performance, raising the F-score by 1.83% compared with a method that uses only the basic features.
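The feature design described in the abstract (basic word and part-of-speech features plus Brown and WVC cluster features) can be illustrated with a minimal sketch. It assumes a CRF-style sequence labeler over segmented, POS-tagged Chinese tokens; the abstract itself only says "machine learning model". The cluster tables `brown` and `wvc`, the prefix lengths 4 and 6, the ±1 context window, and all function names are hypothetical choices for illustration, not the authors' configuration. Brown clusters are represented as bit strings from the merge tree, so shorter prefixes give coarser clusters; WVC classes are represented as integer IDs, assuming they come from clustering word vectors.

```python
from typing import Dict, List, Optional, Sequence

PAD = "<PAD>"  # position outside the sentence
UNK = "<UNK>"  # word missing from a cluster table


def cluster_features(
    word: str,
    brown: Dict[str, str],
    wvc: Dict[str, int],
    prefix_lengths: Sequence[int],
) -> Dict[str, str]:
    """Brown bit-string prefixes plus the WVC class id for one word."""
    feats: Dict[str, str] = {}
    bits: Optional[str] = brown.get(word)
    for n in prefix_lengths:
        # Shorter prefixes correspond to coarser clusters in the Brown tree.
        feats[f"brown{n}"] = bits[:n] if bits is not None else UNK
    cls: Optional[int] = wvc.get(word)
    feats["wvc"] = str(cls) if cls is not None else UNK
    return feats


def token_features(
    words: Sequence[str],
    tags: Sequence[str],
    i: int,
    brown: Dict[str, str],
    wvc: Dict[str, int],
    prefix_lengths: Sequence[int] = (4, 6),
    window: int = 1,
) -> Dict[str, str]:
    """Basic (word, POS) and cluster features for token i in a +/-window context."""
    feats: Dict[str, str] = {}
    for off in range(-window, window + 1):
        j = i + off
        if 0 <= j < len(words):
            local = {"w": words[j], "pos": tags[j]}
            local.update(cluster_features(words[j], brown, wvc, prefix_lengths))
        else:
            # Pad every feature type so each token has the same feature keys.
            local = {"w": PAD, "pos": PAD, "wvc": PAD}
            local.update({f"brown{n}": PAD for n in prefix_lengths})
        for name, value in local.items():
            feats[f"{name}[{off}]"] = value
    return feats


def sentence_features(
    words: Sequence[str],
    tags: Sequence[str],
    brown: Dict[str, str],
    wvc: Dict[str, int],
) -> List[Dict[str, str]]:
    """One feature dict per token, in sentence order."""
    return [token_features(words, tags, i, brown, wvc) for i in range(len(words))]


# Toy example with made-up cluster tables (not learned from real data).
words = ["检查", "液压", "泵"]
tags = ["v", "n", "n"]
brown = {"液压": "011010", "泵": "011011"}
wvc = {"检查": 3, "液压": 17, "泵": 17}
features = sentence_features(words, tags, brown, wvc)
print(features[1]["brown4[0]"], features[1]["brown4[1]"])  # 0110 0110
```

Each element of the returned list is a string-valued feature dictionary for one token, a shape that CRF toolkits such as sklearn-crfsuite accept as training input. Because 液压 and 泵 share the Brown prefix `0110` in this toy table, evidence learned for one word can carry over to the other through the shared cluster feature, which is the usual motivation for adding cluster features to sparse word features.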

Metadata
Title
Incorporating Word Clustering into Complex Noun Phrase Identification
Authors
Lihua Xue
Guiping Zhang
Qiaoli Zhou
Na Ye
Copyright year
2015
DOI
https://doi.org/10.1007/978-3-319-25816-4_3