2020 | OriginalPaper | Chapter

High Order N-gram Model Construction and Application Based on Natural Annotation

Authors: Qibo Wang, Gaoqi Rao, Endong Xun

Published in: Chinese Lexical Semantics

Publisher: Springer International Publishing

Abstract

Language models based on n-grams play an important role in NLP tasks. To meet the challenge posed by very large language data, this paper proposes two language models based on language boundaries: an intra-sentence boundary model and an inter-sentence boundary model. We developed a training tool on the Hadoop platform using MapReduce programming, and used a prefix tree to compress and store the model. We applied the models to boundary identification in syntactic parsing and obtained good results.
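To illustrate how n-gram counting fits the MapReduce model mentioned above, here is a minimal single-process sketch. It assumes the corpus is already word-segmented and space-separated, and the function names (`map_ngrams`, `reduce_counts`, `count_ngrams`) are hypothetical; it is not the authors' training tool, only a simulation of the map, shuffle/sort and reduce phases.

```python
from itertools import groupby
from operator import itemgetter
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical sketch: simulates Hadoop's map -> shuffle/sort -> reduce
# phases in one process. Not the training tool described in the paper.

def map_ngrams(line: str, max_order: int) -> Iterator[Tuple[str, int]]:
    """Map phase: emit (n-gram, 1) for every n-gram of order 1..max_order."""
    tokens = line.split()  # assumption: input is already word-segmented
    for n in range(1, max_order + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n]), 1

def reduce_counts(pairs: Iterable[Tuple[str, int]]) -> Iterator[Tuple[str, int]]:
    """Reduce phase: sum the values of each key; input must be sorted by key."""
    for ngram, group in groupby(pairs, key=itemgetter(0)):
        yield ngram, sum(count for _, count in group)

def count_ngrams(corpus: Iterable[str], max_order: int) -> Dict[str, int]:
    """Run map, an in-memory stand-in for the shuffle/sort phase, then reduce."""
    mapped: List[Tuple[str, int]] = [
        pair for line in corpus for pair in map_ngrams(line, max_order)
    ]
    mapped.sort(key=itemgetter(0))  # Hadoop sorts and groups keys between phases
    return dict(reduce_counts(mapped))

if __name__ == "__main__":
    counts = count_ngrams(["the cat sat", "the cat ran"], max_order=3)
    for ngram, count in sorted(counts.items()):
        print(f"{ngram}\t{count}")
```

On a real cluster the map and reduce functions would run as separate tasks (for example as Hadoop Streaming scripts exchanging tab-separated key/value lines), and the framework, not `list.sort`, would perform the sort between them.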

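The abstract states that the model is compressed and stored in a prefix tree. The abstract does not describe the authors' layout, so the following is only a plain in-memory trie keyed by word tokens, with hypothetical names (`TrieNode`, `NgramTrie`, `build_trie`) and toy counts. It shows the basic idea: n-grams that share a prefix share the nodes for that prefix. Serializing the tree compactly to disk is omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple

# Hypothetical sketch of prefix-tree storage for n-gram counts.

@dataclass
class TrieNode:
    count: int = 0  # frequency of the n-gram ending here (0 = not stored)
    children: Dict[str, "TrieNode"] = field(default_factory=dict)

class NgramTrie:
    """Stores n-grams as token paths, so a shared prefix is stored only once."""

    def __init__(self) -> None:
        self.root = TrieNode()
        self.node_count = 1  # includes the root

    def insert(self, tokens: List[str], count: int) -> None:
        node = self.root
        for tok in tokens:
            child = node.children.get(tok)
            if child is None:  # create only the part of the path that is new
                child = TrieNode()
                node.children[tok] = child
                self.node_count += 1
            node = child
        node.count += count

    def lookup(self, tokens: List[str]) -> int:
        """Return the stored count, or 0 if the n-gram is absent."""
        node = self.root
        for tok in tokens:
            child = node.children.get(tok)
            if child is None:
                return 0
            node = child
        return node.count

def build_trie(counts: Iterable[Tuple[str, int]]) -> NgramTrie:
    """Load (space-separated n-gram, count) pairs into a trie."""
    trie = NgramTrie()
    for ngram, count in counts:
        trie.insert(ngram.split(), count)
    return trie

if __name__ == "__main__":
    toy_counts = {"the": 2, "cat": 2, "sat": 1, "ran": 1,
                  "the cat": 2, "cat sat": 1, "cat ran": 1,
                  "the cat sat": 1, "the cat ran": 1}
    trie = build_trie(toy_counts.items())
    print(trie.lookup(["the", "cat"]), trie.node_count)  # prints: 2 10
```

With these toy counts, the nine n-grams, which take 16 tokens when stored as flat strings, occupy 9 non-root nodes, because paths such as "the cat" are shared by every longer n-gram that extends them.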
Print ISBN: 978-3-030-38188-2
Electronic ISBN: 978-3-030-38189-9
Copyright Year: 2020
DOI: https://doi.org/10.1007/978-3-030-38189-9_34
