2014 | OriginalPaper | Chapter

8. Part-of-Speech Tagging Using Statistical Techniques

Author: Pierre M. Nugues

Published in: Language Processing with Perl and Prolog

Publisher: Springer Berlin Heidelberg

Abstract

Like transformation-based tagging, statistical part-of-speech (POS) tagging assumes that each word is known and has a finite set of possible tags. These tags can be drawn from a dictionary or a morphological analysis. Statistical methods enable us to determine a sequence of part-of-speech tags \(T = t_{1},t_{2},t_{3},\ldots,t_{n}\), given a sequence of words \(W = w_{1},w_{2},w_{3},\ldots,w_{n}\).
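
The abstract does not say which statistical model is used. One common formulation, assumed here for illustration only, is a bigram hidden Markov model: choose \(\hat{T} = \arg\max_{T} P(T)P(W|T)\), approximated as \(\prod_{i=1}^{n} P(t_{i}|t_{i-1})P(w_{i}|t_{i})\), and search for the best sequence with the Viterbi algorithm. The sketch below follows that assumption. The function name `viterbi`, the Penn Treebank-style tag names, and all probability values are hypothetical and hand-picked; they are not taken from the chapter or estimated from a corpus.

```python
import math


def viterbi(words, lexicon, trans, emit, start="<s>"):
    """Return the most probable tag sequence for `words` under a bigram HMM.

    lexicon: word -> list of possible tags (each word is known, finite tag set)
    trans:   (previous_tag, tag) -> P(tag | previous_tag)
    emit:    (word, tag) -> P(word | tag)
    """
    # best[tag] = (log-probability of the best path ending in tag, that path)
    best = {start: (0.0, [])}
    for word in words:
        new_best = {}
        for tag in lexicon[word]:
            candidates = []
            for prev, (score, path) in best.items():
                p = trans.get((prev, tag), 0.0) * emit.get((word, tag), 0.0)
                if p > 0.0:
                    candidates.append((score + math.log(p), path + [tag]))
            if candidates:
                # Keep only the best path that ends in this tag.
                new_best[tag] = max(candidates, key=lambda c: c[0])
        if not new_best:
            raise ValueError(f"no tag sequence with nonzero probability at {word!r}")
        best = new_best
    return max(best.values(), key=lambda c: c[0])[1]


# Toy model: every number below is invented for illustration.
TOY_LEXICON = {"the": ["DT"], "can": ["MD", "NN"], "rusted": ["VBD"]}
TOY_TRANS = {
    ("<s>", "DT"): 0.5,
    ("DT", "NN"): 0.6,
    ("DT", "MD"): 0.01,
    ("NN", "VBD"): 0.3,
    ("MD", "VBD"): 0.01,
}
TOY_EMIT = {
    ("the", "DT"): 0.6,
    ("can", "NN"): 0.001,
    ("can", "MD"): 0.3,
    ("rusted", "VBD"): 0.001,
}

tags = viterbi(["the", "can", "rusted"], TOY_LEXICON, TOY_TRANS, TOY_EMIT)
print(tags)
```

In this toy model the word "can" is more probable as a modal (MD) than as a noun (NN) when only the first two words are considered, but the full-sequence score favors the noun reading because a modal is rarely followed by a past-tense verb. That is the point of scoring whole sequences rather than tagging each word in isolation.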

Metadata

Title: Part-of-Speech Tagging Using Statistical Techniques
Author: Pierre M. Nugues
Copyright Year: 2014
Publisher: Springer Berlin Heidelberg
DOI: https://doi.org/10.1007/978-3-642-41464-0_8