2014 | OriginalPaper | Book chapter

8. Part-of-Speech Tagging Using Statistical Techniques

Written by: Pierre M. Nugues

Published in: Language Processing with Perl and Prolog

Publisher: Springer Berlin Heidelberg

Abstract

Like transformation-based tagging, statistical part-of-speech (POS) tagging assumes that each word is known and has a finite set of possible tags. These tags can be drawn from a dictionary or a morphological analysis. Statistical methods enable us to determine a sequence of part-of-speech tags \(T = t_{1},t_{2},t_{3},\ldots,t_{n}\), given a sequence of words \(W = w_{1},w_{2},w_{3},\ldots,w_{n}\).
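
The abstract does not say which statistical model the chapter uses. As a minimal sketch, assume a bigram model that scores a candidate tag sequence as the product of P(t_i | t_{i-1}) and P(w_i | t_i), and pick the best sequence by exhaustive search over the tags each word allows. The lexicon, the tag names, the probability values, and the function names `score` and `best_tags` are all hypothetical and chosen only for illustration.

```python
from itertools import product

# Hypothetical toy lexicon: each known word has a finite set of possible tags.
LEXICON = {
    "the": ["DET"],
    "can": ["NOUN", "VERB"],
    "rusts": ["NOUN", "VERB"],
}

# Hypothetical values, invented for illustration only.
# TRANSITION[(prev, tag)] stands for P(tag | prev); "<s>" marks the sentence start.
TRANSITION = {
    ("<s>", "DET"): 0.6, ("<s>", "NOUN"): 0.3, ("<s>", "VERB"): 0.1,
    ("DET", "NOUN"): 0.9, ("DET", "VERB"): 0.1,
    ("NOUN", "NOUN"): 0.4, ("NOUN", "VERB"): 0.6,
    ("VERB", "NOUN"): 0.7, ("VERB", "VERB"): 0.3,
}

# EMISSION[(word, tag)] stands for P(word | tag).
EMISSION = {
    ("the", "DET"): 0.5,
    ("can", "NOUN"): 0.01, ("can", "VERB"): 0.05,
    ("rusts", "NOUN"): 0.001, ("rusts", "VERB"): 0.002,
}


def score(words, tags):
    """Return the product of P(t_i | t_{i-1}) * P(w_i | t_i) over the sentence."""
    probability = 1.0
    previous = "<s>"
    for word, tag in zip(words, tags):
        probability *= TRANSITION.get((previous, tag), 0.0)
        probability *= EMISSION.get((word, tag), 0.0)
        previous = tag
    return probability


def best_tags(words):
    """Enumerate every tag sequence the lexicon allows and keep the best one."""
    candidates = product(*(LEXICON[word] for word in words))
    return max(candidates, key=lambda tags: score(words, tags))


if __name__ == "__main__":
    sentence = ["the", "can", "rusts"]
    print(best_tags(sentence))
```

Exhaustive search grows exponentially with sentence length, so it is only practical for toy inputs like this one. A real tagger would typically replace `best_tags` with a dynamic-programming search such as the Viterbi algorithm.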

Metadata
Title
Part-of-Speech Tagging Using Statistical Techniques
Written by
Pierre M. Nugues
Copyright year
2014
Publisher
Springer Berlin Heidelberg
DOI
https://doi.org/10.1007/978-3-642-41464-0_8
