DOI: 10.3115/1073445.1073478

Feature-rich part-of-speech tagging with a cyclic dependency network

Published: 27 May 2003

ABSTRACT

We present a new part-of-speech tagger that demonstrates the following ideas: (i) explicit use of both preceding and following tag contexts via a dependency network representation, (ii) broad use of lexical features, including jointly conditioning on multiple consecutive words, (iii) effective use of priors in conditional loglinear models, and (iv) fine-grained modeling of unknown word features. Using these ideas together, the resulting tagger achieves 97.24% accuracy on the Penn Treebank WSJ, an error reduction of 4.4% relative to the best previous single automatically learned tagging result.
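
The "error reduction" figure is a relative quantity: it measures the share of the previous system's tagging errors that the new tagger eliminates, not a difference in accuracy points. As a minimal sketch of that arithmetic (the function names are illustrative, and the baseline accuracy is derived from the two numbers in the abstract rather than quoted from the paper), a 97.24% accuracy with a 4.4% error reduction implies a previous best of roughly 97.11%:

```python
def relative_error_reduction(baseline_acc: float, new_acc: float) -> float:
    """Fraction of the baseline's errors that the new system eliminates."""
    baseline_err = 1.0 - baseline_acc
    new_err = 1.0 - new_acc
    return (baseline_err - new_err) / baseline_err


def implied_baseline_accuracy(new_acc: float, reduction: float) -> float:
    """Baseline accuracy implied by a new accuracy and a relative error reduction."""
    new_err = 1.0 - new_acc
    # new_err = baseline_err * (1 - reduction), solved for baseline_err
    baseline_err = new_err / (1.0 - reduction)
    return 1.0 - baseline_err


if __name__ == "__main__":
    # Both inputs come from the abstract; the baseline is an inferred value.
    baseline = implied_baseline_accuracy(new_acc=0.9724, reduction=0.044)
    print(f"implied baseline accuracy: {baseline:.4f}")
    print(f"round-trip error reduction: {relative_error_reduction(baseline, 0.9724):.3f}")
```

In absolute terms the gain is only about 0.13 accuracy points, which is why results at this accuracy level are usually reported as relative error reductions.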

Published in

NAACL '03: Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1
May 2003, 293 pages
Publisher: Association for Computational Linguistics, United States
Overall acceptance rate: 21 of 29 submissions (72%)
