Effective self-training for parsing

ABSTRACT
We present a simple, but surprisingly effective, method of self-training a two-phase parser-reranker system using readily available unlabeled data. We show that this type of bootstrapping is possible for parsing when the bootstrapped parses are processed by a discriminative reranker. Our improved model achieves an f-score of 92.1%, an absolute 1.1% improvement (12% error reduction) over the previous best result for Wall Street Journal parsing. Finally, we provide some analysis to better understand the phenomenon.
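The bootstrapping loop the abstract describes can be sketched in a few lines: parse unlabeled sentences with the generative parser, let the discriminative reranker pick the best candidate from the n-best list, add those reranker-selected parses to the training set as if they were gold trees, and retrain the parser. The sketch below is a minimal illustration under toy assumptions — `ToyParser`, `ToyReranker`, and the bracketed strings are hypothetical stand-ins, not the Charniak parser or MaxEnt reranker used in the paper, and only the parser is retrained (the reranker stays fixed, as in the paper's setup).

```python
def self_train(parser, reranker, labeled, unlabeled, rounds=1):
    """Bootstrap a generative parser on reranker-selected parses.

    `parser` and `reranker` are assumed stand-ins for the two-phase
    system; in the paper the parser produces a 50-best list and a
    fixed discriminative reranker selects from it.
    """
    training_data = list(labeled)
    for _ in range(rounds):
        parser.train(training_data)
        for sentence in unlabeled:
            candidates = parser.nbest(sentence)     # n-best parses
            best = reranker.rerank(candidates)      # discriminative pick
            training_data.append((sentence, best))  # treat as gold
        # Retrain the parser only; the reranker is not retrained.
        parser.train(training_data)
    return parser


class ToyParser:
    """Memorizes (sentence, parse) pairs; backs off to a flat parse."""

    def __init__(self):
        self.table = {}

    def train(self, data):
        self.table = {}
        for sentence, parse in data:
            self.table.setdefault(sentence, []).append(parse)

    def nbest(self, sentence, n=2):
        # Unseen sentences get a single flat candidate.
        return self.table.get(sentence, ["(S %s)" % sentence])[:n]


class ToyReranker:
    """Trivial reranker: takes the first candidate."""

    def rerank(self, candidates):
        return candidates[0]
```

Running `self_train` with one labeled tree and one unlabeled sentence adds a reranker-selected parse for the unseen sentence to the parser's training data, which is the essence of the self-training step.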