Article


A maximum entropy approach to identifying sentence boundaries

Authors:
Jeffrey C. Reynar, University of Pennsylvania, Philadelphia, Pennsylvania
Adwait Ratnaparkhi, University of Pennsylvania, Philadelphia, Pennsylvania

ANLC '97: Proceedings of the fifth conference on Applied natural language processing, March 1997, pages 16–19
https://doi.org/10.3115/974557.974561

Published: 31 March 1997


ABSTRACT

We present a trainable model for identifying sentence boundaries in raw text. Given a corpus annotated with sentence boundaries, our model learns to classify each occurrence of ., ?, and ! as either a valid or an invalid sentence boundary. The training procedure requires no hand-crafted rules, lexica, part-of-speech tags, or domain-specific information. The model can therefore be trained easily on any genre of English, and should be trainable on any other Roman-alphabet language. Performance is comparable to or better than that of similar systems, but we emphasize the simplicity of retraining for new domains.
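The setup described in the abstract (every ., ?, or ! is a candidate, and a model learned from an annotated corpus labels it a boundary or not) can be sketched as follows. This is a minimal illustration under loudly stated assumptions, not the authors' implementation: tokens are whitespace-delimited; the contextual predicates in the hypothetical `features` function (prefix and suffix around the punctuation, neighbouring words, capitalization) are illustrative choices, not the paper's feature set; and the two-outcome maximum entropy model is written in its equivalent logistic form and trained by plain batch gradient ascent rather than the Generalized Iterative Scaling algorithm (Darroch and Ratcliff, 1972) commonly used for such models. All function names are hypothetical.

```python
import math
from collections import defaultdict

PUNCT = ".?!"  # candidate boundary characters named in the abstract


def candidates(tokens):
    """Yield (index, token, previous token, next token) for each token containing . ? or !."""
    for i, tok in enumerate(tokens):
        if any(ch in PUNCT for ch in tok):
            prev_tok = tokens[i - 1] if i > 0 else "<S>"
            next_tok = tokens[i + 1] if i + 1 < len(tokens) else "</S>"
            yield i, tok, prev_tok, next_tok


def features(tok, prev_tok, next_tok):
    """Hypothetical contextual predicates; NOT the feature set used in the paper."""
    cut = max(tok.rfind(ch) for ch in PUNCT)  # position of the last . ? or !
    prefix, suffix = tok[:cut], tok[cut + 1:]
    return {
        "bias",
        "prefix=" + prefix.lower(),
        "suffix=" + suffix.lower(),
        "prev=" + prev_tok.lower(),
        "next=" + next_tok.lower(),
        "next_cap=" + str(next_tok[:1].isupper()),
        "prefix_len1=" + str(len(prefix) == 1),
    }


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def prob_boundary(weights, feats):
    """P(boundary | context) under a two-outcome maximum entropy (logistic) model."""
    return sigmoid(sum(weights.get(f, 0.0) for f in feats))


def training_examples(sentences):
    """Label every candidate in a corpus given as a list of pre-split sentences."""
    tokens, ends = [], set()
    for sent in sentences:
        tokens.extend(sent.split())
        ends.add(len(tokens) - 1)  # index of each sentence's final token
    return [(features(t, p, n), i in ends) for i, t, p, n in candidates(tokens)]


def train(examples, iterations=200, rate=0.5):
    """Batch gradient ascent on the conditional log-likelihood (a stand-in for GIS)."""
    weights = defaultdict(float)
    for _ in range(iterations):
        grad = defaultdict(float)
        for feats, is_boundary in examples:
            err = float(is_boundary) - prob_boundary(weights, feats)
            for f in feats:
                grad[f] += err
        for f, g in grad.items():
            weights[f] += rate * g / len(examples)
    return dict(weights)


def split_sentences(text, weights, threshold=0.5):
    """Cut whitespace-tokenized text after every candidate the model labels a boundary."""
    tokens = text.split()
    sentences, start = [], 0
    for i, tok, prev_tok, next_tok in candidates(tokens):
        if prob_boundary(weights, features(tok, prev_tok, next_tok)) > threshold:
            sentences.append(" ".join(tokens[start:i + 1]))
            start = i + 1
    if start < len(tokens):
        sentences.append(" ".join(tokens[start:]))
    return sentences
```

A possible usage pattern is `weights = train(training_examples(corpus))` followed by `split_sentences(raw_text, weights)`, where `corpus` is any list of already-split sentences. Because the only training input is that list, moving to a new genre amounts to supplying a different corpus, which is the kind of easy retraining the abstract emphasizes.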



Recommendations

Tagging sentence boundaries
NAACL 2000: Proceedings of the 1st North American chapter of the Association for Computational Linguistics conference

In this paper we tackle sentence boundary disambiguation through a part-of-speech (POS) tagging framework. We describe necessary changes in text tokenization and the implementation of a POS tagger and provide results of an evaluation of this system on ...
Prosodic word prediction using a maximum entropy approach
ISCSLP'06: Proceedings of the 5th international conference on Chinese Spoken Language Processing

As the basic prosodic unit, the prosodic word influences the naturalness and the intelligibility greatly. Although the research shows that the lexicon word are greatly different from the prosodic word, the lexicon word still provides the important cues ...
Maximum entropy models for word sense disambiguation
COLING '02: Proceedings of the 19th international conference on Computational linguistics - Volume 1

A maximum entropy-based word sense disambiguation system is presented, consisting of individual word experts that are trained on both labeled and partially labeled corpora. The classification probabilities from the individual word experts are integrated ...


Published in
ANLC '97: Proceedings of the fifth conference on Applied natural language processing
March 1997, 417 pages
Program Chair: Ralph Grishman, New York University, New York, NY
Publisher: Association for Computational Linguistics, United States
Article Metrics
- 94 Total Citations
- 1,729 Total Downloads
- Downloads (Last 12 months): 32
- Downloads (Last 6 weeks): 8
