Incorporating non-local information into information extraction systems by Gibbs sampling

ABSTRACT
Most current statistical natural language processing models use only local features so as to permit dynamic programming in inference, but this makes them unable to fully account for the long distance structure that is prevalent in language use. We show how to solve this dilemma with Gibbs sampling, a simple Monte Carlo method used to perform approximate inference in factored probabilistic models. By using simulated annealing in place of Viterbi decoding in sequence models such as HMMs, CMMs, and CRFs, it is possible to incorporate non-local structure while preserving tractable inference. We use this technique to augment an existing CRF-based information extraction system with long-distance dependency models, enforcing label consistency and extraction template consistency constraints. This technique results in an error reduction of up to 9% over state-of-the-art systems on two established information extraction tasks.
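The idea described above can be illustrated with a minimal sketch: Gibbs sampling over a label sequence whose model combines local per-token scores with a non-local factor rewarding identical tokens for taking identical labels, annealed by scaling log-scores with an inverse temperature. This is a toy illustration under assumed scores and parameters (`local_score`, `consistency_weight`, the cooling schedule), not the paper's CRF-based system.

```python
import math
import random

def gibbs_anneal(tokens, labels, local_score, consistency_weight=2.0,
                 sweeps=50, t0=1.0, cooling=0.9, seed=0):
    """Approximate MAP labeling via Gibbs sampling with simulated annealing.

    local_score(token, label) -> float: log-potential for a label at a token.
    A non-local factor rewards identical tokens elsewhere in the sequence
    that carry the same label, mimicking a label-consistency constraint.
    """
    rng = random.Random(seed)
    state = [rng.choice(labels) for _ in tokens]
    temp = t0
    for _ in range(sweeps):
        for i in range(len(tokens)):
            # Conditional log-scores for position i given all other positions.
            scores = []
            for lab in labels:
                s = local_score(tokens[i], lab)
                # Non-local label-consistency factor over repeated tokens.
                for j, tok in enumerate(tokens):
                    if j != i and tok == tokens[i] and state[j] == lab:
                        s += consistency_weight
                scores.append(s / temp)  # anneal: scale log-scores by 1/T
            # Sample position i from the annealed conditional distribution.
            m = max(scores)
            weights = [math.exp(s - m) for s in scores]
            r = rng.random() * sum(weights)
            acc = 0.0
            for lab, w in zip(labels, weights):
                acc += w
                if r <= acc:
                    state[i] = lab
                    break
        temp *= cooling  # cooling drives sampling toward the mode
    return state

# Hypothetical scores: "Leeds" locally prefers ORG; consistency ties
# both occurrences to the same label as the temperature drops.
def toy_score(tok, lab):
    table = {("Leeds", "ORG"): 3.0, ("Leeds", "O"): 0.0,
             ("beat", "ORG"): -1.0, ("beat", "O"): 1.0}
    return table[(tok, lab)]

labeling = gibbs_anneal(["Leeds", "beat", "Leeds"], ["ORG", "O"], toy_score)
```

As the temperature approaches zero, each Gibbs update concentrates on the highest-scoring label, so the final sweeps behave like coordinate-wise maximization, which is how annealing substitutes for Viterbi decoding while still admitting the non-local factor.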