Incorporating non-local information into information extraction systems by Gibbs sampling

ABSTRACT
Most current statistical natural language processing models use only local features so as to permit dynamic programming in inference, but this makes them unable to fully account for the long distance structure that is prevalent in language use. We show how to solve this dilemma with Gibbs sampling, a simple Monte Carlo method used to perform approximate inference in factored probabilistic models. By using simulated annealing in place of Viterbi decoding in sequence models such as HMMs, CMMs, and CRFs, it is possible to incorporate non-local structure while preserving tractable inference. We use this technique to augment an existing CRF-based information extraction system with long-distance dependency models, enforcing label consistency and extraction template consistency constraints. This technique results in an error reduction of up to 9% over state-of-the-art systems on two established information extraction tasks.
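The idea described above can be illustrated with a minimal sketch: Gibbs sampling over a label sequence whose model combines local per-token scores with a non-local factor rewarding identical tokens for taking identical labels, annealed by scaling log-scores with an inverse temperature. This is a toy illustration under assumed scores and parameters (`local_score`, `consistency_weight`, the cooling schedule), not the paper's CRF-based system.

```python
import math
import random

def gibbs_anneal(tokens, labels, local_score, consistency_weight=2.0,
                 sweeps=50, t0=1.0, cooling=0.9, seed=0):
    """Approximate MAP labeling via Gibbs sampling with simulated annealing.

    local_score(token, label) -> float: log-potential for a label at a token.
    A non-local factor rewards identical tokens elsewhere in the sequence
    that carry the same label, mimicking a label-consistency constraint.
    """
    rng = random.Random(seed)
    state = [rng.choice(labels) for _ in tokens]
    temp = t0
    for _ in range(sweeps):
        for i in range(len(tokens)):
            # Conditional log-scores for position i given all other positions.
            scores = []
            for lab in labels:
                s = local_score(tokens[i], lab)
                # Non-local label-consistency factor over repeated tokens.
                for j, tok in enumerate(tokens):
                    if j != i and tok == tokens[i] and state[j] == lab:
                        s += consistency_weight
                scores.append(s / temp)  # anneal: scale log-scores by 1/T
            # Sample position i from the annealed conditional distribution.
            m = max(scores)
            weights = [math.exp(s - m) for s in scores]
            r = rng.random() * sum(weights)
            acc = 0.0
            for lab, w in zip(labels, weights):
                acc += w
                if r <= acc:
                    state[i] = lab
                    break
        temp *= cooling  # cooling drives sampling toward the mode
    return state

# Hypothetical scores: "Leeds" locally prefers ORG; consistency ties
# both occurrences to the same label as the temperature drops.
def toy_score(tok, lab):
    table = {("Leeds", "ORG"): 3.0, ("Leeds", "O"): 0.0,
             ("beat", "ORG"): -1.0, ("beat", "O"): 1.0}
    return table[(tok, lab)]

labeling = gibbs_anneal(["Leeds", "beat", "Leeds"], ["ORG", "O"], toy_score)
```

As the temperature approaches zero, each Gibbs update concentrates on the highest-scoring label, so the final sweeps behave like coordinate-wise maximization, which is how annealing substitutes for Viterbi decoding while still admitting the non-local factor.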