DOI: 10.3115/1219840.1219885

Incorporating non-local information into information extraction systems by Gibbs sampling

Published: 25 June 2005

ABSTRACT

Most current statistical natural language processing models use only local features so as to permit dynamic programming in inference, but this makes them unable to fully account for the long distance structure that is prevalent in language use. We show how to solve this dilemma with Gibbs sampling, a simple Monte Carlo method used to perform approximate inference in factored probabilistic models. By using simulated annealing in place of Viterbi decoding in sequence models such as HMMs, CMMs, and CRFs, it is possible to incorporate non-local structure while preserving tractable inference. We use this technique to augment an existing CRF-based information extraction system with long-distance dependency models, enforcing label consistency and extraction template consistency constraints. This technique results in an error reduction of up to 9% over state-of-the-art systems on two established information extraction tasks.
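The decoding idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the tokens, the per-position scores in `LOCAL`, the value of `INCONSISTENCY_PENALTY`, and the linear cooling schedule are all assumptions made up for this example. The sketch resamples one label at a time from its conditional distribution, sharpened by a temperature that falls toward zero, so a non-local label-consistency term can be scored without dynamic programming.

```python
import math
import random

LABELS = ["O", "PER"]
TOKENS = ["Smith", "spoke", "Smith"]

# Hypothetical per-position local log-scores standing in for a trained
# sequence model (a real CRF would also include transition potentials).
LOCAL = [
    {"O": 0.0, "PER": 3.0},  # context strongly suggests a person
    {"O": 3.0, "PER": 0.0},
    {"O": 1.0, "PER": 0.0},  # ambiguous context, weakly prefers O
]
INCONSISTENCY_PENALTY = 2.0  # assumed cost per inconsistently labeled pair


def local_score(labels):
    """Sum of the local log-scores of the chosen labels."""
    return sum(LOCAL[i][lab] for i, lab in enumerate(labels))


def nonlocal_score(tokens, labels):
    """Penalize every pair of identical tokens that receive different labels."""
    score = 0.0
    for i in range(len(tokens)):
        for j in range(i + 1, len(tokens)):
            if tokens[i] == tokens[j] and labels[i] != labels[j]:
                score -= INCONSISTENCY_PENALTY
    return score


def total_score(tokens, labels):
    return local_score(labels) + nonlocal_score(tokens, labels)


def gibbs_anneal(tokens, n_sweeps=200, t_start=1.0, t_end=0.05, seed=0):
    """Gibbs sampling with a linear cooling schedule (an assumed schedule)."""
    rng = random.Random(seed)
    state = [rng.choice(LABELS) for _ in tokens]
    for sweep in range(n_sweeps):
        temp = t_start + (t_end - t_start) * sweep / max(1, n_sweeps - 1)
        for i in range(len(tokens)):
            # Score each candidate label at position i with all others fixed.
            scaled = []
            for lab in LABELS:
                state[i] = lab
                scaled.append(total_score(tokens, state) / temp)
            # Subtract the max before exponentiating for numerical stability.
            top = max(scaled)
            weights = [math.exp(s - top) for s in scaled]
            state[i] = rng.choices(LABELS, weights=weights)[0]
    return state


# Baseline: pick the best label at each position from local scores alone.
local_only = [max(LABELS, key=lambda lab: LOCAL[i][lab]) for i in range(len(TOKENS))]
annealed = gibbs_anneal(TOKENS)
print("local only:", local_only)
print("annealed:  ", annealed)
```

With these toy scores, the local-only decode labels the second "Smith" as O, while the annealed sampler settles, with overwhelming probability, on labeling both occurrences PER, because the consistency penalty outweighs the weak local preference. Recomputing the full score for every candidate label is wasteful; a practical sampler would evaluate only the factors that touch the position being resampled.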

Published in

ACL '05: Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
June 2005, 657 pages
General Chair: Kevin Knight
Publisher: Association for Computational Linguistics, United States
Qualifiers: Article

Acceptance Rates

ACL '05 paper acceptance rate: 77 of 423 submissions, 18%
Overall acceptance rate: 85 of 443 submissions, 19%
