Probabilistic joint models incorporating logic and learning via structured variational approximation for information extraction

  • Regular Paper
  • Knowledge and Information Systems

Abstract

Traditional information extraction systems for compound tasks adopt pipeline architectures, which are often ineffective and suffer from problems such as the cascading accumulation of errors across stages. In this paper, we propose a joint discriminative probabilistic framework that optimizes all relevant subtasks simultaneously. The framework flexibly combines probabilistic modeling of uncertainty in sequence data with first-order logic for encoding domain knowledge; the first-order logic component provides an expressive formalism that overcomes the limited expressiveness of traditional attribute-value representations. Our framework defines a joint probability distribution, in exponential-family form, over both segmentations of sequence data and possible worlds of relations between segments. Since exact parameter estimation and inference are prohibitively intractable in this model, we develop a structured variational inference algorithm for approximate parameter estimation. For inference, we propose a highly coupled, bi-directional Metropolis-Hastings (MH) algorithm to find the maximum a posteriori (MAP) assignments of both segmentations and relations. Extensive experiments on two real-world information extraction tasks (entity identification and relation extraction from Wikipedia, and citation matching) show that (1) the proposed model achieves significant improvements on both tasks over state-of-the-art pipeline models and other joint models, and (2) the bi-directional MH inference algorithm outperforms greedy, N-best list, and uni-directional MH sampling algorithms.
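
The abstract does not spell out implementation details, so the following is only a minimal, hypothetical sketch. In an exponential-family joint model of the kind described, the distribution over a segmentation s and a relation assignment r given input x typically has the form p(s, r | x) ∝ exp(Σ_k λ_k f_k(x, s, r)), where the feature functions f_k may include logic-derived features. The Python code below illustrates a generic bi-directional Metropolis-Hastings loop that alternates proposals over segmentations and relations and keeps the best-scoring joint state as a MAP estimate; score, propose_seg, and propose_rel are placeholder callables (not from the paper), and symmetric proposals are assumed.

```python
import math
import random


def mh_joint_map(init_seg, init_rel, score, propose_seg, propose_rel, iters=10000):
    """Generic bi-directional Metropolis-Hastings sketch for joint MAP search.

    score(seg, rel) should return the unnormalized log joint probability
    (e.g. the weighted feature sum of an exponential-family model).
    propose_seg / propose_rel return a candidate segmentation / relation
    assignment; symmetric proposal distributions are assumed here.
    """
    seg, rel = init_seg, init_rel
    cur = best = score(seg, rel)
    best_seg, best_rel = seg, rel
    for t in range(iters):
        # Alternate proposal directions: a new segmentation given the current
        # relations on even steps, new relations given the current segmentation
        # on odd steps.
        if t % 2 == 0:
            cand_seg, cand_rel = propose_seg(seg, rel), rel
        else:
            cand_seg, cand_rel = seg, propose_rel(seg, rel)
        cand = score(cand_seg, cand_rel)
        # Metropolis-Hastings acceptance test for symmetric proposals.
        if random.random() < math.exp(min(0.0, cand - cur)):
            seg, rel, cur = cand_seg, cand_rel, cand
            if cur > best:
                best_seg, best_rel, best = seg, rel, cur
    return best_seg, best_rel
```

Reading the MAP estimate off the best state visited, rather than the final sample, is a common heuristic; annealing the acceptance temperature over iterations is another standard refinement that this sketch omits.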

Author information

Corresponding author

Correspondence to Xiaofeng Yu.

Additional information

The work described in this paper is substantially supported by grants from the Research Grants Council of the Hong Kong Special Administrative Region, China (Project Codes: CUHK4128/07 and CUHK413510) and the Direct Grant of the Faculty of Engineering, CUHK (Project Codes: 2050442 and 2050476). This work is also affiliated with the Microsoft-CUHK Joint Laboratory for Human-centric Computing and Interface Technologies.

About this article

Cite this article

Yu, X., Lam, W. Probabilistic joint models incorporating logic and learning via structured variational approximation for information extraction. Knowl Inf Syst 32, 415–444 (2012). https://doi.org/10.1007/s10115-011-0455-8
