Probabilistic joint models incorporating logic and learning via structured variational approximation for information extraction

  • Regular Paper
  • Knowledge and Information Systems

Abstract

Traditional information extraction systems for compound tasks adopt pipeline architectures, which are often ineffective and suffer from problems such as the cascading accumulation of errors across stages. In this paper, we propose a joint discriminative probabilistic framework that optimizes all relevant subtasks simultaneously. The framework flexibly combines probabilistic modeling of uncertainty in sequence data with first-order logic for encoding domain knowledge; the first-order logic component provides an expressive formalism that overcomes the limited expressiveness of traditional attribute-value representations. Our framework defines a joint probability distribution, in exponential-family form, over both segmentations of sequence data and possible worlds of relations between segments. Since exact parameter estimation and inference are prohibitively intractable in this model, we develop a structured variational inference algorithm for approximate parameter estimation. For inference, we propose a highly coupled, bi-directional Metropolis-Hastings (MH) algorithm to find the maximum a posteriori (MAP) assignments of both segmentations and relations. Extensive experiments on two real-world information extraction tasks (entity identification and relation extraction from Wikipedia, and citation matching) show that (1) the proposed model achieves significant improvements on both tasks over state-of-the-art pipeline models and other joint models, and (2) the bi-directional MH inference algorithm outperforms greedy, N-best list, and uni-directional MH sampling algorithms.
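
The abstract does not spell out implementation details, so the following is only a minimal, hypothetical sketch. In an exponential-family joint model of the kind described, the distribution over a segmentation s and a relation assignment r given input x typically has the form p(s, r | x) ∝ exp(Σ_k λ_k f_k(x, s, r)), where the feature functions f_k may include logic-derived features. The Python code below illustrates a generic bi-directional Metropolis-Hastings loop that alternates proposals over segmentations and relations and keeps the best-scoring joint state as a MAP estimate; score, propose_seg, and propose_rel are placeholder callables (not from the paper), and symmetric proposals are assumed.

```python
import math
import random


def mh_joint_map(init_seg, init_rel, score, propose_seg, propose_rel, iters=10000):
    """Generic bi-directional Metropolis-Hastings sketch for joint MAP search.

    score(seg, rel) should return the unnormalized log joint probability
    (e.g. the weighted feature sum of an exponential-family model).
    propose_seg / propose_rel return a candidate segmentation / relation
    assignment; symmetric proposal distributions are assumed here.
    """
    seg, rel = init_seg, init_rel
    cur = best = score(seg, rel)
    best_seg, best_rel = seg, rel
    for t in range(iters):
        # Alternate proposal directions: a new segmentation given the current
        # relations on even steps, new relations given the current segmentation
        # on odd steps.
        if t % 2 == 0:
            cand_seg, cand_rel = propose_seg(seg, rel), rel
        else:
            cand_seg, cand_rel = seg, propose_rel(seg, rel)
        cand = score(cand_seg, cand_rel)
        # Metropolis-Hastings acceptance test for symmetric proposals.
        if random.random() < math.exp(min(0.0, cand - cur)):
            seg, rel, cur = cand_seg, cand_rel, cand
            if cur > best:
                best_seg, best_rel, best = seg, rel, cur
    return best_seg, best_rel
```

Reading the MAP estimate off the best state visited, rather than the final sample, is a common heuristic; annealing the acceptance temperature over iterations is another standard refinement that this sketch omits.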

Author information

Corresponding author

Correspondence to Xiaofeng Yu.

Additional information

The work described in this paper is substantially supported by grants from the Research Grants Council of the Hong Kong Special Administrative Region, China (Project Codes: CUHK4128/07 and CUHK413510) and the Direct Grant of the Faculty of Engineering, CUHK (Project Codes: 2050442 and 2050476). This work is also affiliated with the Microsoft-CUHK Joint Laboratory for Human-centric Computing and Interface Technologies.

About this article

Cite this article

Yu, X., Lam, W. Probabilistic joint models incorporating logic and learning via structured variational approximation for information extraction. Knowl Inf Syst 32, 415–444 (2012). https://doi.org/10.1007/s10115-011-0455-8
