Abstract
Traditional information extraction systems for compound tasks adopt pipeline architectures, which are highly ineffective and suffer from several problems such as cascading accumulation of errors. In this paper, we propose a joint discriminative probabilistic framework to optimize all relevant subtasks simultaneously. This framework offers a great flexibility to incorporate the advantage of both uncertainty for sequence modeling and first-order logic for domain knowledge. The first-order logic model provides a more expressive formalism tackling the issue of limited expressiveness of traditional attribute-value representation. Our framework defines a joint probability distribution for both segmentations in sequence data and possible worlds of relations between segments in the form of an exponential family. Since exact parameter estimation and inference are prohibitively intractable in this model, a structured variational inference algorithm is developed to perform parameter estimation approximately. For inference, we propose a highly coupled, bi-directional Metropolis-Hastings (MH) algorithm to find the maximum a posteriori (MAP) assignments for both segmentations and relations. Extensive experiments on two real-world information extraction tasks, entity identification and relation extraction from Wikipedia, and citation matching show that (1) the proposed model achieves significant improvement on both tasks compared to state-of-the-art pipeline models and other joint models; (2) the bi-directional MH inference algorithm obtains boosted performance compared to the greedy, N-best list, and uni-directional MH sampling algorithms.
Similar content being viewed by others
References
Cowie J, Lehnert W (1996) Information extraction. Commun ACM 39: 80–91
Yu X (2007) Chinese named entity recognition with cascaded hybrid model. In: Proceedings of HLT/NAACL-07, Rochester, New York, pp 197–200
Yu X, Lam W, Chen B (2009) An integrated discriminative probabilistic approach to information extraction. In: Proceedings of CIKM-09, Hong Kong, China, pp 325–334
Song M, Rudniy A (2010) Detecting duplicate biological entities using Markov random field-based edit distance. Knowl Inf Syst 25(2): 371–387
Culotta A, McCallum A, Betz J (2006) Integrating probabilistic extraction models and data mining to discover relations and patterns in text. In: Proceedings of HLT/NAACL-06, New York, pp 296–303
Yu X, Lam W (2008) An integrated probabilistic and logic approach to encyclopedia relation extraction with multiple features. In: Proceedings of COLING-08, Manchester, United Kingdom, pp 1065–1072
Zhu J, Nie Z, Liu X, Zhang B, Wen J-R (2009) Statsnowball: a statistical approach to extracting entity relationships. In: Proceedings of WWW-09, Madrid, Spain, pp 101–110
Patnaik D, Laxman S, Ramakrishnan N (2010) Discovering excitatory relationships using dynamic Bayesian networks. Knowl Inf Syst 29(2): 273–303
Reichartz F, Korte H, Paass G (2010) Semantic relation extraction with kernels over typed dependency trees. In: Proceedings of KDD-10, New York, pp 773–782
Wellner B, McCallum A, Peng F, Hay M (2004) An integrated, conditional model of information extraction and coreference with application to citation matching. In: Proceedings of UAI-04, Banff, Canada, pp 593–601
Poon H, Domingos P (2007) Joint inference in information extraction. In: Proceedings of AAAI-07, Vancouver, British Columbia, Canada, pp 913–918
Finkel JR, Manning CD, Ng AY (2006) Solving the problem of cascading errors: approximate Bayesian inference for linguistic annotation pipelines. In: Proceedings of EMNLP-06, Sydney, Australia, pp 618–626
Hollingshead K, Roark B (2007) Pipeline iteration. In: Proceedings of ACL-07, Prague, Czech Republic, pp 952–959
Lafferty J, McCallum A, Pereira F (2001) Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In: Proceedings of ICML-01, pp 282–289
Bunescu R, Mooney RJ (2004) Collective information extraction with relational Markov networks. In: Proceedings of ACL-04, Barcelona, Spain
Zhu J, Zhang B, Nie Z, Wen J-R, Hon H-W (2007) Webpage understanding: an integrated approach. In: Proceedings of KDD-07, San Jose, California, USA, pp 903–912
Zhu J, Nie Z, Zhang B, Wen J-R (2007) Dynamic hierarchical Markov random fields and their application to Web data extraction. In: Proceedings of ICML-07, Corvalis, Oregon, pp 1175–1182
Yang C, Cao Y, Nie Z, Zhou J, Wen J-R (2010) Closing the loop in webpage understanding. IEEE Trans Knowl Data Eng (forthcoming)
Luo P, Lin F, Xiong Y, Zhao Y, Shi Z (2009) Towards combining Web classification and Web information extraction: a case study. In: Proceedings of KDD-09, Paris, France, pp 1235–1244
Pascot D, Bouslama F, Mellouli S (2011) Architecturing large integrated complex information systems: an application to healthcare. Knowl Inf Syst 27(1): 115–140
Sutton C, McCallum A (2006) An introduction to conditional random fields for relational learning. In: Getoor L, Taskar B Introduction to statistical relational learning. MIT Press, Cambridge, MA, USA
Richardson M, Domingos P (2006) Markov logic networks. Mach Learn 62(1–2): 107–136
Sarawagi S, Cohen WW (2004) Semi-Markov conditional random fields for information extraction. In: Proceedings of NIPS-04
Saul LK, Jordan MI (1996) Exploiting tractable substructures in intractable networks. In: Proceedings of NIPS-96, Cambridge, MA, pp 486–492
Wiegerinck W (2000) Variational approximations between mean field theory and the junction tree algorithm. In: Proceedings of UAI-2000, San Francisco, CA, pp 626–633
Wainwright MJ, Jordan MI (2008) Graphical models, exponential families, and variational inference. Found Trends Mach Learn 1: 1–305
Metropolis N, Rosenbluth AW, Rosenbluth MN, Teller AH, Teller E (1953) Equations of state calculations by fast computing machines. J Chem Phys 21(6): 1087–1092
Hastings WK (1970) Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57(1): 97–109
Carbonetto P, Kisyński J, Freitas OD, Poole D (2005) Nonparametric Bayesian logic. In: Proceedings of UAI-05, pp 85–93
Culotta A, Wick M, McCallum A (2007) First-order probabilistic models for coreference resolution. In: Proceedings of HLT/NAACL-07, pp 81–88
Taskar B, Abbeel P, Koller D (2002) Discriminative probabilistic models for relational data. In: Proceedings of UAI-02, pp 485–492
Zhu J, Nie Z, Wen J-R, Zhang B, Ma W-Y (2006) Simultaneous record detection and attribute labeling in Web data extraction. In: Proceedings of KDD-06, Philadelphia, Pennsylvania, USA, pp 494–503
Liao L, Fox D, Kautz H (2007) Extracting places and activities from GPS traces using hierarchical conditional random fields. Int J Robotics Res 26: 119–134
Hu B (2010) WiKi’mantics: interpreting ontologies with Wikipedia. Knowl Inf Syst 25(3): 445–472
Genesereth MR, Nislsson NJ (1987) Logical foundations of artificial intelligence. Morgan Kaufmann Publishers Inc., San Mateo
Zhu J, Lao N, Xing EP (2010) Grafting-light: fast, incremental feature selection and structure learning of Markov random fields. In: Proceedings of KDD-10, New York, pp 303–312
Kschischang FR, Frey BJ, Loeliger H-A (2001) Factor graphs and the sum-product algorithm. IEEE Trans Inf Theory 47(2): 498–519
Besag J (1974) Spatial interaction and the statistical analysis of lattice systems. J Royal Stat Soc 36: 192–236
Jordan MI, Ghahramani Z, Jaakkola TS, Saul LK (1999) An introduction to variational methods for graphical methods. Mach Learn 37: 183–233
Jaakkola TS (2000) Tutorial on variational approximation methods. In: Advanced mean field methods: theory and practice. MIT Press, Cambrige, pp 129–159
Xing EP, Jordan MI, Russell S (2003) A generalized mean field algorithm for variational inference in exponential families. In: Proceedings of UAI-03, pp 583–591
Xing EP, Jordan MI, Russell S (2004) Graph partition strategies for generalized mean field inference. In: Proceedings of UAI-04, pp 602–610
Liu DC, Nocedal J (1989) On the limited memory BFGS method for large scale optimization. Math Program 45: 503–528
Kirkpatrick S, Gelatt CD, Vecchi MP (1983) Optimization by simulated annealing. Science 220: 671–680
Nguyen DPT, Matsuo Y, Ishizuka M (2007) Relation extraction from Wikipedia using subtree mining. In: Proceedings of AAAI-07, Vancouver, British Columbia, Canada, pp 1414–1420
Singla P, Domingos P (2006) Entity resolution with Markov logic. In: Proceedings of ICDM-06, pp 572–582
Author information
Authors and Affiliations
Corresponding author
Additional information
The work described in this paper is substantially supported by grants from the Research Grant Council of the Hong Kong Special Administrative Region, China (Project Codes: CUHK4128/07 and CUHK413510) and the Direct Grant of the Faculty of Engineering, CUHK (Project Codes: 2050442 and 2050476). This work is also affiliated with the Microsoft-CUHK Joint Laboratory for Human-centric Computing and Interface Technologies.
Rights and permissions
About this article
Cite this article
Yu, X., Lam, W. Probabilistic joint models incorporating logic and learning via structured variational approximation for information extraction. Knowl Inf Syst 32, 415–444 (2012). https://doi.org/10.1007/s10115-011-0455-8
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10115-011-0455-8