Article

Integrating probabilistic extraction models and data mining to discover relations and patterns in text

Authors:
Aron Culotta, University of Massachusetts, Amherst, MA
Andrew McCallum, University of Massachusetts, Amherst, MA
Jonathan Betz, Google, Inc., New York, NY

HLT-NAACL '06: Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics, June 2006, pages 296–303. https://doi.org/10.3115/1220835.1220873

Published: 04 June 2006

ABSTRACT

In order for relation extraction systems to obtain human-level performance, they must be able to incorporate relational patterns inherent in the data (for example, that one's sister is likely one's mother's daughter, or that children are likely to attend the same college as their parents). Hand-coding such knowledge can be time-consuming and inadequate. Additionally, there may exist many interesting, unknown relational patterns that both improve extraction performance and provide insight into text. We describe a probabilistic extraction model that provides mutual benefits to both "top-down" relational pattern discovery and "bottom-up" relation extraction.
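The abstract's example pattern, that one's sister is likely one's mother's daughter, can be made concrete as a rule over extracted relation triples. The sketch below is a hypothetical illustration only, not the authors' probabilistic model: it assumes a toy list of (person, relation, value) triples invented for this example, and scores the rule mother(x, m) ∧ daughter(m, z) ∧ z ≠ x ⇒ sister(x, z) with plain support and confidence counts. The names `TRIPLES`, `index`, `rule_body_pairs`, and `score_rule` are all hypothetical. Pairs that match the rule body but lack an extracted sister relation are returned as candidates, which loosely mirrors how a discovered pattern could feed back into extraction.

```python
from collections import defaultdict

# Hypothetical toy data: ("Alice", "mother", "Carol") means "Carol is Alice's mother".
# These triples are invented for illustration and are not data from the paper.
TRIPLES = [
    ("Alice", "mother", "Carol"),
    ("Carol", "daughter", "Alice"),
    ("Carol", "daughter", "Beth"),
    ("Alice", "sister", "Beth"),
    ("Dana", "mother", "Erin"),
    ("Erin", "daughter", "Dana"),
    ("Erin", "daughter", "Fay"),
    # No sister triple for (Dana, Fay): e.g., a relation the extractor missed.
]


def index(triples):
    """Map relation -> person -> set of related values."""
    rel = defaultdict(lambda: defaultdict(set))
    for person, relation, value in triples:
        rel[relation][person].add(value)
    return rel


def rule_body_pairs(rel):
    """All (x, z) pairs satisfying mother(x, m) AND daughter(m, z) AND z != x."""
    pairs = set()
    for x, mothers in rel["mother"].items():
        for m in mothers:
            for z in rel["daughter"].get(m, ()):
                if z != x:
                    pairs.add((x, z))
    return pairs


def score_rule(triples):
    """Return (support, body_count, confidence, candidates) for the sister rule.

    support: body pairs that also have an observed sister(x, z) triple.
    candidates: body pairs with no observed sister triple, i.e. relations
    the pattern would suggest adding.
    """
    rel = index(triples)
    pairs = rule_body_pairs(rel)
    observed = {(x, z) for x, z in pairs if z in rel["sister"].get(x, ())}
    confidence = len(observed) / len(pairs) if pairs else 0.0
    candidates = sorted(pairs - observed)
    return len(observed), len(pairs), confidence, candidates


if __name__ == "__main__":
    support, body, conf, candidates = score_rule(TRIPLES)
    print(f"support={support} body={body} confidence={conf:.2f} candidates={candidates}")
```

Running the script prints `support=1 body=2 confidence=0.50 candidates=[('Dana', 'Fay')]`. Counting rule support and confidence is a standard association-rule statistic chosen here for brevity; the paper itself describes a probabilistic extraction model, and this deterministic count only shows the shape of the top-down/bottom-up interaction.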

Recommendations

Efficient algorithms for mining constrained frequent patterns from uncertain data
U '09: Proceedings of the 1st ACM SIGKDD Workshop on Knowledge Discovery from Uncertain Data

Mining of frequent patterns is one of the popular knowledge discovery and data mining (KDD) tasks. It also plays an essential role in the mining of many other patterns such as correlation, sequences, and association rules. Hence, it has been the subject ...
Effective algorithms for vertical mining probabilistic frequent patterns in uncertain mobile environments

Data uncertainty is inherent in mobile applications. The traditional methods of mining frequent patterns are confronted with enormous challenges in uncertain mobile environments. The present achievements have shown that vertical mining algorithms are ...
Data Mining Patterns: New Methods and Applications

Published in: HLT-NAACL '06, June 2006 (522 pages)
General Chair: Robert C. Moore, Microsoft Research
Publisher: Association for Computational Linguistics, United States

Acceptance Rates
HLT-NAACL '06 paper acceptance rate: 62 of 257 submissions (24%). Overall acceptance rate: 240 of 768 submissions (31%).

Article Metrics
- Total citations: 29
- Total downloads: 1,161
- Downloads (last 12 months): 29
- Downloads (last 6 weeks): 6