Biperpedia: an ontology for search applications

Authors:
Rahul Gupta (Google Research)
Alon Halevy (Google Research)
Xuezhi Wang (Carnegie Mellon University)
Steven Euijong Whang (Google Research)
Fei Wu (Google Research)

Proceedings of the VLDB Endowment, Volume 7, Issue 7, pp. 505–516. https://doi.org/10.14778/2732286.2732288

Published: 01 March 2014

Abstract

Search engines make significant efforts to recognize queries that can be answered by structured data and invest heavily in creating and maintaining high-precision databases. While these databases have a relatively wide coverage of entities, the number of attributes they model (e.g., GDP, CAPITAL, ANTHEM) is relatively small. Extending the number of attributes known to the search engine can enable it to more precisely answer queries from the long and heavy tail, extract a broader range of facts from the Web, and recover the semantics of tables on the Web.

We describe Biperpedia, an ontology with 1.6M (class, attribute) pairs and 67K distinct attribute names. Biperpedia extracts attributes from the query stream, and then uses the best extractions to seed attribute extraction from text. For every attribute Biperpedia saves a set of synonyms and text patterns in which it appears, thereby enabling it to recognize the attribute in more contexts. In addition to a detailed analysis of the quality of Biperpedia, we show that it can increase the number of Web tables whose semantics we can recover by more than a factor of 4 compared with Freebase.
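
As a rough illustration of the kind of record described above, the following sketch models a (class, attribute) pair that carries synonyms and text patterns, together with a lookup that resolves a surface form to its attribute. It is not Biperpedia's actual design: the names `AttributeEntry`, `Ontology`, and `resolve`, the class name `country`, and the sample synonym and pattern are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class AttributeEntry:
    """One (class, attribute) pair plus its synonyms and text patterns."""
    class_name: str
    attribute: str
    synonyms: Set[str] = field(default_factory=set)
    patterns: List[str] = field(default_factory=list)


class Ontology:
    """Hypothetical in-memory index keyed by (class, lowercased surface form)."""

    def __init__(self) -> None:
        self._by_surface: Dict[Tuple[str, str], AttributeEntry] = {}

    def add(self, entry: AttributeEntry) -> None:
        # Index the canonical name and every synonym so either one resolves.
        for form in {entry.attribute, *entry.synonyms}:
            self._by_surface[(entry.class_name, form.lower())] = entry

    def resolve(self, class_name: str, surface: str) -> Optional[AttributeEntry]:
        # Case-insensitive lookup; returns None for unknown surface forms.
        return self._by_surface.get((class_name, surface.lower()))


onto = Ontology()
# Made-up sample data, for illustration only.
onto.add(AttributeEntry("country", "capital", {"capital city"}, ["the capital of E is V"]))

match = onto.resolve("country", "Capital City")
print(match.attribute if match else None)
```

Indexing synonyms alongside the canonical name is one simple way to recognize an attribute in more contexts; a real system would also need to score and disambiguate candidate matches.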

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing

Published in
Proceedings of the VLDB Endowment, Volume 7, Issue 7
March 2014
108 pages
ISSN: 2150-8097
Editors: H. V. Jagadish (University of Michigan), Aoying Zhou (East China Normal University, China)
Publisher: VLDB Endowment
Published: 1 March 2014