research-article

Biperpedia: an ontology for search applications

Published: 01 March 2014

Abstract

Search engines make significant efforts to recognize queries that can be answered by structured data and invest heavily in creating and maintaining high-precision databases. While these databases have a relatively wide coverage of entities, the number of attributes they model (e.g., GDP, CAPITAL, ANTHEM) is relatively small. Extending the number of attributes known to the search engine can enable it to more precisely answer queries from the long and heavy tail, extract a broader range of facts from the Web, and recover the semantics of tables on the Web.

We describe Biperpedia, an ontology with 1.6M (class, attribute) pairs and 67K distinct attribute names. Biperpedia extracts attributes from the query stream, and then uses the best extractions to seed attribute extraction from text. For every attribute Biperpedia saves a set of synonyms and text patterns in which it appears, thereby enabling it to recognize the attribute in more contexts. In addition to a detailed analysis of the quality of Biperpedia, we show that it can increase the number of Web tables whose semantics we can recover by more than a factor of 4 compared with Freebase.
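One way to picture the stored information is the minimal sketch below: each (class, attribute) pair carries a set of synonyms and text patterns, and a lookup maps a surface mention back to the canonical attribute. This is an illustration only, not the system's actual design. The names `AttributeEntry`, `Ontology`, and `resolve`, the class `COUNTRY`, the synonym, and the pattern string are all assumptions made for the example; only the attribute name GDP comes from the abstract.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class AttributeEntry:
    """One (class, attribute) pair. All field names are hypothetical."""
    class_name: str
    attribute: str
    synonyms: Set[str] = field(default_factory=set)
    patterns: List[str] = field(default_factory=list)


class Ontology:
    """Toy in-memory store; an assumption for illustration only."""

    def __init__(self) -> None:
        self._entries: Dict[Tuple[str, str], AttributeEntry] = {}
        # Maps (class, lowercased surface form) -> canonical attribute name.
        self._surface: Dict[Tuple[str, str], str] = {}

    def add(self, entry: AttributeEntry) -> None:
        self._entries[(entry.class_name, entry.attribute)] = entry
        # Index the canonical name and every synonym for lookup.
        for form in {entry.attribute, *entry.synonyms}:
            self._surface[(entry.class_name, form.lower())] = entry.attribute

    def resolve(self, class_name: str, mention: str) -> Optional[str]:
        """Return the canonical attribute for a mention, or None if unknown."""
        return self._surface.get((class_name, mention.strip().lower()))


ontology = Ontology()
ontology.add(AttributeEntry(
    class_name="COUNTRY",                      # illustrative class
    attribute="GDP",                           # attribute named in the abstract
    synonyms={"gross domestic product"},       # illustrative synonym
    patterns=["the <attribute> of <entity>"],  # illustrative text pattern
))
```

Under these assumptions, `ontology.resolve("COUNTRY", "Gross Domestic Product")` maps the synonym back to the canonical attribute, which is the sense in which stored synonyms let an attribute be recognized in more contexts.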

Published in

Proceedings of the VLDB Endowment, Volume 7, Issue 7, March 2014 (108 pages), ISSN: 2150-8097

Publisher: VLDB Endowment