DOI: 10.1145/1807167.1807251
research-article

Structured annotations of web queries

Published: 06 June 2010

ABSTRACT

Queries issued to web search engines often target structured data, such as commercial products, movie showtimes, or airline schedules. However, surfacing relevant results from such data is highly challenging, owing to the unstructured language of web queries and the demanding scalability and speed requirements of web search. In this paper, we discover latent structured semantics in web queries and produce Structured Annotations for them. We define an annotation as a mapping of a query to a table of structured data and to attributes of this table. Given a collection of structured tables, we present a fast and scalable tagging mechanism for obtaining all possible annotations of a query over these tables. However, we observe that for a given query only a few of these annotations are sensible with respect to the user's needs. We thus propose a principled probabilistic scoring mechanism, using a generative model, for assessing the likelihood of a structured annotation, and we define a dynamic threshold for filtering out misinterpreted query annotations. Our techniques are completely unsupervised, obviating the need for costly manual labeling effort. We evaluated our techniques using real-world queries and data and present promising experimental results.
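
The pipeline outlined in the abstract (enumerate candidate annotations, score them, filter by a threshold) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the method of the paper: the toy `TABLES`, the exact-match tagger `annotate`, the uniform stand-in probabilities in `score`, the constant `FREE_TOKEN_PROB`, and the "fraction of the best score" cutoff in `plausible` are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy tables: table name -> attribute -> known values.
TABLES: Dict[str, Dict[str, Set[str]]] = {
    "cameras": {"brand": {"canon", "nikon"}, "type": {"digital camera"}},
    "movies": {"title": {"canon"}, "genre": {"drama"}},
}

# An annotation: (table, ((attribute, matched value), ...), free tokens).
Annotation = Tuple[str, Tuple[Tuple[str, str], ...], Tuple[str, ...]]

# Assumed probability of a token that is not mapped to any attribute.
FREE_TOKEN_PROB = 0.01


def annotate(query: str, tables=TABLES) -> List[Annotation]:
    """Enumerate every way to map contiguous query tokens to attribute values."""
    tokens = query.lower().split()
    results: List[Annotation] = []

    def extend(table, attrs, pos, mapped, free):
        if pos == len(tokens):
            if mapped:  # keep only annotations that use at least one attribute
                results.append((table, tuple(mapped), tuple(free)))
            return
        # Option 1: leave the token at `pos` unannotated (a free token).
        extend(table, attrs, pos + 1, mapped, free + [tokens[pos]])
        # Option 2: match a span starting at `pos` to a not-yet-used attribute.
        used = {attr for attr, _ in mapped}
        for end in range(pos + 1, len(tokens) + 1):
            span = " ".join(tokens[pos:end])
            for attr, values in attrs.items():
                if attr not in used and span in values:
                    extend(table, attrs, end, mapped + [(attr, span)], free)

    for table, attrs in tables.items():
        extend(table, attrs, 0, [], [])
    return results


def score(annotation: Annotation, tables=TABLES) -> float:
    """Toy generative score: uniform table prior, uniform value choice per
    attribute, and a fixed penalty for each free token (all assumed)."""
    table, mapped, free = annotation
    prob = 1.0 / len(tables)
    for attr, _ in mapped:
        prob *= 1.0 / len(tables[table][attr])
    return prob * FREE_TOKEN_PROB ** len(free)


def plausible(query: str, ratio: float = 0.1) -> List[Annotation]:
    """Keep annotations scoring within `ratio` of the best one. The cutoff
    moves with the query, a simple stand-in for a dynamic threshold."""
    scored = [(score(a), a) for a in annotate(query)]
    if not scored:
        return []
    best = max(s for s, _ in scored)
    ranked = sorted(scored, key=lambda pair: -pair[0])
    return [a for s, a in ranked if s >= ratio * best]


if __name__ == "__main__":
    for annotation in plausible("canon digital camera"):
        print(annotation)
```

In this toy setting the query "canon digital camera" yields four candidate annotations across the two tables, and only the one mapping "canon" to `brand` and "digital camera" to `type` in the `cameras` table survives the cutoff, since the alternatives leave tokens unexplained.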

      Published in

      SIGMOD '10: Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
      June 2010
      1286 pages
      ISBN: 9781450300322
      DOI: 10.1145/1807167

      Copyright © 2010 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Acceptance Rates

      Overall Acceptance Rate: 785 of 4,003 submissions, 20%
