ABSTRACT
Queries asked on web search engines often target structured data, such as commercial products, movie showtimes, or airline schedules. However, surfacing relevant results from such data is highly challenging, due to the unstructured language of web queries and the demanding scalability and speed requirements of web search. In this paper, we discover latent structured semantics in web queries and produce Structured Annotations for them. We define an annotation as a mapping of a query to a table of structured data and to attributes of that table. Given a collection of structured tables, we present a fast and scalable tagging mechanism for obtaining all possible annotations of a query over these tables. However, we observe that for a given query only a few of these annotations are sensible for the user's needs. We thus propose a principled probabilistic scoring mechanism, using a generative model, for assessing the likelihood of a structured annotation, and we define a dynamic threshold for filtering out misinterpreted query annotations. Our techniques are completely unsupervised, obviating the need for costly manual labeling effort. We evaluated our techniques using real-world queries and data and present promising experimental results.
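To make the notion of an annotation concrete, the following is a minimal illustrative sketch, not the tagging mechanism of the paper. It assumes a hypothetical in-memory collection of tables (`TABLES`) and a hypothetical helper `annotate`, and it simply enumerates every way of mapping query tokens to attributes of a table, leaving unmatched tokens as free tokens. The probabilistic scoring and dynamic threshold described above are deliberately left out, since the abstract does not specify them.

```python
from itertools import product

# Hypothetical toy data: table name -> attribute name -> set of known values.
TABLES = {
    "cameras": {"brand": {"canon", "nikon"}, "model": {"d90", "eos"}},
    "movies": {"title": {"canon"}, "year": {"2009"}},
}


def annotate(query):
    """Enumerate candidate annotations of a query over TABLES.

    Each annotation is (table, ((token, attribute_or_None), ...)).
    None marks a free token that is not mapped to any attribute.
    Annotations that map no token at all are skipped.
    """
    tokens = query.lower().split()
    annotations = []
    for table, attrs in TABLES.items():
        # Per token: every attribute whose values contain it, plus None.
        options = [
            [a for a, values in attrs.items() if tok in values] + [None]
            for tok in tokens
        ]
        for choice in product(*options):
            if any(attr is not None for attr in choice):
                annotations.append((table, tuple(zip(tokens, choice))))
    return annotations


if __name__ == "__main__":
    for table, mapping in annotate("canon d90"):
        print(table, mapping)
```

Even in this toy setting the query "canon d90" yields several candidate annotations, including one over the unrelated "movies" table, which illustrates why a scoring and filtering step is needed to keep only the sensible ones.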