Structured annotations of web queries

Authors:
Nikos Sarkas, University of Toronto, Toronto, ON, Canada
Stelios Paparizos, Microsoft Research, Mountain View, CA, USA
Panayiotis Tsaparas, Microsoft Research, Mountain View, CA, USA

SIGMOD '10: Proceedings of the 2010 ACM SIGMOD International Conference on Management of data, June 2010, Pages 771–782
https://doi.org/10.1145/1807167.1807251

Published: 06 June 2010

ABSTRACT

Queries issued to web search engines often target structured data, such as commercial products, movie showtimes, or airline schedules. However, surfacing relevant results from such data is a highly challenging problem, due to the unstructured language of web queries and the imposing scalability and speed requirements of web search. In this paper, we discover latent structured semantics in web queries and produce Structured Annotations for them. We define an annotation as a mapping of a query to a table of structured data and to attributes of this table. Given a collection of structured tables, we present a fast and scalable tagging mechanism for obtaining all possible annotations of a query over these tables. However, we observe that for a given query only a few of these annotations are sensible for the user's needs. We thus propose a principled probabilistic scoring mechanism, using a generative model, for assessing the likelihood of a structured annotation, and we define a dynamic threshold for filtering out misinterpreted query annotations. Our techniques are completely unsupervised, obviating the need for costly manual labeling effort. We evaluated our techniques using real-world queries and data and present promising experimental results.
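
To make the notion of an annotation concrete, the following is a minimal Python sketch of enumerating candidate annotations of a query over a small collection of tables. It is an illustration only, not the tagging mechanism of the paper: the tables, the attribute names, the function name `annotate`, the single-token exact matching, and the rule that each attribute is used at most once are all assumptions made for this example. Each annotation pairs a table with a per-token assignment, where a token is either mapped to an attribute or left unannotated (`None`).

```python
from itertools import product

# Hypothetical toy data: table name -> attribute -> set of known single-token values.
TABLES = {
    "cameras": {"brand": {"canon", "nikon"}, "color": {"black", "silver"}},
    "shoes": {"brand": {"nike"}, "color": {"black", "white"}},
}


def annotate(query, tables):
    """Enumerate candidate annotations of `query` over `tables`.

    An annotation is (table_name, ((token, attribute_or_None), ...)).
    Assumptions of this sketch: exact single-token matching, each attribute
    used at most once, and at least one token must be annotated.
    """
    tokens = query.lower().split()
    annotations = []
    for table_name, attributes in tables.items():
        # For every token, list the attributes whose value set contains it,
        # plus None, meaning the token stays unannotated.
        options = []
        for tok in tokens:
            matches = [a for a, vals in sorted(attributes.items()) if tok in vals]
            options.append(matches + [None])
        # Every combination of per-token choices is a candidate annotation.
        for choice in product(*options):
            used = [a for a in choice if a is not None]
            if not used or len(used) != len(set(used)):
                continue  # skip empty annotations and repeated attributes
            annotations.append((table_name, tuple(zip(tokens, choice))))
    return annotations


for table_name, mapping in annotate("black canon", TABLES):
    print(table_name, mapping)
```

Even on this toy input, one query yields several candidates across both tables, including ones that annotate only part of the query. This is the situation the abstract describes, in which all possible annotations are obtained first and a separate scoring step is then needed to keep the sensible ones.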

Index Terms

Structured annotations of web queries
1. Information systems
  1. Information retrieval

Published in
SIGMOD '10: Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
June 2010
1286 pages
ISBN: 9781450300322
DOI: 10.1145/1807167
General Chair: Ahmed Elmagarmid, Purdue University, USA
Program Chair: Divyakant Agrawal, University of California at Santa Barbara, USA
Copyright © 2010 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
keyword search
structured data
web
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 785 of 4,003 submissions, 20%
Article Metrics
- Total Citations: 46
- Total Downloads: 1,256
- Downloads (Last 12 months): 7
- Downloads (Last 6 weeks): 0