research-article

Measure-driven keyword-query expansion

Authors:
Nikos Sarkas

University of Toronto

University of Toronto
View Profile

,
Nilesh Bansal

University of Toronto

University of Toronto
View Profile

,
Gautam Das

University of Texas at Arlington

University of Texas at Arlington
View Profile

,
Nick Koudas

University of Toronto

University of Toronto
View Profile

Proceedings of the VLDB Endowment Volume 2 Issue 1pp 121–132https://doi.org/10.14778/1687627.1687642

Published:01 August 2009Publication History

Proceedings of the VLDB Endowment

Abstract

User generated content has been fueling an explosion in the amount of available textual data. In this context, it is also common for users to express, either explicitly (through numerical ratings) or implicitly, their views and opinions on products, events, etc. This wealth of textual information necessitates the development of novel searching and data exploration paradigms.

In this paper we propose a new searching model, similar in spirit to faceted search, that enables the progressive refinement of a keyword-query result. However, in contrast to faceted search which utilizes domain-specific and hard-to-extract document attributes, the refinement process is driven by suggesting interesting expansions of the original query with additional search terms. Our query-driven and domain-neutral approach employs surprising word co-occurrence patterns and (optionally) numerical user ratings in order to identify meaningful top-k query expansions and allow one to focus on a particularly interesting subset of the original result set.

The proposed functionality is supported by a framework that is computationally efficient and nimble in terms of storage requirements. Our solution is grounded on Convex Optimization principles that allow us to exploit the pruning opportunities offered by the natural top-k formulation of our problem. The performance benefits offered by our solution are verified using both synthetic data and large real data sets comprised of blog posts.

References

N. Bansal, F. Chiang, N. Koudas, and F. W. Tompa. Seeking stable clusters in the blogosphere. In VLDB, 2007. Google ScholarDigital Library
Z. Bar-Yossef and M. Gurevich. Mining search engine query logs via suggestion sampling. PVLDB, 1(1), 2008. Google ScholarDigital Library
O. Ben-Yitzhak, N. Golbandi, N. Har'El, R. Lempel, A. Neumann, S. Ofek-Koifman, D. Sheinwald, E. J. Shekita, B. Sznajder, and S. Yogev. Beyond basic faceted search. In WSDM, 2008. Google ScholarDigital Library
R. G. Bland, D. Goldfarb, and M. J. Todd. The ellipsoid method: A survey. Operations Research, 29(6), 1981.Google Scholar
S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 1st edition, 2004. Google ScholarDigital Library
S. Brin, R. Motwani, and C. Silverstein. Beyond market baskets: Generalizing association rules to correlations. In SIGMOD Conference, 1997. Google ScholarDigital Library
S. Chaudhuri and U. Dayal. An overview of data warehousing and olap technology. SIGMOD Record, 26(1), 1997. Google ScholarDigital Library
T. M. Cover and J. A. Thomas. Elements of Information Theory. Wiley-Interscience, 2nd edition, 2006. Google ScholarDigital Library
W. Dakka, P. G. Ipeirotis, and K. R. Wood. Automatic construction of multifaceted browsing interfaces. In CIKM, 2005. Google ScholarDigital Library
W. DuMouchel and D. Pregibon. Empirical bayes screening for multi-item associations. In KDD, 2001. Google ScholarDigital Library
R. Fagin, A. Lotem, and M. Naor. Optimal aggregation algorithms for middleware. In PODS, 2001. Google ScholarDigital Library
C. Faloutsos, H. V. Jagadish, and N. Sidiropoulos. Recovering information from summary data. In VLDB, 1997. Google ScholarDigital Library
V. Markl, N. Megiddo, M. Kutsch, T. M. Tran, P. J. Haas, and U. Srivastava. Consistently estimating the selectivity of conjuncts of predicates. In VLDB, 2005. Google ScholarDigital Library
T. Palpanas, N. Koudas, and A. O. Mendelzon. Using datacube aggregates for approximate querying and deviation detection. IEEE Trans. Knowl. Data Eng., 17(11), 2005. Google ScholarDigital Library
B. Pang, L. Lee, and S. Vaithyanathan. Thumbs up? sentiment classification using machine learning techniques. CoRR, cs.CL/0205070, 2002.Google Scholar
D. Pavlov, H. Mannila, and P. Smyth. Probabilistic models for query approximation with large sparse binary data sets. In UAI, 2000. Google ScholarDigital Library
See homepage for details. Atlas homepage. http://math-atlas.sourceforge.net/.Google Scholar
A. Silberschatz and A. Tuzhilin. On subjective measures of interestingness in knowledge discovery. In KDD, 1995.Google Scholar
A. Simitsis, A. Baid, Y. Sismanis, and B. Reinwald. Multidimensional content exploration. PVLDB, 1(1), 2008. Google ScholarDigital Library
G. Strang. Linear Algebra and its Applications. Brooks Cole, 4th edition, 2005.Google Scholar

Index Terms

Measure-driven keyword-query expansion
1. Information systems
2. Theory of computation
  1. Theory and algorithms for application domains
    1. Database theory
      1. Database query processing and optimization (theory)

Recommendations

Query Expansion by Mining User Logs

Queries to search engines on the Web are usually short. They do not provide sufficient information for an effective selection of relevant documents. Previous research has proposed the utilization of query expansion to deal with this problem. However, ...
Read More
Concept based query expansion
SIGIR '93: Proceedings of the 16th annual international ACM SIGIR conference on Research and development in information retrieval

Query expansion methods have been studied for a long time - with debatable success in many instances. In this paper we present a probabilistic query expansion model based on a similarity thesaurus which was constructed automatically. A similarity ...
Read More
Semisupervised Query Expansion with Minimal Feedback

Query expansion is an information retrieval technique in which new query terms are selected to improve search performance. Although useful terms can be extracted from documents whose relevance is already known, it is difficult to get enough of such ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Published in
Proceedings of the VLDB Endowment Volume 2, Issue 1
August 2009
1293 pages
ISSN:2150-8097
Editors:
Serge Abiteboul,
Tova Milo,
Jignesh Patel,
Philippe Rigaux
Issue’s Table of Contents
Sponsors
In-Cooperation
Publisher
VLDB Endowment
Publication History
- Published: 1 August 2009
Published in pvldb Volume 2, Issue 1
Qualifiers
- research-article
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 15
  Total Citations
  View Citations
- 204
  Total Downloads
- Downloads (Last 12 months)6
- Downloads (Last 6 weeks)0
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Measure-driven keyword-query expansion

Proceedings of the VLDB Endowment

Abstract

References

Cited By

Index Terms

Recommendations

Query Expansion by Mining User Logs

Concept based query expansion

Semisupervised Query Expansion with Minimal Feedback

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Qualifiers

Conference

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

Measure-driven keyword-query expansion

Proceedings of the VLDB Endowment

Abstract

References

Cited By

Index Terms

Recommendations

Query Expansion by Mining User Logs

Concept based query expansion

Semisupervised Query Expansion with Minimal Feedback

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Qualifiers

Conference

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media