skip to main content
research-article

Measure-driven keyword-query expansion

Published:01 August 2009Publication History
Skip Abstract Section

Abstract

User generated content has been fueling an explosion in the amount of available textual data. In this context, it is also common for users to express, either explicitly (through numerical ratings) or implicitly, their views and opinions on products, events, etc. This wealth of textual information necessitates the development of novel searching and data exploration paradigms.

In this paper we propose a new searching model, similar in spirit to faceted search, that enables the progressive refinement of a keyword-query result. However, in contrast to faceted search which utilizes domain-specific and hard-to-extract document attributes, the refinement process is driven by suggesting interesting expansions of the original query with additional search terms. Our query-driven and domain-neutral approach employs surprising word co-occurrence patterns and (optionally) numerical user ratings in order to identify meaningful top-k query expansions and allow one to focus on a particularly interesting subset of the original result set.

The proposed functionality is supported by a framework that is computationally efficient and nimble in terms of storage requirements. Our solution is grounded on Convex Optimization principles that allow us to exploit the pruning opportunities offered by the natural top-k formulation of our problem. The performance benefits offered by our solution are verified using both synthetic data and large real data sets comprised of blog posts.

References

  1. N. Bansal, F. Chiang, N. Koudas, and F. W. Tompa. Seeking stable clusters in the blogosphere. In VLDB, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Z. Bar-Yossef and M. Gurevich. Mining search engine query logs via suggestion sampling. PVLDB, 1(1), 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. O. Ben-Yitzhak, N. Golbandi, N. Har'El, R. Lempel, A. Neumann, S. Ofek-Koifman, D. Sheinwald, E. J. Shekita, B. Sznajder, and S. Yogev. Beyond basic faceted search. In WSDM, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. R. G. Bland, D. Goldfarb, and M. J. Todd. The ellipsoid method: A survey. Operations Research, 29(6), 1981.Google ScholarGoogle Scholar
  5. S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 1st edition, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. S. Brin, R. Motwani, and C. Silverstein. Beyond market baskets: Generalizing association rules to correlations. In SIGMOD Conference, 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. S. Chaudhuri and U. Dayal. An overview of data warehousing and olap technology. SIGMOD Record, 26(1), 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. T. M. Cover and J. A. Thomas. Elements of Information Theory. Wiley-Interscience, 2nd edition, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. W. Dakka, P. G. Ipeirotis, and K. R. Wood. Automatic construction of multifaceted browsing interfaces. In CIKM, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. W. DuMouchel and D. Pregibon. Empirical bayes screening for multi-item associations. In KDD, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. R. Fagin, A. Lotem, and M. Naor. Optimal aggregation algorithms for middleware. In PODS, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. C. Faloutsos, H. V. Jagadish, and N. Sidiropoulos. Recovering information from summary data. In VLDB, 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. V. Markl, N. Megiddo, M. Kutsch, T. M. Tran, P. J. Haas, and U. Srivastava. Consistently estimating the selectivity of conjuncts of predicates. In VLDB, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. T. Palpanas, N. Koudas, and A. O. Mendelzon. Using datacube aggregates for approximate querying and deviation detection. IEEE Trans. Knowl. Data Eng., 17(11), 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. B. Pang, L. Lee, and S. Vaithyanathan. Thumbs up? sentiment classification using machine learning techniques. CoRR, cs.CL/0205070, 2002.Google ScholarGoogle Scholar
  16. D. Pavlov, H. Mannila, and P. Smyth. Probabilistic models for query approximation with large sparse binary data sets. In UAI, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. See homepage for details. Atlas homepage. http://math-atlas.sourceforge.net/.Google ScholarGoogle Scholar
  18. A. Silberschatz and A. Tuzhilin. On subjective measures of interestingness in knowledge discovery. In KDD, 1995.Google ScholarGoogle Scholar
  19. A. Simitsis, A. Baid, Y. Sismanis, and B. Reinwald. Multidimensional content exploration. PVLDB, 1(1), 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. G. Strang. Linear Algebra and its Applications. Brooks Cole, 4th edition, 2005.Google ScholarGoogle Scholar

Index Terms

  1. Measure-driven keyword-query expansion

          Recommendations

          Comments

          Login options

          Check if you have access through your login credentials or your institution to get full access on this article.

          Sign in

          Full Access

          PDF Format

          View or Download as a PDF file.

          PDF

          eReader

          View online with eReader.

          eReader