Abstract
User generated content has been fueling an explosion in the amount of available textual data. In this context, it is also common for users to express, either explicitly (through numerical ratings) or implicitly, their views and opinions on products, events, etc. This wealth of textual information necessitates the development of novel searching and data exploration paradigms.
In this paper we propose a new searching model, similar in spirit to faceted search, that enables the progressive refinement of a keyword-query result. However, in contrast to faceted search which utilizes domain-specific and hard-to-extract document attributes, the refinement process is driven by suggesting interesting expansions of the original query with additional search terms. Our query-driven and domain-neutral approach employs surprising word co-occurrence patterns and (optionally) numerical user ratings in order to identify meaningful top-k query expansions and allow one to focus on a particularly interesting subset of the original result set.
The proposed functionality is supported by a framework that is computationally efficient and nimble in terms of storage requirements. Our solution is grounded on Convex Optimization principles that allow us to exploit the pruning opportunities offered by the natural top-k formulation of our problem. The performance benefits offered by our solution are verified using both synthetic data and large real data sets comprised of blog posts.
- N. Bansal, F. Chiang, N. Koudas, and F. W. Tompa. Seeking stable clusters in the blogosphere. In VLDB, 2007. Google ScholarDigital Library
- Z. Bar-Yossef and M. Gurevich. Mining search engine query logs via suggestion sampling. PVLDB, 1(1), 2008. Google ScholarDigital Library
- O. Ben-Yitzhak, N. Golbandi, N. Har'El, R. Lempel, A. Neumann, S. Ofek-Koifman, D. Sheinwald, E. J. Shekita, B. Sznajder, and S. Yogev. Beyond basic faceted search. In WSDM, 2008. Google ScholarDigital Library
- R. G. Bland, D. Goldfarb, and M. J. Todd. The ellipsoid method: A survey. Operations Research, 29(6), 1981.Google Scholar
- S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 1st edition, 2004. Google ScholarDigital Library
- S. Brin, R. Motwani, and C. Silverstein. Beyond market baskets: Generalizing association rules to correlations. In SIGMOD Conference, 1997. Google ScholarDigital Library
- S. Chaudhuri and U. Dayal. An overview of data warehousing and olap technology. SIGMOD Record, 26(1), 1997. Google ScholarDigital Library
- T. M. Cover and J. A. Thomas. Elements of Information Theory. Wiley-Interscience, 2nd edition, 2006. Google ScholarDigital Library
- W. Dakka, P. G. Ipeirotis, and K. R. Wood. Automatic construction of multifaceted browsing interfaces. In CIKM, 2005. Google ScholarDigital Library
- W. DuMouchel and D. Pregibon. Empirical bayes screening for multi-item associations. In KDD, 2001. Google ScholarDigital Library
- R. Fagin, A. Lotem, and M. Naor. Optimal aggregation algorithms for middleware. In PODS, 2001. Google ScholarDigital Library
- C. Faloutsos, H. V. Jagadish, and N. Sidiropoulos. Recovering information from summary data. In VLDB, 1997. Google ScholarDigital Library
- V. Markl, N. Megiddo, M. Kutsch, T. M. Tran, P. J. Haas, and U. Srivastava. Consistently estimating the selectivity of conjuncts of predicates. In VLDB, 2005. Google ScholarDigital Library
- T. Palpanas, N. Koudas, and A. O. Mendelzon. Using datacube aggregates for approximate querying and deviation detection. IEEE Trans. Knowl. Data Eng., 17(11), 2005. Google ScholarDigital Library
- B. Pang, L. Lee, and S. Vaithyanathan. Thumbs up? sentiment classification using machine learning techniques. CoRR, cs.CL/0205070, 2002.Google Scholar
- D. Pavlov, H. Mannila, and P. Smyth. Probabilistic models for query approximation with large sparse binary data sets. In UAI, 2000. Google ScholarDigital Library
- See homepage for details. Atlas homepage. http://math-atlas.sourceforge.net/.Google Scholar
- A. Silberschatz and A. Tuzhilin. On subjective measures of interestingness in knowledge discovery. In KDD, 1995.Google Scholar
- A. Simitsis, A. Baid, Y. Sismanis, and B. Reinwald. Multidimensional content exploration. PVLDB, 1(1), 2008. Google ScholarDigital Library
- G. Strang. Linear Algebra and its Applications. Brooks Cole, 4th edition, 2005.Google Scholar
Index Terms
- Measure-driven keyword-query expansion
Recommendations
Query Expansion by Mining User Logs
Queries to search engines on the Web are usually short. They do not provide sufficient information for an effective selection of relevant documents. Previous research has proposed the utilization of query expansion to deal with this problem. However, ...
Concept based query expansion
SIGIR '93: Proceedings of the 16th annual international ACM SIGIR conference on Research and development in information retrievalQuery expansion methods have been studied for a long time - with debatable success in many instances. In this paper we present a probabilistic query expansion model based on a similarity thesaurus which was constructed automatically. A similarity ...
Semisupervised Query Expansion with Minimal Feedback
Query expansion is an information retrieval technique in which new query terms are selected to improve search performance. Although useful terms can be extracted from documents whose relevance is already known, it is difficult to get enough of such ...
Comments