ABSTRACT
A variety of web sites and web based services produce textual lists at varying time granularities ranked according to several criteria. For example, Google Trends produces lists of popular query keywords which can be visualized according to several criteria. At Flickr, lists of popular tags used to tag the images uploaded can be visualized as a cloud based on their popularity. Identification of the k most popular terms can be easily conducted by utilizing well known rank aggregation algorithms.
In this paper we take a different approach to information discovery from such ranked lists. We maintain the same rank aggregation framework but we elevate terms at a higher level by making use of popular term hierarchies commonly available. Under such a transformation we show that typical early stopping certificates available for rank aggregation algorithms are no longer applicable. Based on this observation, in this paper, we present a probabilistic framework for early stopping in this setting. We introduce a relaxed version of the rank aggregation problem involving a deterministic stopping condition with user specified precision. We introduce an algorithm pH -- RA for the solution of this problem. In addition we introduce techniques to improve the performance of pH -- RAeven further via precomputation utilizing a sparse set system. Through a detailed experimental evaluation using synthetic and real datasets we demonstrate the efficiency of our framework.
- B. Arai, G. Das, D. Gunopulos, and N. Koudas. Anytime measures for top-k algorithms. In VLDB, pages 914--925. ACM, 2007. Google ScholarDigital Library
- Aslam, J. A., and Montague, Mark. Models for metasearch. In SIGIR, 2001. Google ScholarDigital Library
- N. Bansal, F. Chiang, N. Koudas, and F. W. Tompa. Seeking Stable Clusters in the Blogosphere. In VLDB, 2007. Google ScholarDigital Library
- N. Bansal and N. Koudas. BlogScope: A System for Online Analysis of High Volume Text Streams. In VLDB, 2007. Google ScholarDigital Library
- H. Bast, D. Majumdar, R. Schenkel, M. Theobald, and G. Weikum. IO-top-k: Index-access optimized top-k query processing. In VLDB, 2006. Google ScholarDigital Library
- J. O. Berger. Statistical Decision Theory and Bayesian Analysis, 2nd Edition. Springer, 1985.Google ScholarCross Ref
- BlogScope. http://www.blogscope.net/about/.Google Scholar
- K. Chakrabarti, V. Ganti, J. Han, and D. Xin. Ranking objects based on relationships. In SIGMOD, 2006. Google ScholarDigital Library
- W. W. Cohen, R. E. Schapire, and Y. Singer. Learning to order things. JAIR, 10:243--270, 1999. Google ScholarDigital Library
- G. Das, D. Gunopulos, N. Koudas, and D. Tsirogiannis. Answering top-k queries using views. In VLDB, 2006. Google ScholarDigital Library
- M. Dubinko, R. Kumar, J. Magnani, J. Novak, P. Raghavan, and A. Tomkins. Visualizing tags over time. In WWW, pages 193--202. ACM, 2006. Google ScholarDigital Library
- C. Dwork, R. Kumar, M. Naor, and D. Sivakumar. Rank aggregation methods for the web. In WWW, pages 613--622, 2001. Google ScholarDigital Library
- R. Fagin, R. Kumar, and D. Sivakumar. Comparing top k lists. SIJDM: SIAM Journal on Discrete Mathematics, 17, 2003. Google ScholarDigital Library
- R. Fagin, A. Lotem, and M. Naor. Optimal aggregation algorithms for middleware. JCSS: Journal of Computer and System Sciences, 66, 2003. Google ScholarDigital Library
- W. Feller. An Introduction to Probability Theory and its Applications, volume 1. John Wiley & Sons, 2nd edition, 1957.Google Scholar
- M. F. Fontoura, R. Lempel, R. Qi, , and J. Zien. Inverted index support for numeric search. In Internet Mathematics, 2006.Google ScholarCross Ref
- S. Guha, N. Koudas, and D. Srivastava. Fast algorithms for hierarchical range histogram construction. In ACM PODS, 2002. Google ScholarDigital Library
- U. Güntzer, W.-T. Balke, and W. Kießling. Optimizing multi-feature queries for image databases. In VLDB, pages 419--428. Morgan Kaufmann, 2000. Google ScholarDigital Library
- A. Marian, N. Bruno, and L. Gravano. Evaluating top- k queries over web-accessible databases. ACM Trans. Database Syst, 29(2):319--362, 2004. Google ScholarDigital Library
- S. Nepal and M. V. Ramakrishna. Query processing issues in image (multimedia) databases. In ICDE, 1999.Google ScholarCross Ref
- G. Salton and C. Buckley. Term-weighting approaches in automatic text retrieval. Information Processing and Management, 24(5):513--523, 1988. Google ScholarDigital Library
- M. Theobald, G. Weikum, and R. Schenkel. Top-k query evaluation with probabilistic guarantees. In VLDB. Morgan Kaufmann Publishers, 2004. Google ScholarDigital Library
- R. R. Yager and V. Kreinovich. On how to merge sorted lists coming from different web search tools. Soft Comput, 3(2):83--88, 1999.Google ScholarCross Ref
Index Terms
- Ad-hoc aggregations of ranked lists in the presence of hierarchies
Recommendations
Channeling Science Information Seekers' Attention? A Content Analysis of Top-Ranked vs. Lower-Ranked Sites in Google
Researchers have suggested that search engines shape portrayals of information by making large and popular websites more prominent while discriminating against smaller sites. Despite the possible skew of information sources, little empirical work has ...
Efficient Top-k Keyword Search on XML Streams
ICYCS '08: Proceedings of the 2008 The 9th International Conference for Young Computer ScientistsKeywords can be used to query XML data without schema information. In this paper, a novel kind of query is proposed, top-k keyword search over XML streams. According to the set of keywords and the number of results, such query can retrieve the top-k XML ...
Multi-keyword ranked searchable public-key encryption
Most related works on Public-Key Encryption with Keyword Search scheme also called Searchable Public-key Encryption focus on single keyword search. Some works supporting multi-keyword search cannot provide search result ranking. In this paper, we study ...
Comments