research-article

Ad-hoc aggregations of ranked lists in the presence of hierarchies

Authors:
Nilesh Bansal

University of Toronto, Toronto, ON, Canada

University of Toronto, Toronto, ON, Canada
View Profile

,
Sudipto Guha

University of Pennsylvania, Philadelphia, PA, USA

University of Pennsylvania, Philadelphia, PA, USA
View Profile

,
Nick Koudas

University of Toronto, Toronto, ON, Canada

University of Toronto, Toronto, ON, Canada
View Profile

SIGMOD '08: Proceedings of the 2008 ACM SIGMOD international conference on Management of dataJune 2008Pages 67–78https://doi.org/10.1145/1376616.1376626

Published:09 June 2008Publication History

SIGMOD '08: Proceedings of the 2008 ACM SIGMOD international conference on Management of data

Pages 67–78

ABSTRACT

A variety of web sites and web based services produce textual lists at varying time granularities ranked according to several criteria. For example, Google Trends produces lists of popular query keywords which can be visualized according to several criteria. At Flickr, lists of popular tags used to tag the images uploaded can be visualized as a cloud based on their popularity. Identification of the k most popular terms can be easily conducted by utilizing well known rank aggregation algorithms.

In this paper we take a different approach to information discovery from such ranked lists. We maintain the same rank aggregation framework but we elevate terms at a higher level by making use of popular term hierarchies commonly available. Under such a transformation we show that typical early stopping certificates available for rank aggregation algorithms are no longer applicable. Based on this observation, in this paper, we present a probabilistic framework for early stopping in this setting. We introduce a relaxed version of the rank aggregation problem involving a deterministic stopping condition with user specified precision. We introduce an algorithm pH -- RA for the solution of this problem. In addition we introduce techniques to improve the performance of pH -- RAeven further via precomputation utilizing a sparse set system. Through a detailed experimental evaluation using synthetic and real datasets we demonstrate the efficiency of our framework.

References

B. Arai, G. Das, D. Gunopulos, and N. Koudas. Anytime measures for top-k algorithms. In VLDB, pages 914--925. ACM, 2007. Google ScholarDigital Library
Aslam, J. A., and Montague, Mark. Models for metasearch. In SIGIR, 2001. Google ScholarDigital Library
N. Bansal, F. Chiang, N. Koudas, and F. W. Tompa. Seeking Stable Clusters in the Blogosphere. In VLDB, 2007. Google ScholarDigital Library
N. Bansal and N. Koudas. BlogScope: A System for Online Analysis of High Volume Text Streams. In VLDB, 2007. Google ScholarDigital Library
H. Bast, D. Majumdar, R. Schenkel, M. Theobald, and G. Weikum. IO-top-k: Index-access optimized top-k query processing. In VLDB, 2006. Google ScholarDigital Library
J. O. Berger. Statistical Decision Theory and Bayesian Analysis, 2nd Edition. Springer, 1985.Google ScholarCross Ref
BlogScope. http://www.blogscope.net/about/.Google Scholar
K. Chakrabarti, V. Ganti, J. Han, and D. Xin. Ranking objects based on relationships. In SIGMOD, 2006. Google ScholarDigital Library
W. W. Cohen, R. E. Schapire, and Y. Singer. Learning to order things. JAIR, 10:243--270, 1999. Google ScholarDigital Library
G. Das, D. Gunopulos, N. Koudas, and D. Tsirogiannis. Answering top-k queries using views. In VLDB, 2006. Google ScholarDigital Library
M. Dubinko, R. Kumar, J. Magnani, J. Novak, P. Raghavan, and A. Tomkins. Visualizing tags over time. In WWW, pages 193--202. ACM, 2006. Google ScholarDigital Library
C. Dwork, R. Kumar, M. Naor, and D. Sivakumar. Rank aggregation methods for the web. In WWW, pages 613--622, 2001. Google ScholarDigital Library
R. Fagin, R. Kumar, and D. Sivakumar. Comparing top k lists. SIJDM: SIAM Journal on Discrete Mathematics, 17, 2003. Google ScholarDigital Library
R. Fagin, A. Lotem, and M. Naor. Optimal aggregation algorithms for middleware. JCSS: Journal of Computer and System Sciences, 66, 2003. Google ScholarDigital Library
W. Feller. An Introduction to Probability Theory and its Applications, volume 1. John Wiley & Sons, 2nd edition, 1957.Google Scholar
M. F. Fontoura, R. Lempel, R. Qi, , and J. Zien. Inverted index support for numeric search. In Internet Mathematics, 2006.Google ScholarCross Ref
S. Guha, N. Koudas, and D. Srivastava. Fast algorithms for hierarchical range histogram construction. In ACM PODS, 2002. Google ScholarDigital Library
U. Güntzer, W.-T. Balke, and W. Kießling. Optimizing multi-feature queries for image databases. In VLDB, pages 419--428. Morgan Kaufmann, 2000. Google ScholarDigital Library
A. Marian, N. Bruno, and L. Gravano. Evaluating top- k queries over web-accessible databases. ACM Trans. Database Syst, 29(2):319--362, 2004. Google ScholarDigital Library
S. Nepal and M. V. Ramakrishna. Query processing issues in image (multimedia) databases. In ICDE, 1999.Google ScholarCross Ref
G. Salton and C. Buckley. Term-weighting approaches in automatic text retrieval. Information Processing and Management, 24(5):513--523, 1988. Google ScholarDigital Library
M. Theobald, G. Weikum, and R. Schenkel. Top-k query evaluation with probabilistic guarantees. In VLDB. Morgan Kaufmann Publishers, 2004. Google ScholarDigital Library
R. R. Yager and V. Kreinovich. On how to merge sorted lists coming from different web search tools. Soft Comput, 3(2):83--88, 1999.Google ScholarCross Ref

Index Terms

Ad-hoc aggregations of ranked lists in the presence of hierarchies
1. Information systems
  1. Information retrieval

Recommendations

Channeling Science Information Seekers' Attention? A Content Analysis of Top-Ranked vs. Lower-Ranked Sites in Google

Researchers have suggested that search engines shape portrayals of information by making large and popular websites more prominent while discriminating against smaller sites. Despite the possible skew of information sources, little empirical work has ...
Read More
Efficient Top-k Keyword Search on XML Streams
ICYCS '08: Proceedings of the 2008 The 9th International Conference for Young Computer Scientists

Keywords can be used to query XML data without schema information. In this paper, a novel kind of query is proposed, top-k keyword search over XML streams. According to the set of keywords and the number of results, such query can retrieve the top-k XML ...
Read More
Multi-keyword ranked searchable public-key encryption

Most related works on Public-Key Encryption with Keyword Search scheme also called Searchable Public-key Encryption focus on single keyword search. Some works supporting multi-keyword search cannot provide search result ranking. In this paper, we study ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
SIGMOD '08: Proceedings of the 2008 ACM SIGMOD international conference on Management of data
June 2008
1396 pages
ISBN:9781605581026
DOI:10.1145/1376616
General Chairs:
Laks V. S. Lakshmanan
University of British Columbia, Canada
,
Raymond T. Ng
University of British Columbia, Canada
,
Dennis Shasha
New York University, USA
Copyright © 2008 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 9 June 2008
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
list aggregation
top-k
Qualifiers
- research-article
Conference

Acceptance Rates
Overall Acceptance Rate785of4,003submissions,20%
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 7
  Total Citations
  View Citations
- 649
  Total Downloads
- Downloads (Last 12 months)5
- Downloads (Last 6 weeks)2
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Ad-hoc aggregations of ranked lists in the presence of hierarchies

SIGMOD '08: Proceedings of the 2008 ACM SIGMOD international conference on Management of data

ABSTRACT

References

Cited By

Index Terms

Recommendations

Channeling Science Information Seekers' Attention? A Content Analysis of Top-Ranked vs. Lower-Ranked Sites in Google

Efficient Top-k Keyword Search on XML Streams

Multi-keyword ranked searchable public-key encryption