skip to main content
10.1145/1376616.1376626acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article

Ad-hoc aggregations of ranked lists in the presence of hierarchies

Published:09 June 2008Publication History

ABSTRACT

A variety of web sites and web based services produce textual lists at varying time granularities ranked according to several criteria. For example, Google Trends produces lists of popular query keywords which can be visualized according to several criteria. At Flickr, lists of popular tags used to tag the images uploaded can be visualized as a cloud based on their popularity. Identification of the k most popular terms can be easily conducted by utilizing well known rank aggregation algorithms.

In this paper we take a different approach to information discovery from such ranked lists. We maintain the same rank aggregation framework but we elevate terms at a higher level by making use of popular term hierarchies commonly available. Under such a transformation we show that typical early stopping certificates available for rank aggregation algorithms are no longer applicable. Based on this observation, in this paper, we present a probabilistic framework for early stopping in this setting. We introduce a relaxed version of the rank aggregation problem involving a deterministic stopping condition with user specified precision. We introduce an algorithm pH -- RA for the solution of this problem. In addition we introduce techniques to improve the performance of pH -- RAeven further via precomputation utilizing a sparse set system. Through a detailed experimental evaluation using synthetic and real datasets we demonstrate the efficiency of our framework.

References

  1. B. Arai, G. Das, D. Gunopulos, and N. Koudas. Anytime measures for top-k algorithms. In VLDB, pages 914--925. ACM, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Aslam, J. A., and Montague, Mark. Models for metasearch. In SIGIR, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. N. Bansal, F. Chiang, N. Koudas, and F. W. Tompa. Seeking Stable Clusters in the Blogosphere. In VLDB, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. N. Bansal and N. Koudas. BlogScope: A System for Online Analysis of High Volume Text Streams. In VLDB, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. H. Bast, D. Majumdar, R. Schenkel, M. Theobald, and G. Weikum. IO-top-k: Index-access optimized top-k query processing. In VLDB, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. J. O. Berger. Statistical Decision Theory and Bayesian Analysis, 2nd Edition. Springer, 1985.Google ScholarGoogle ScholarCross RefCross Ref
  7. BlogScope. http://www.blogscope.net/about/.Google ScholarGoogle Scholar
  8. K. Chakrabarti, V. Ganti, J. Han, and D. Xin. Ranking objects based on relationships. In SIGMOD, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. W. W. Cohen, R. E. Schapire, and Y. Singer. Learning to order things. JAIR, 10:243--270, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. G. Das, D. Gunopulos, N. Koudas, and D. Tsirogiannis. Answering top-k queries using views. In VLDB, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. M. Dubinko, R. Kumar, J. Magnani, J. Novak, P. Raghavan, and A. Tomkins. Visualizing tags over time. In WWW, pages 193--202. ACM, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. C. Dwork, R. Kumar, M. Naor, and D. Sivakumar. Rank aggregation methods for the web. In WWW, pages 613--622, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. R. Fagin, R. Kumar, and D. Sivakumar. Comparing top k lists. SIJDM: SIAM Journal on Discrete Mathematics, 17, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. R. Fagin, A. Lotem, and M. Naor. Optimal aggregation algorithms for middleware. JCSS: Journal of Computer and System Sciences, 66, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. W. Feller. An Introduction to Probability Theory and its Applications, volume 1. John Wiley & Sons, 2nd edition, 1957.Google ScholarGoogle Scholar
  16. M. F. Fontoura, R. Lempel, R. Qi, , and J. Zien. Inverted index support for numeric search. In Internet Mathematics, 2006.Google ScholarGoogle ScholarCross RefCross Ref
  17. S. Guha, N. Koudas, and D. Srivastava. Fast algorithms for hierarchical range histogram construction. In ACM PODS, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. U. Güntzer, W.-T. Balke, and W. Kießling. Optimizing multi-feature queries for image databases. In VLDB, pages 419--428. Morgan Kaufmann, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. A. Marian, N. Bruno, and L. Gravano. Evaluating top- k queries over web-accessible databases. ACM Trans. Database Syst, 29(2):319--362, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. S. Nepal and M. V. Ramakrishna. Query processing issues in image (multimedia) databases. In ICDE, 1999.Google ScholarGoogle ScholarCross RefCross Ref
  21. G. Salton and C. Buckley. Term-weighting approaches in automatic text retrieval. Information Processing and Management, 24(5):513--523, 1988. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. M. Theobald, G. Weikum, and R. Schenkel. Top-k query evaluation with probabilistic guarantees. In VLDB. Morgan Kaufmann Publishers, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. R. R. Yager and V. Kreinovich. On how to merge sorted lists coming from different web search tools. Soft Comput, 3(2):83--88, 1999.Google ScholarGoogle ScholarCross RefCross Ref

Index Terms

  1. Ad-hoc aggregations of ranked lists in the presence of hierarchies

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in
    • Published in

      cover image ACM Conferences
      SIGMOD '08: Proceedings of the 2008 ACM SIGMOD international conference on Management of data
      June 2008
      1396 pages
      ISBN:9781605581026
      DOI:10.1145/1376616

      Copyright © 2008 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 9 June 2008

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • research-article

      Acceptance Rates

      Overall Acceptance Rate785of4,003submissions,20%

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader