skip to main content
research-article

Randomized multi-pass streaming skyline algorithms

Published:01 August 2009Publication History
Skip Abstract Section

Abstract

We consider external algorithms for skyline computation without pre-processing. Our goal is to develop an algorithm with a good worst case guarantee while performing well on average. Due to the nature of disks, it is desirable that such algorithms access the input as a stream (even if in multiple passes). Using the tools of randomness, proved to be useful in many applications, we present an efficient multi-pass streaming algorithm, RAND, for skyline computation. As far as we are aware, RAND is the first randomized skyline algorithm in the literature.

RAND is near-optimal for the streaming model, which we prove via a simple lower bound. Additionally, our algorithm is distributable and can handle partially ordered domains on each attribute. Finally, we demonstrate the robustness of RAND via extensive experiments on both real and synthetic datasets. RAND is comparable to the existing algorithms in average case and additionally tolerant to simple modifications of the data, while other algorithms degrade considerably with such variation.

References

  1. L. Arge and J. S. Vitter. Optimal dynamic interval management in external memory. In FOCS, pages 560--569, 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. W.-T. Balke, U. Güntzer, and J. X. Zheng. Efficient distributed skylining for web information systems. In EDBT, pages 256--273, 2004.Google ScholarGoogle ScholarCross RefCross Ref
  3. O. Barndorff-Nielsen and M. Sobel. On the distribution of the number of admissible points in a vector random sample. Theory of Probability and its Applications, 11(2):249--269, 1966.Google ScholarGoogle ScholarCross RefCross Ref
  4. S. Börzsönyi, D. Kossmann, and K. Stocker. The skyline operator. In ICDE, pages 421--430, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. C.-Y. Chan, P.-K. Eng, and K.-L. Tan. Efficient processing of skyline queries with partially-ordered domains. In ICDE'05, pages 190--191, Washington, DC, USA, 2005. IEEE Computer Society. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. C. Y. Chan, P.-K. Eng, and K.-L. Tan. Stratified computation of skylines with partially-ordered domains. In SIGMOD Conference, pages 203--214, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. S. Chaudhuri, N. Dalvi, and R. Kaushik. Robust cardinality and cost estimation for skyline operator. In ICDE'06, page 64, Washington, DC, USA, 2006. IEEE Computer Society. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. J. Chomicki, P. Godfrey, J. Gryz, and D. Liang. Skyline with presorting: Theory and optimizations. In Intelligent Information Systems, pages 595--604, 2005. Also appeared in ICDE'03.Google ScholarGoogle Scholar
  9. M. Frigo, C. E. Leiserson, H. Prokop, and S. Ramachandran. Cache-oblivious algorithms. In FOCS, pages 285--298, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. P. Godfrey, R. Shipley, and J. Gryz. Algorithms and analyses for maximal vector computation. VLDB J., 16(1):5--28, 2007. Also appeared in VLDB'05. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. M. Greenwald and S. Khanna. Space-efficient online computation of quantile summaries. In SIGMOD '01, pages 58--66, New York, NY, USA, 2001. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. B. Jiang and J. Pei. Online interval skyline queries on time series. In ICDE, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. D. Kossmann, F. Ramsak, and S. Rost. Shooting stars in the sky: an online algorithm for skyline queries. In VLDB'02, pages 275--286. VLDB Endowment, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. H. T. Kung, F. Luccio, and F. P. Preparata. On finding the maxima of a set of vectors. J. ACM, 22(4):469--476, 1975. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. X. Lin, Y. Yuan, W. Wang, and H. Lu. Stabbing the sky: Efficient skyline computation over sliding windows. In ICDE, pages 502--513, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. J. Matousek. Computing dominances in en. Inf. Process. Lett., 38(5):277--278, 1991. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. M. D. Morse, J. M. Patel, and H. V. Jagadish. Efficient skyline computation over low-cardinality domains. In VLDB, pages 267--278, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. D. Papadias, Y. Tao, G. Fu, and B. Seeger. Progressive skyline computation in database systems. ACM Trans. Database Syst., 30(1):41--82, 2005. Also appeard in SIGMOD'03. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. S. Park, T. Kim, J. Park, J. Kim, and H. Im. Parallel skyline computation on multicore architectures. In ICDE, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. J. Pei, B. Jiang, X. Lin, and Y. Yuan. Probabilistic skylines on uncertain data. In VLDB, pages 15--26, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. J. M. Ruhl. Efficient algorithms for new computational models. PhD thesis, 2003. Supervisor-David R. Karger. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. D. Sacharidis, S. Papadopoulos, and D. Papadias. Topologically sorted skylines for partially ordered domains. In ICDE, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. N. Schweikardt. Machine models and lower bounds for query processing. In PODS, pages 41--52, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. N. Schweikardt. Lower bounds for multi-pass processing of multiple data streams. In STACS, pages 51--61, 2009.Google ScholarGoogle Scholar
  25. K.-L. Tan, P.-K. Eng, and B. C. Ooi. Efficient progressive skyline computation. In VLDB '01, pages 301--310, San Francisco, CA, USA, 2001. Morgan Kaufmann Publishers Inc. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Y. Tao, L. Ding, X. Lin, and J. Pei. Distance-based representative skyline. In ICDE, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Y. Tao and D. Papadias. Maintaining sliding window skylines on data streams. IEEE Trans. Knowl. Data Eng., 18(2):377--391, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. R. Torlone, P. Ciaccia, and U. Romatre. Which are my preferred items. In In Workshop on Recommendation and Personalization in E-Commerce, pages 1--9, 2002.Google ScholarGoogle Scholar
  29. J. S. Vitter. Random sampling with a reservoir. ACM Trans. Math. Softw., 11(1):37--57, 1985. Also appeared in FOCS'83. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. J. S. Vitter. Algorithms and data structures for external memory. Foundations and Trends in Theoretical Computer Science, 2(4):305--474, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. A. Vlachou, C. Doulkeridis, Y. Kotidis, and M. Vazirgiannis. Skypeer: Efficient subspace skyline computation over distributed data. In ICDE, pages 416--425, 2007.Google ScholarGoogle ScholarCross RefCross Ref
  32. R. C.-W. Wong, A. W.-C. Fu, J. Pei, Y. S. Ho, T. Wong, and Y. Liu. Efficient skyline querying with variable user preferences on nominal attributes. PVLDB, 1(1):1032--1043, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. R. C.-W. Wong, J. Pei, A. W.-C. Fu, and K. Wang. Online skyline analysis with dynamic preferences on nominal attributes. IEEE Trans. Knowl. Data Eng., 21(1):35--49, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. A. C.-C. Yao. Probabilistic computations: Toward a unified measure of complexity (extended abstract). In FOCS, pages 222--227, 1977. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. W. Zhang, X. Lin, Y. Zhang, W. Wang, and J. X. Yu. Probabilistic skyline operator over sliding windows. In ICDE, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. L. Zhu, Y. Tao, and S. Zhou. Distributed skyline retrieval with low bandwidth consumption. IEEE Transactions on Knowledge and Data Engineering, 21(3):384--400, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Randomized multi-pass streaming skyline algorithms

          Recommendations

          Comments

          Login options

          Check if you have access through your login credentials or your institution to get full access on this article.

          Sign in

          Full Access

          PDF Format

          View or Download as a PDF file.

          PDF

          eReader

          View online with eReader.

          eReader