Abstract
Skyline queries compute the set of Pareto-optimal tuples in a relation, that is, those tuples that are not dominated by any other tuple in the same relation. Although several algorithms have been proposed for efficiently evaluating skyline queries, they either necessitate the relation to have been indexed or have to perform the dominance tests on all the tuples in order to determine the result. In this article we introduce salsa, a novel skyline algorithm that exploits the idea of presorting the input data so as to effectively limit the number of tuples to be read and compared. This makes salsa also attractive when skyline queries are executed on top of systems that do not understand skyline semantics, or when the skyline logic runs on clients with limited power and/or bandwidth. We prove that, if one considers symmetric sorting functions, the number of tuples to be read is minimized by sorting data according to a “minimum coordinate,” minC, criterion, and that performance can be further improved if data distribution is known and an asymmetric sorting function is used. Experimental results obtained on synthetic and real datasets show that salsa consistently outperforms state-of-the-art sequential skyline algorithms and that its performance can be accurately predicted.
Supplemental Material
Available for Download
Online appendix to efficient sort-based skyline evaluation. The appendix supports the information on article 31.
- Acharya, S., Poosala, V., and Ramaswamy, S. 1999. Selectivity estimation in spatial databases. In Proceedings of the 1999 ACM SIGMOD International Conference on Management of Data (SIGMOD). Philadelphia, PA, ACM, New York, NY, 13--24. Google ScholarDigital Library
- Balke, W.-T., Güntzer, U., and Zheng, J. X. 2004. Efficient distributed skylining for web information systems. In Proceedings of the 6th International Conference on Extending Database Technology (EDBT). Lecture Notes in Computer Science, vol. 2992, Springer, Berlin, Heidelberg, New York, 256--273.Google Scholar
- Bartolini, I., Ciaccia, P., Oria, V., and Özsu, T. 2004. Integrating the results of multimedia sub-queries using qualitative preferences. In Proceedings of the 10th International Workshop on Multimedia Information Systems (MIS). College Park, MD. Lecture Notes in Computer Science, vol. 2992, Springer, Berlin, Heidelberg, New York, 66--75.Google Scholar
- Bartolini, I., Ciaccia, P., Oria, V., and Özsu, T. 2007. Flexible integration of multimedia sub-queries with qualitative preferences. Multimed. Tools Appl. 33, 3, 275--300. Google ScholarDigital Library
- Bartolini, I., Ciaccia, P., and Patella, M. 2006. SaLSa: computing the skyline without scanning the whole sky. In Proceedings of the 2006 ACM CIKM International Conference on Information and Knowledge Management. Arlington, VA, ACM, New York, NY, 405--414. Google ScholarDigital Library
- Börzsönyi, S., Kossmann, D., and Stocker, K. 2001. The skyline operator. In Proceedings of the 17th International Conference on Data Engineering (ICDE). Heidelberg, Germany, IEEE Computer Society, Washington, DC, 421--430. Google ScholarDigital Library
- Buchta, C. 1989. On the average number of maxima in a set of vectors. Inform. Process. Lett. 33, 2, 63--65. Google ScholarDigital Library
- Chaudhuri, S., Dalvi, N., and Kaushik, R. 2006. Robust cardinality and cost estimation for skyline operator. In Proceedings of the 22nd International Conference on Data Engineering (ICDE). Atlanta, GA. IEEE Computer Society, Washington, DC, 64. Google ScholarDigital Library
- Chomicki, J. 2003. Preference Formulas in Relational Queries. ACM Trans. Database Syst. 28, 4, 427--466. Google ScholarDigital Library
- Chomicki, J., Godfrey, P., Gryz, J., and Liang, D. 2002. Skyline with presorting. Tech. Rep. CS-2002-04, York University, Toronto, ON.Google Scholar
- Chomicki, J., Godfrey, P., Gryz, J., and Liang, D. 2003. Skyline with presorting. In Proceedings of the 19th International Conference on Data Engineering (ICDE). Bangalore, India. IEEE Computer Society, Washington, DC, 717--816.Google Scholar
- Garey, M. R. and Johnson, D. S. 1979. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman. Google ScholarDigital Library
- Godfrey, P. 2004. Skyline cardinality for relational processing. In Proceedings of the 3rd International Symposium on Foundations of Information and Knowledge Systems (FoIKS). Lecture Notes in Computer Science, vol. 2942, Springer, Berlin, Heidelberg, New York, 78--97.Google Scholar
- Godfrey, P., Shipley, R., and Gryz, J. 2005. Maximal vector computation in large data sets. In Proceedings of the 31st International Conference on Very Large Data Bases (VLDB). Trondheim, Norway. Morgan Kaufmann, San Francisco, CA, 229--240. Google ScholarDigital Library
- Kiessling, W. 2002. Foundations of preferences in database systems. In Proceedings of the 28th International Conference on Very Large Data Bases (VLDB). Hong Kong, China. Morgan Kaufmann, San Francisco, CA, 311--322. Google ScholarDigital Library
- Kossmann, D., Ramsak, F., and Rost, S. 2002. Shooting stars in the sky: an online algorithm for skyline queries. In Proceedings of the 28th International Conference on Very Large Data Bases (VLDB). Hong Kong, China. Morgan Kaufmann, San Francisco, CA, 275--286. Google ScholarDigital Library
- Papadias, D., Tao, Y., Fu, G., and Seeger, B. 2003. An optimal and progressive algorithm for skyline queries. In Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data (SIGMOD). San Diego, CA. ACM, New York, NY, 467--478. Google ScholarDigital Library
- Papadias, D., Tao, Y., Fu, G., and Seeger, B. 2005. Progressive skyline computation in database systems. ACM Trans. Database Syst. 30, 1, 41--82. Google ScholarDigital Library
- Paredes, R. and Navarro, G. 2006. Optimal incremental sorting. In Proceedings of the 8th Workshop on Algorithm Engineering and Experiments (ALENEX) and the 3rd Workshop on Analytic Algorithmics and Combinatorics (ANALCO). Miami, FL. SIAM Press, Philadelphia, PA, 171--182.Google Scholar
- Preparata, F. P. and Shamos, M. I. 1985. Computational Geometry—An Introduction. Springer. Google ScholarDigital Library
- Sakamoto, H. 1943. On the distributions of the product and the quotient of the independent and uniformly distributed random variables. Tohoku Math. J. 49, 243--260.Google Scholar
- Tan, K.-L., Eng, P.-K., and Ooi, B. C. 2001. Efficient progressive skyline computation. In Proceedings of the 27th International Conference on Very Large Data Bases (VLDB). Rome, Italy, Morgan Kaufmann, San Francisco, CA, 301--310. Google ScholarDigital Library
- Tao, Y., Xiao, X., and Pei, J. 2006. SUBSKY: efficient computation of skylines in subspaces. In Proceedings of the 22nd International Conference on Data Engineering (ICDE). Atlanta, GA. IEEE, Computer Society, Washington, DC, 65. Google ScholarDigital Library
Index Terms
- Efficient sort-based skyline evaluation
Recommendations
U-Skyline: A New Skyline Query for Uncertain Databases
The skyline query, aiming at identifying a set of skyline tuples that are not dominated by any other tuple, is particularly useful for multicriteria data analysis and decision making. For uncertain databases, a probabilistic skyline query, called P-...
Efficient Skyline Query over Multiple Relations
Skyline query on multiple relations, known as skyline join query, finds skyline points from join results of multiple data sources. The issue of skyline join query has been extensively studied. However, most of the existing skyline join algorithms can ...
Efficient Skyline Computation of Multiple Range Skyline Queries
iiWAS2021: The 23rd International Conference on Information Integration and Web IntelligenceSkyline query which returns a set of skyline objects by filtering those objects that are dominated by others from a potentially large multidimensional data set has attracted significant research attention especially in the database community. Many ...
Comments