Skip to main content
Erschienen in: Knowledge and Information Systems 2/2015

01.05.2015 | Regular Paper

SEPT: an efficient skyline join algorithm on massive data

verfasst von: Xixian Han, Jianzhong Li, Hong Gao, Chengyu Yang

Erschienen in: Knowledge and Information Systems | Ausgabe 2/2015

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Skyline join is an important operation in many applications to return all join tuples that are not dominated by any other join tuples. It is found that the existing algorithms cannot process skyline join on massive data efficiently. This paper presents a novel skyline join algorithm SEPT on massive data. SEPT utilizes sorted positional index lists with join information which require low space overhead to reduce I/O cost significantly. The sorted positional index list is constructed for each potential skyline attribute in the joined tables and is arranged in ascending order of the attribute. SEPT consists of two phases. In phase one, SEPT obtains candidate join positional index pairs of skyline join results. During retrieving the sorted positional index lists, SEPT performs pruning on candidate join positional index pairs in order to discard the candidates whose corresponding join tuples are not skyline join results. In phase two, SEPT exploits the obtained candidate join positional index pairs to get skyline join results by a selective and sequential scan on the tables. The experimental results on synthetic and real data sets show that SEPT has a significant advantage over the existing skyline join algorithms.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Fußnoten
1
In this paper, we consider that the tuples are in general position [29].
 
Literatur
1.
Zurück zum Zitat Bartolini I, Ciaccia P, Patella M (2008) Efficient sort-based skyline evaluation. ACM Trans Database Syst 33(4):31:1–31:49CrossRef Bartolini I, Ciaccia P, Patella M (2008) Efficient sort-based skyline evaluation. ACM Trans Database Syst 33(4):31:1–31:49CrossRef
2.
Zurück zum Zitat Bentley J, Kung H, Schkolnick M, Thompson C (1978) On the average number of maxima in a set of vectors and applications. J ACM 25(4):536–543CrossRefMATHMathSciNet Bentley J, Kung H, Schkolnick M, Thompson C (1978) On the average number of maxima in a set of vectors and applications. J ACM 25(4):536–543CrossRefMATHMathSciNet
3.
Zurück zum Zitat Bloom B (1970) Space/time trade-offs in hash coding with allowable errors. Commun ACM 13(7): 422–426 Bloom B (1970) Space/time trade-offs in hash coding with allowable errors. Commun ACM 13(7): 422–426
4.
Zurück zum Zitat Börzsönyi S, Kossmann D, Stocker K (2001) The skyline operator. In: Proceedings of the 17th international conference on data, engineering, pp 421–430 Börzsönyi S, Kossmann D, Stocker K (2001) The skyline operator. In: Proceedings of the 17th international conference on data, engineering, pp 421–430
5.
Zurück zum Zitat Bryan R (2007) Data-intensive supercomputing: the case for disc. In: Technical report CMU-CS-07-128. School of Computer Science, Carnegie Mellon University Bryan R (2007) Data-intensive supercomputing: the case for disc. In: Technical report CMU-CS-07-128. School of Computer Science, Carnegie Mellon University
6.
Zurück zum Zitat Chomicki J, Godfrey P, Gryz J, Liang D (2003) Skyline with presorting. In: Proceedings of the 19th international conference on data, engineering, pp 717–719 Chomicki J, Godfrey P, Gryz J, Liang D (2003) Skyline with presorting. In: Proceedings of the 19th international conference on data, engineering, pp 717–719
7.
Zurück zum Zitat Courant R, John F (1989) Introduction to calculus and analysis: volume I, 1st edn. Springer, New YorkCrossRefMATH Courant R, John F (1989) Introduction to calculus and analysis: volume I, 1st edn. Springer, New YorkCrossRefMATH
8.
Zurück zum Zitat Gibas M, Canahuate G, Ferhatosmanoglu H (2008) Online index recommendations for high-dimensional databases using query workloads. IEEE Trans Knowl Data Eng 20(2):246–260CrossRef Gibas M, Canahuate G, Ferhatosmanoglu H (2008) Online index recommendations for high-dimensional databases using query workloads. IEEE Trans Knowl Data Eng 20(2):246–260CrossRef
9.
Zurück zum Zitat Godfrey P (2004) Skyline cardinality for relational processing. In: Seipel D, Turull-Torres JMa (eds) Foundations of information and knowledge systems, vol 2942. Springer, Berlin, pp 78–97 Godfrey P (2004) Skyline cardinality for relational processing. In: Seipel D, Turull-Torres JMa (eds) Foundations of information and knowledge systems, vol 2942. Springer, Berlin, pp 78–97
10.
Zurück zum Zitat Godfrey P, Shipley R, Gryz J (2007) Algorithms and analyses for maximal vector computation. VLDB J 16(1):5–28CrossRef Godfrey P, Shipley R, Gryz J (2007) Algorithms and analyses for maximal vector computation. VLDB J 16(1):5–28CrossRef
11.
Zurück zum Zitat Gray J, Shenoy P (2000) Rules of thumb in data engineering. In: Proceedings of the 16th international conference on data, engineering, pp 3–12 Gray J, Shenoy P (2000) Rules of thumb in data engineering. In: Proceedings of the 16th international conference on data, engineering, pp 3–12
12.
Zurück zum Zitat Han X, Li J, Yang D (2012) Pi-join: efficiently processing join queries on massive data. Knowl Inf Syst 32(3):527–557CrossRef Han X, Li J, Yang D (2012) Pi-join: efficiently processing join queries on massive data. Knowl Inf Syst 32(3):527–557CrossRef
13.
Zurück zum Zitat Han X, Li J, Yang D, Wang J (2013) Efficient skyline computation on big data. IEEE Trans Knowl Data Eng 25(11):2521–2535CrossRef Han X, Li J, Yang D, Wang J (2013) Efficient skyline computation on big data. IEEE Trans Knowl Data Eng 25(11):2521–2535CrossRef
14.
Zurück zum Zitat Huang J, Jiang B, Pei J, Chen J, Tang Y (2013) Skyline distance: a measure of multidimensional competence. Knowl Inf Syst 34(2):373–396CrossRef Huang J, Jiang B, Pei J, Chen J, Tang Y (2013) Skyline distance: a measure of multidimensional competence. Knowl Inf Syst 34(2):373–396CrossRef
15.
Zurück zum Zitat Huang Z, Sun S, Wang W (2010) Efficient mining of skyline objects in subspaces over data streams. Knowl Inf Syst 22(2):159–183CrossRef Huang Z, Sun S, Wang W (2010) Efficient mining of skyline objects in subspaces over data streams. Knowl Inf Syst 22(2):159–183CrossRef
16.
Zurück zum Zitat Jin W, Ester M, Hu Z, Han J (2007) The multi-relational skyline operator. In: Proceedings of the 23rd international conference on data, engineering, pp 1276–1280 Jin W, Ester M, Hu Z, Han J (2007) The multi-relational skyline operator. In: Proceedings of the 23rd international conference on data, engineering, pp 1276–1280
17.
Zurück zum Zitat Jin W, Morse M, Patel J, Ester M, Hu Z (2010) Evaluating skylines in the presence of equijoins. In: Proceedings of the 26th international conference on data, engineering, pp 249–260 Jin W, Morse M, Patel J, Ester M, Hu Z (2010) Evaluating skylines in the presence of equijoins. In: Proceedings of the 26th international conference on data, engineering, pp 249–260
18.
Zurück zum Zitat Khalefa M, Mokbel M, Levandoski J (2011) Prefjoin: an efficient preference-aware join operator. In: Proceedings of the 27th international conference on data, engineering, pp 995–1006 Khalefa M, Mokbel M, Levandoski J (2011) Prefjoin: an efficient preference-aware join operator. In: Proceedings of the 27th international conference on data, engineering, pp 995–1006
19.
Zurück zum Zitat Kossmann D, Ramsak F, Rost S (2002) Shooting stars in the sky: an online algorithm for skyline queries. In: Proceedings of the 28th international conference on very large data, bases, pp 275–286 Kossmann D, Ramsak F, Rost S (2002) Shooting stars in the sky: an online algorithm for skyline queries. In: Proceedings of the 28th international conference on very large data, bases, pp 275–286
21.
Zurück zum Zitat Lee K, Lee W, Zheng B, Li H, Tian Y (2010) Z-sky: an efficient skyline query processing framework based on z-order. VLDB J 19(3):333–362CrossRef Lee K, Lee W, Zheng B, Li H, Tian Y (2010) Z-sky: an efficient skyline query processing framework based on z-order. VLDB J 19(3):333–362CrossRef
22.
Zurück zum Zitat Luo C, Jiang Z, Hou W, He S, Zhu Q (2012) A sampling approach for skyline query cardinality estimation. Knowl Inf Syst 32(2):281–301CrossRef Luo C, Jiang Z, Hou W, He S, Zhu Q (2012) A sampling approach for skyline query cardinality estimation. Knowl Inf Syst 32(2):281–301CrossRef
23.
Zurück zum Zitat Nagendra M, Candan K (2012) Skyline-sensitive joins with lr-pruning. In: Proceedings of the 15th international conference on extending database technology, pp 252–263 Nagendra M, Candan K (2012) Skyline-sensitive joins with lr-pruning. In: Proceedings of the 15th international conference on extending database technology, pp 252–263
24.
Zurück zum Zitat Papadias D, Tao Y, Fu G, Seeger B (2005) Progressive skyline computation in database systems. ACM Trans Database Syst 30(1):41–82CrossRef Papadias D, Tao Y, Fu G, Seeger B (2005) Progressive skyline computation in database systems. ACM Trans Database Syst 30(1):41–82CrossRef
25.
Zurück zum Zitat Raghavan V, Rundensteiner E (2010) Progressive result generation for multi-criteria decision support queries. In: Proceedings of the 26th international conference on data, engineering, pp 733–744 Raghavan V, Rundensteiner E (2010) Progressive result generation for multi-criteria decision support queries. In: Proceedings of the 26th international conference on data, engineering, pp 733–744
26.
Zurück zum Zitat Raghavan V, Rundensteiner E, Srivastava S (2011) Skyline and mapping aware join query evaluation. Inf Syst 36(6):917–936CrossRef Raghavan V, Rundensteiner E, Srivastava S (2011) Skyline and mapping aware join query evaluation. Inf Syst 36(6):917–936CrossRef
27.
Zurück zum Zitat Rudin W (1976) Principles of mathematical analysis, 3rd edn. McGraw-Hill Book Co., New YorkMATH Rudin W (1976) Principles of mathematical analysis, 3rd edn. McGraw-Hill Book Co., New YorkMATH
29.
Zurück zum Zitat Sheng C, Tao Y (2011) On finding skylines in external memory. In: Proceedings of the thirtieth ACM SIGMOD-SIGACT-SIGART symposium on principles of database systems, pp 107–116 Sheng C, Tao Y (2011) On finding skylines in external memory. In: Proceedings of the thirtieth ACM SIGMOD-SIGACT-SIGART symposium on principles of database systems, pp 107–116
30.
Zurück zum Zitat Sun D, Wu S, Li J, Tung A (2008) Skyline-join in distributed databases. In: Proceedings of the 24th international conference on data engineering workshops, pp 176–181 Sun D, Wu S, Li J, Tung A (2008) Skyline-join in distributed databases. In: Proceedings of the 24th international conference on data engineering workshops, pp 176–181
31.
Zurück zum Zitat Sun S, Huang Z, Zhong H, Dai D, Liu H, Li J (2010) Efficient monitoring of skyline queries over distributed data streams. Knowl Inf Syst 25(3):575–606CrossRef Sun S, Huang Z, Zhong H, Dai D, Liu H, Li J (2010) Efficient monitoring of skyline queries over distributed data streams. Knowl Inf Syst 25(3):575–606CrossRef
32.
Zurück zum Zitat Tan K, Eng P, Ooi B (2001) Efficient progressive skyline computation. In: Proceedings of the 27th international conference on very large data, bases, pp 301–310 Tan K, Eng P, Ooi B (2001) Efficient progressive skyline computation. In: Proceedings of the 27th international conference on very large data, bases, pp 301–310
33.
Zurück zum Zitat Tao Y, Xiao X, Pei J (2007) Efficient skyline and top-k retrieval in subspaces. IEEE Trans Knowl Data Eng 19(8):1072–1088CrossRef Tao Y, Xiao X, Pei J (2007) Efficient skyline and top-k retrieval in subspaces. IEEE Trans Knowl Data Eng 19(8):1072–1088CrossRef
35.
Zurück zum Zitat Vlachou A, Doulkeridis C, Polyzotis N (2011) Skyline query processing over joins. In: Proceedings of the 2011 ACM SIGMOD international conference on management of data, pp 73–84 Vlachou A, Doulkeridis C, Polyzotis N (2011) Skyline query processing over joins. In: Proceedings of the 2011 ACM SIGMOD international conference on management of data, pp 73–84
Metadaten
Titel
SEPT: an efficient skyline join algorithm on massive data
verfasst von
Xixian Han
Jianzhong Li
Hong Gao
Chengyu Yang
Publikationsdatum
01.05.2015
Verlag
Springer London
Erschienen in
Knowledge and Information Systems / Ausgabe 2/2015
Print ISSN: 0219-1377
Elektronische ISSN: 0219-3116
DOI
https://doi.org/10.1007/s10115-014-0734-2

Weitere Artikel der Ausgabe 2/2015

Knowledge and Information Systems 2/2015 Zur Ausgabe

Premium Partner