Skip to main content
Erschienen in: The Journal of Supercomputing 1/2021

18.04.2020

Parallel computation of probabilistic skyline queries using MapReduce

verfasst von: Elaheh Gavagsaz

Erschienen in: The Journal of Supercomputing | Ausgabe 1/2021

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

In recent years, numerous applications have been continuously generating large amounts of uncertain data. The advanced analysis queries such as skyline operators are essential topics to extract interesting objects from the vast uncertain dataset. Recently, the MapReduce system has been widely used in the area of big data analysis. Although the probabilistic skyline query is not decomposable, it does not make sense to implement the probabilistic skyline query in the MapReduce framework. This paper proposes an effective parallel method called parallel computation of probabilistic skyline query (PCPS) that can measure the probabilistic skyline set in one MapReduce computation pass. The proposed method takes into account the critical sections and detects data with a high probability of existence through a proposed smart sampling algorithm. PCPS implements a new approach to the fair allocation of input data. The experimental results indicate that our proposed approach can not only reduce the processing time of the probabilistic skyline queries, but also achieve fair precision with varying dimensionality degrees.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Literatur
4.
Zurück zum Zitat Zhang W, Lin X, Pei J, Zhang Y (2008) Managing uncertain data: probabilistic approaches. In: The Ninth International Conference on Web-Age Information Management. IEEE, pp 405–412 Zhang W, Lin X, Pei J, Zhang Y (2008) Managing uncertain data: probabilistic approaches. In: The Ninth International Conference on Web-Age Information Management. IEEE, pp 405–412
5.
Zurück zum Zitat Borzsony S, Kossmann D, Stocker K (2001) The skyline operator. In: Proceedings of 17th International Conference on Data Engineering, 2001. IEEE, pp 421–430 Borzsony S, Kossmann D, Stocker K (2001) The skyline operator. In: Proceedings of 17th International Conference on Data Engineering, 2001. IEEE, pp 421–430
6.
Zurück zum Zitat Dellis E, Seeger B (2007) Efficient computation of reverse skyline queries. In: Proceedings of the 33rd International Conference on Very Large Data Bases. VLDB Endowment, pp 291–302 Dellis E, Seeger B (2007) Efficient computation of reverse skyline queries. In: Proceedings of the 33rd International Conference on Very Large Data Bases. VLDB Endowment, pp 291–302
7.
Zurück zum Zitat Papadias D, Tao Y, Fu G, Seeger B (2003) An optimal and progressive algorithm for skyline queries. In: Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data. ACM, pp 467–478 Papadias D, Tao Y, Fu G, Seeger B (2003) An optimal and progressive algorithm for skyline queries. In: Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data. ACM, pp 467–478
8.
Zurück zum Zitat Atallah MJ, Qi Y (2009) Computing all skyline probabilities for uncertain data. In: Proceedings of the Twenty-Eighth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems. ACM, pp 279–287 Atallah MJ, Qi Y (2009) Computing all skyline probabilities for uncertain data. In: Proceedings of the Twenty-Eighth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems. ACM, pp 279–287
9.
Zurück zum Zitat Li F, Yi K, Jestes J (2009) Ranking distributed probabilistic data. In: Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data. ACM, pp 361–374 Li F, Yi K, Jestes J (2009) Ranking distributed probabilistic data. In: Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data. ACM, pp 361–374
10.
Zurück zum Zitat Pei J, Jiang B, Lin X, Yuan Y (2007) Probabilistic skylines on uncertain data. In: Proceedings of the 33rd International Conference on Very Large Data Bases. VLDB Endowment, pp 15–26 Pei J, Jiang B, Lin X, Yuan Y (2007) Probabilistic skylines on uncertain data. In: Proceedings of the 33rd International Conference on Very Large Data Bases. VLDB Endowment, pp 15–26
11.
Zurück zum Zitat Wang C, Yuan LY, You J-H, Zaiane OR, Pei J (2011) On pruning for top-k ranking in uncertain databases. Proc VLDB Endow 4(10):598–609CrossRef Wang C, Yuan LY, You J-H, Zaiane OR, Pei J (2011) On pruning for top-k ranking in uncertain databases. Proc VLDB Endow 4(10):598–609CrossRef
12.
Zurück zum Zitat Zhang W, Lin X, Zhang Y, Wang W, Yu JX (2009) Probabilistic skyline operator over sliding windows. In: IEEE 25th International Conference on Data Engineering, 2009. ICDE’09. IEEE, pp 1060–1071 Zhang W, Lin X, Zhang Y, Wang W, Yu JX (2009) Probabilistic skyline operator over sliding windows. In: IEEE 25th International Conference on Data Engineering, 2009. ICDE’09. IEEE, pp 1060–1071
16.
Zurück zum Zitat Ding L, Wang G, Xin J, Yuan Y (2013) Efficient probabilistic skyline query processing in mapreduce. In: 2013 IEEE International Congress on Big Data (BigData Congress). IEEE, pp 203–210 Ding L, Wang G, Xin J, Yuan Y (2013) Efficient probabilistic skyline query processing in mapreduce. In: 2013 IEEE International Congress on Big Data (BigData Congress). IEEE, pp 203–210
17.
Zurück zum Zitat Park Y, Min J-K, Shim K (2013) Parallel computation of skyline and reverse skyline queries using mapreduce. Proc VLDB Endow 6(14):2002–2013CrossRef Park Y, Min J-K, Shim K (2013) Parallel computation of skyline and reverse skyline queries using mapreduce. Proc VLDB Endow 6(14):2002–2013CrossRef
18.
Zurück zum Zitat Park Y, Min J-K, Shim K (2015) Processing of probabilistic skyline queries using mapreduce. Proc VLDB Endow 8(12):1406–1417CrossRef Park Y, Min J-K, Shim K (2015) Processing of probabilistic skyline queries using mapreduce. Proc VLDB Endow 8(12):1406–1417CrossRef
19.
Zurück zum Zitat Ryu H-C, Jung S (2017) MapReduce-based skyline query processing scheme using adaptive two-level grids. Clust Comput 20(4):3605–3616CrossRef Ryu H-C, Jung S (2017) MapReduce-based skyline query processing scheme using adaptive two-level grids. Clust Comput 20(4):3605–3616CrossRef
20.
Zurück zum Zitat Zhang B, Zhou S, Guan J (2011) Adapting skyline computation to the mapreduce framework: algorithms and experiments. In: International Conference on Database Systems for Advanced Applications. Springer, pp 403–414 Zhang B, Zhou S, Guan J (2011) Adapting skyline computation to the mapreduce framework: algorithms and experiments. In: International Conference on Database Systems for Advanced Applications. Springer, pp 403–414
22.
Zurück zum Zitat Li X, Liu J, Ren K, Li X, Ren X, Deng K (2019) Parallel k-dominant skyline queries over uncertain data streams with capability index. In: 2019 IEEE 21st International Conference on High Performance Computing and Communications; IEEE 17th International Conference on Smart City; IEEE 5th International Conference on Data Science and Systems (HPCC/SmartCity/DSS), 10–12 Aug. 2019. pp 1556–1563. https://doi.org/10.1109/hpcc/smartcity/dss.2019.00214 Li X, Liu J, Ren K, Li X, Ren X, Deng K (2019) Parallel k-dominant skyline queries over uncertain data streams with capability index. In: 2019 IEEE 21st International Conference on High Performance Computing and Communications; IEEE 17th International Conference on Smart City; IEEE 5th International Conference on Data Science and Systems (HPCC/SmartCity/DSS), 10–12 Aug. 2019. pp 1556–1563. https://​doi.​org/​10.​1109/​hpcc/​smartcity/​dss.​2019.​00214
23.
Zurück zum Zitat Liu J, Li X, Ren K, Song J, Zhang Z (2018) Parallel n-of-N skyline queries over uncertain data streams. In: Database and Expert Systems Applications. Springer International Publishing, Cham. pp 176–184 Liu J, Li X, Ren K, Song J, Zhang Z (2018) Parallel n-of-N skyline queries over uncertain data streams. In: Database and Expert Systems Applications. Springer International Publishing, Cham. pp 176–184
24.
Zurück zum Zitat Vlachou A, Doulkeridis C, Kotidis Y (2008) Angle-based space partitioning for efficient parallel skyline computation. In: Proceedings of the 2008 ACM SIGMOD International conference on Management of Data. ACM, pp 227–238 Vlachou A, Doulkeridis C, Kotidis Y (2008) Angle-based space partitioning for efficient parallel skyline computation. In: Proceedings of the 2008 ACM SIGMOD International conference on Management of Data. ACM, pp 227–238
26.
Zurück zum Zitat Tan K-L, Eng P-K, Ooi BC (2001) Efficient progressive skyline computation. In: VLDB, vol 1. pp 301–310 Tan K-L, Eng P-K, Ooi BC (2001) Efficient progressive skyline computation. In: VLDB, vol 1. pp 301–310
27.
Zurück zum Zitat Kossmann D, Ramsak F, Rost S (2002) Shooting stars in the sky: an online algorithm for skyline queries. In: VLDB’02 Proceedings of the 28th International Conference on Very Large Data Bases. Morgan Kaufmann, pp 275–286 Kossmann D, Ramsak F, Rost S (2002) Shooting stars in the sky: an online algorithm for skyline queries. In: VLDB’02 Proceedings of the 28th International Conference on Very Large Data Bases. Morgan Kaufmann, pp 275–286
28.
Zurück zum Zitat Bartolini I, Ciaccia P, Patella M (2006) SaLSa: computing the skyline without scanning the whole sky. In: Proceedings of the 15th ACM International Conference on Information and Knowledge Management. ACM, pp 405–414 Bartolini I, Ciaccia P, Patella M (2006) SaLSa: computing the skyline without scanning the whole sky. In: Proceedings of the 15th ACM International Conference on Information and Knowledge Management. ACM, pp 405–414
29.
Zurück zum Zitat Chomicki J, Godfrey P, Gryz J, Liang D (2005) Skyline with presorting: theory and optimizations. Intelligent Information Processing and Web Mining. Springer, Berlin, Heidelberg, pp 595–604CrossRef Chomicki J, Godfrey P, Gryz J, Liang D (2005) Skyline with presorting: theory and optimizations. Intelligent Information Processing and Web Mining. Springer, Berlin, Heidelberg, pp 595–604CrossRef
30.
Zurück zum Zitat Godfrey P, Shipley R, Gryz J (2005) Maximal vector computation in large data sets. In: Proceedings of the 31st International Conference on Very Large Data Bases. VLDB Endowment, pp 229–240 Godfrey P, Shipley R, Gryz J (2005) Maximal vector computation in large data sets. In: Proceedings of the 31st International Conference on Very Large Data Bases. VLDB Endowment, pp 229–240
31.
Zurück zum Zitat Rocha-Junior JB, Vlachou A, Doulkeridis C, Nørvåg K (2009) AGiDS: A grid-based strategy for distributed skyline query processing. In: International Conference on Data Management in Grid and P2P Systems. Springer, pp 12–23 Rocha-Junior JB, Vlachou A, Doulkeridis C, Nørvåg K (2009) AGiDS: A grid-based strategy for distributed skyline query processing. In: International Conference on Data Management in Grid and P2P Systems. Springer, pp 12–23
32.
Zurück zum Zitat Wang S, Ooi BC, Tung AK, Xu L (2007) Efficient skyline query processing on peer-to-peer networks. In: IEEE 23rd International Conference on Data Engineering, 2007. ICDE 2007. IEEE, pp 1126–1135 Wang S, Ooi BC, Tung AK, Xu L (2007) Efficient skyline query processing on peer-to-peer networks. In: IEEE 23rd International Conference on Data Engineering, 2007. ICDE 2007. IEEE, pp 1126–1135
33.
Zurück zum Zitat Wu P, Zhang C, Feng Y, Zhao BY, Agrawal D, El Abbadi A (2006) Parallelizing skyline queries for scalable distribution. In: International Conference on Extending Database Technology. Springer, pp 112–130 Wu P, Zhang C, Feng Y, Zhao BY, Agrawal D, El Abbadi A (2006) Parallelizing skyline queries for scalable distribution. In: International Conference on Extending Database Technology. Springer, pp 112–130
34.
Zurück zum Zitat Cosgaya-Lozano A, Rau-Chaplin A, Zeh N Parallel computation of skyline queries. In: 21st International Symposium on High Performance Computing Systems and Applications, 2007. HPCS 2007. IEEE, pp 12–12 Cosgaya-Lozano A, Rau-Chaplin A, Zeh N Parallel computation of skyline queries. In: 21st International Symposium on High Performance Computing Systems and Applications, 2007. HPCS 2007. IEEE, pp 12–12
35.
Zurück zum Zitat Köhler H, Yang J, Zhou X (2011) Efficient parallel skyline processing using hyperplane projections. In: Proceedings of the 2011 ACM SIGMOD International Conference on Management of data. ACM, pp 85–96 Köhler H, Yang J, Zhou X (2011) Efficient parallel skyline processing using hyperplane projections. In: Proceedings of the 2011 ACM SIGMOD International Conference on Management of data. ACM, pp 85–96
36.
Zurück zum Zitat Afrati FN, Koutris P, Suciu D, Ullman JD (2015) Parallel skyline queries. Theory Comput Syst 57(4):1008–1037MathSciNetCrossRef Afrati FN, Koutris P, Suciu D, Ullman JD (2015) Parallel skyline queries. Theory Comput Syst 57(4):1008–1037MathSciNetCrossRef
37.
Zurück zum Zitat Green TJ, Tannen V (2006) Models for incomplete and probabilistic information. In: International Conference on Extending Database Technology. Springer, pp 278–296 Green TJ, Tannen V (2006) Models for incomplete and probabilistic information. In: International Conference on Extending Database Technology. Springer, pp 278–296
38.
Zurück zum Zitat Xin J, Bai M, Wang G (2011) Efficient threshold skyline query processing in uncertain databases. In: 2011 Seventh International Conference on Natural Computation. IEEE, pp 311–315 Xin J, Bai M, Wang G (2011) Efficient threshold skyline query processing in uncertain databases. In: 2011 Seventh International Conference on Natural Computation. IEEE, pp 311–315
39.
Zurück zum Zitat Considine J, Li F, Kollios G, Byers J (2004) Approximate aggregation techniques for sensor databases. In: Proceedings. 20th International Conference on Data Engineering. IEEE, pp 449–460 Considine J, Li F, Kollios G, Byers J (2004) Approximate aggregation techniques for sensor databases. In: Proceedings. 20th International Conference on Data Engineering. IEEE, pp 449–460
40.
Zurück zum Zitat Xin J, Wang G, Chen L, Zhang X, Wang Z (2007) Continuously maintaining sliding window skylines in a sensor network. In: International Conference on Database Systems for Advanced Applications. Springer, pp 509–521 Xin J, Wang G, Chen L, Zhang X, Wang Z (2007) Continuously maintaining sliding window skylines in a sensor network. In: International Conference on Database Systems for Advanced Applications. Springer, pp 509–521
41.
Zurück zum Zitat Cosgaya-Lozano A, Rau-Chaplin A, Zeh N (2007) Parallel computation of skyline queries. In: 21st International Symposium on High Performance Computing Systems and Applications (HPCS’07). IEEE, pp 12–12 Cosgaya-Lozano A, Rau-Chaplin A, Zeh N (2007) Parallel computation of skyline queries. In: 21st International Symposium on High Performance Computing Systems and Applications (HPCS’07). IEEE, pp 12–12
42.
Zurück zum Zitat Valkanas G, Papadopoulos AN (2010) Efficient and adaptive distributed skyline computation. In: International Conference on Scientific and Statistical Database Management. Springer, pp 24–41 Valkanas G, Papadopoulos AN (2010) Efficient and adaptive distributed skyline computation. In: International Conference on Scientific and Statistical Database Management. Springer, pp 24–41
43.
Zurück zum Zitat Gavagsaz E, Rezaee A, Haj Seyyed Javadi H (2018) Load balancing in reducers for skewed data in MapReduce systems by using scalable simple random sampling. J Supercomput 74:3415–3440CrossRef Gavagsaz E, Rezaee A, Haj Seyyed Javadi H (2018) Load balancing in reducers for skewed data in MapReduce systems by using scalable simple random sampling. J Supercomput 74:3415–3440CrossRef
44.
Zurück zum Zitat Gavagsaz E, Rezaee A, Javadi HHS (2019) Load balancing in join algorithms for skewed data in MapReduce systems. J Supercomput 75(1):228–254CrossRef Gavagsaz E, Rezaee A, Javadi HHS (2019) Load balancing in join algorithms for skewed data in MapReduce systems. J Supercomput 75(1):228–254CrossRef
46.
Zurück zum Zitat Meng X (2013) Scalable simple random sampling and stratified sampling. In: Proceedings of the 30th International Conference on International Conference on Machine Learning—volume 28. pp III-531–III-539 Meng X (2013) Scalable simple random sampling and stratified sampling. In: Proceedings of the 30th International Conference on International Conference on Machine Learning—volume 28. pp III-531–III-539
48.
Zurück zum Zitat Madar V (2015) Direct formulation to Cholesky decomposition of a general nonsingular correlation matrix. Stat Probab Lett 103:142–147MathSciNetCrossRef Madar V (2015) Direct formulation to Cholesky decomposition of a general nonsingular correlation matrix. Stat Probab Lett 103:142–147MathSciNetCrossRef
Metadaten
Titel
Parallel computation of probabilistic skyline queries using MapReduce
verfasst von
Elaheh Gavagsaz
Publikationsdatum
18.04.2020
Verlag
Springer US
Erschienen in
The Journal of Supercomputing / Ausgabe 1/2021
Print ISSN: 0920-8542
Elektronische ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-020-03279-x

Weitere Artikel der Ausgabe 1/2021

The Journal of Supercomputing 1/2021 Zur Ausgabe