Skip to main content
Erschienen in: The VLDB Journal 4/2019

16.01.2019 | Regular Paper

Scalable computational geometry in MapReduce

verfasst von: Yuan Li, Ahmed Eldawy, Jie Xue, Nadezda Knorozova, Mohamed F. Mokbel, Ravi Janardan

Erschienen in: The VLDB Journal | Ausgabe 4/2019

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Hadoop, employing the MapReduce programming paradigm, has been widely accepted as the standard framework for analyzing big data in distributed environments. Unfortunately, this rich framework has not been exploited for processing large-scale computational geometry operations. This paper introduces CG_Hadoop; a suite of scalable and efficient MapReduce algorithms for various fundamental computational geometry operations, namely polygon union, Voronoi diagram, skyline, convex hull, farthest pair, and closest pair, which present a set of key components for other geometric algorithms. For each computational geometry operation, CG_Hadoop has two versions, one for the Apache Hadoop system and one for the SpatialHadoop system, a Hadoop-based system that is more suited for spatial operations. These proposed algorithms form the nucleus of a comprehensive MapReduce library of computational geometry operations. Extensive experimental results run on a cluster of 25 machines over datasets of size up to 3.8B records show that CG_Hadoop achieves up to 14x and 115x better performance than traditional algorithms when using Hadoop and SpatialHadoop systems, respectively.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Anhänge
Nur mit Berechtigung zugänglich
Literatur
1.
Zurück zum Zitat Aji, A., Wang, F., Vo, H., Lee, R., Liu, Q., Zhang, X., Saltz, J.: Hadoop-GIS: a high performance spatial data warehousing system over MapReduce. In: VLDB (2013) Aji, A., Wang, F., Vo, H., Lee, R., Liu, Q., Zhang, X., Saltz, J.: Hadoop-GIS: a high performance spatial data warehousing system over MapReduce. In: VLDB (2013)
2.
Zurück zum Zitat Akdogan, A., Demiryurek, U., Banaei-Kashani, F., Shahabi, C.: Voronoi-based geospatial query processing with MapReduce. In: CLOUDCOM (2010) Akdogan, A., Demiryurek, U., Banaei-Kashani, F., Shahabi, C.: Voronoi-based geospatial query processing with MapReduce. In: CLOUDCOM (2010)
3.
Zurück zum Zitat Andrew, A.M.: Another efficient algorithm for convex hulls in two dimensions. Inf. Process. Lett. 9(5), 216–219 (1979)CrossRefMATH Andrew, A.M.: Another efficient algorithm for convex hulls in two dimensions. Inf. Process. Lett. 9(5), 216–219 (1979)CrossRefMATH
5.
Zurück zum Zitat Bentley, J.L., Kung, H., Schkolnick, M., Thompson, C.D.: On the average number of maxima in a set of vectors and applications. J. ACM: JACM 25(4), 536–543 (1978)MathSciNetCrossRefMATH Bentley, J.L., Kung, H., Schkolnick, M., Thompson, C.D.: On the average number of maxima in a set of vectors and applications. J. ACM: JACM 25(4), 536–543 (1978)MathSciNetCrossRefMATH
6.
Zurück zum Zitat Berg, M.D., Cheong, O., Kreveld, M.V., Overmars, M.: Computational Geometry: Algorithms and Applications. Springer, Berlin (2008)CrossRefMATH Berg, M.D., Cheong, O., Kreveld, M.V., Overmars, M.: Computational Geometry: Algorithms and Applications. Springer, Berlin (2008)CrossRefMATH
7.
Zurück zum Zitat Borne, K.D., Baum, S.A., Fruchter, A., Long, K.S.: The hubble space telescope data archive. In: Astronomical Data Analysis Software and Systems IV, vol. 77 (1995) Borne, K.D., Baum, S.A., Fruchter, A., Long, K.S.: The hubble space telescope data archive. In: Astronomical Data Analysis Software and Systems IV, vol. 77 (1995)
8.
Zurück zum Zitat Börzsönyi, S., Kossmann, D., Stocker, K.: The skyline operator. In: ICDE (2001) Börzsönyi, S., Kossmann, D., Stocker, K.: The skyline operator. In: ICDE (2001)
9.
Zurück zum Zitat Cary, A., Sun, Z., Hristidis, V., Rishe, N.: Experiences on processing spatial data with MapReduce. In: SSDBM, pp. 302–319. New Orleans, Louisiana (2009) Cary, A., Sun, Z., Hristidis, V., Rishe, N.: Experiences on processing spatial data with MapReduce. In: SSDBM, pp. 302–319. New Orleans, Louisiana (2009)
10.
Zurück zum Zitat Choudhury, F.M., Culpepper, J.S., Bao, Z., Sellis, T.: Finding the optimal location and keywords in obstructed and unobstructed space. VLDB J. 27, 445–470 (2018)CrossRef Choudhury, F.M., Culpepper, J.S., Bao, Z., Sellis, T.: Finding the optimal location and keywords in obstructed and unobstructed space. VLDB J. 27, 445–470 (2018)CrossRef
11.
Zurück zum Zitat Cooper, B.F., Ramakrishnan, R., Srivastava, U., Silberstein, A., Bohannon, P., Jacobsen, H.A., Puz, N., Weaver, D., Yerneni, R.: PNUTS: Yahoo!’s hosted data serving platform. PVLDB 1(2), 1277–1288 (2008) Cooper, B.F., Ramakrishnan, R., Srivastava, U., Silberstein, A., Bohannon, P., Jacobsen, H.A., Puz, N., Weaver, D., Yerneni, R.: PNUTS: Yahoo!’s hosted data serving platform. PVLDB 1(2), 1277–1288 (2008)
12.
Zurück zum Zitat Cormen, T.H., Leisorson, C.E., Rivest, R.L., Stein, C.: Introduction to Algorithms. MIT Press, Cambridge (2009) Cormen, T.H., Leisorson, C.E., Rivest, R.L., Stein, C.: Introduction to Algorithms. MIT Press, Cambridge (2009)
14.
Zurück zum Zitat Dean, J., Ghemawat, S.: MapReduce: simplified data processing on large clusters. Commun. ACM 51, 107–113 (2008)CrossRef Dean, J., Ghemawat, S.: MapReduce: simplified data processing on large clusters. Commun. ACM 51, 107–113 (2008)CrossRef
15.
Zurück zum Zitat Eldawy, A., Alarabi, L., Mokbel, M.F.: Spatial partitioning techniques in SpatialHadoop. In: PVLDB, pp. 1602–1605. Kohala Coast, HI (2015) Eldawy, A., Alarabi, L., Mokbel, M.F.: Spatial partitioning techniques in SpatialHadoop. In: PVLDB, pp. 1602–1605. Kohala Coast, HI (2015)
16.
Zurück zum Zitat Eldawy, A., Li, Y., Mokbel, M.F., Janardan, R.: CG\_Hadoop: computational geometry in MapReduce. In: SIGSPATIAL, pp. 284–293. Orlando, FL (2013) Eldawy, A., Li, Y., Mokbel, M.F., Janardan, R.: CG\_Hadoop: computational geometry in MapReduce. In: SIGSPATIAL, pp. 284–293. Orlando, FL (2013)
17.
Zurück zum Zitat Eldawy, A., Mokbel, M.F.: A demonstration of SpatialHadoop: an efficient MapReduce framework for spatial data. In: VLDB (2013) Eldawy, A., Mokbel, M.F.: A demonstration of SpatialHadoop: an efficient MapReduce framework for spatial data. In: VLDB (2013)
18.
Zurück zum Zitat Eldawy, A., Mokbel, M.F.: SpatialHadoop: a MapReduce framework for spatial data. In: ICDE (2015) (to appear) Eldawy, A., Mokbel, M.F.: SpatialHadoop: a MapReduce framework for spatial data. In: ICDE (2015) (to appear)
19.
Zurück zum Zitat Fox, A., Eichelberger, C., Hughes, J., Lyon, S.: Spatio-temporal indexing in non-relational distributed databases. In: BigData, pp. 291–299. Santa Clara, CA (2013) Fox, A., Eichelberger, C., Hughes, J., Lyon, S.: Spatio-temporal indexing in non-relational distributed databases. In: BigData, pp. 291–299. Santa Clara, CA (2013)
20.
Zurück zum Zitat García-García, F., Corral, A., Iribarne, L., Vassilakopoulos, M., Manolopoulos, Y.: Enhancing SpatialHadoop with closest pair queries. In: East European Conference on Advances in Databases and Information Systems, pp. 212–225. Springer, Berlin (2016) García-García, F., Corral, A., Iribarne, L., Vassilakopoulos, M., Manolopoulos, Y.: Enhancing SpatialHadoop with closest pair queries. In: East European Conference on Advances in Databases and Information Systems, pp. 212–225. Springer, Berlin (2016)
21.
Zurück zum Zitat Ghoting, A., Krishnamurthy, R., Pednault, E., Reinwald, B., Sindhwani, V., Tatikonda, S., Tian, Y., Vaithyanathan, S.: SystemML: declarative machine learning on MapReduce. In: ICDE (2011) Ghoting, A., Krishnamurthy, R., Pednault, E., Reinwald, B., Sindhwani, V., Tatikonda, S., Tian, Y., Vaithyanathan, S.: SystemML: declarative machine learning on MapReduce. In: ICDE (2011)
23.
Zurück zum Zitat Goodrich, M.T., Sitchinava, N., Zhang, Q.: Sorting, searching, and simulation in the MapReduce framework. In: ISAAC (2011) Goodrich, M.T., Sitchinava, N., Zhang, Q.: Sorting, searching, and simulation in the MapReduce framework. In: ISAAC (2011)
24.
Zurück zum Zitat Guibas, L.J., Stolfi, J.: Primitives for the manipulation of general subdivisions and the computation of Voronoi diagrams. In: STOC, pp. 221–234. Boston, MA (1983) Guibas, L.J., Stolfi, J.: Primitives for the manipulation of general subdivisions and the computation of Voronoi diagrams. In: STOC, pp. 221–234. Boston, MA (1983)
25.
Zurück zum Zitat Guttman, A.: R-Trees: A dynamic index structure for spatial searching. In: SIGMOD (1984) Guttman, A.: R-Trees: A dynamic index structure for spatial searching. In: SIGMOD (1984)
26.
Zurück zum Zitat Huai, Y., Chauhan, A., Gates, A., Hagleitner, G., Hanson, E.N., O’Malley, O., Pandey, J., Yuan, Y., Lee, R., Zhang, X.: Major technical advancements in apache hive. In: ACM SIGMOD, pp. 1235–1246 (2014) Huai, Y., Chauhan, A., Gates, A., Hagleitner, G., Hanson, E.N., O’Malley, O., Pandey, J., Yuan, Y., Lee, R., Zhang, X.: Major technical advancements in apache hive. In: ACM SIGMOD, pp. 1235–1246 (2014)
27.
Zurück zum Zitat Isard, M., Budiu, M., Yu, Y., Birrell, A., Fetterly, D.: Dryad: distributed data-parallel programs from sequential building blocks. In: EuroSys (2007) Isard, M., Budiu, M., Yu, Y., Birrell, A., Fetterly, D.: Dryad: distributed data-parallel programs from sequential building blocks. In: EuroSys (2007)
29.
Zurück zum Zitat Köhler, H., Yang, J., Zhou, X.: Efficient parallel skyline processing using hyperplane projections. In: ACM SIGMOD, pp. 85–96. ACM (2011) Köhler, H., Yang, J., Zhou, X.: Efficient parallel skyline processing using hyperplane projections. In: ACM SIGMOD, pp. 85–96. ACM (2011)
30.
Zurück zum Zitat Lakshman, A., Malik, P.: Cassandra: a decentralized structured storage system. Oper. Syst. Rev. 44(2), 35–40 (2010)CrossRef Lakshman, A., Malik, P.: Cassandra: a decentralized structured storage system. Oper. Syst. Rev. 44(2), 35–40 (2010)CrossRef
31.
Zurück zum Zitat Lee, G., Lin, J., Liu, C., Lorek, A., Ryaboy, D.V.: The unified logging infrastructure for data analytics at Twitter. PVLDB 5(12), 1771–1780 (2012) Lee, G., Lin, J., Liu, C., Lorek, A., Ryaboy, D.V.: The unified logging infrastructure for data analytics at Twitter. PVLDB 5(12), 1771–1780 (2012)
32.
Zurück zum Zitat Liao, H., Han, J., Fang, J.: Multi-dimensional index on Hadoop distributed file system. In: ICNAS, pp. 240–249 (2010) Liao, H., Han, J., Fang, J.: Multi-dimensional index on Hadoop distributed file system. In: ICNAS, pp. 240–249 (2010)
33.
Zurück zum Zitat Lu, J., Guting, R.H.: Parallel secondo: boosting database engines with Hadoop. In: ICPADS (2012) Lu, J., Guting, R.H.: Parallel secondo: boosting database engines with Hadoop. In: ICPADS (2012)
34.
Zurück zum Zitat Lu, P., Chen, G., Ooi, B.C., Vo, H.T., Wu, S.: ScalaGiST: scalable generalized search trees for MapReduce systems. PVLDB 7(14), 1797–1808 (2014) Lu, P., Chen, G., Ooi, B.C., Vo, H.T., Wu, S.: ScalaGiST: scalable generalized search trees for MapReduce systems. PVLDB 7(14), 1797–1808 (2014)
35.
Zurück zum Zitat Lu, W., Shen, Y., Chen, S., Ooi, B.C.: Efficient processing of k nearest neighbor joins using MapReduce. PVLDB 5, 1016–1027 (2012) Lu, W., Shen, Y., Chen, S., Ooi, B.C.: Efficient processing of k nearest neighbor joins using MapReduce. PVLDB 5, 1016–1027 (2012)
36.
Zurück zum Zitat Ma, Q., Yang, B., Qian, W., Zhou, A.: Query processing of massive trajectory data based on MapReduce. In: CLOUDDB (2009) Ma, Q., Yang, B., Qian, W., Zhou, A.: Query processing of massive trajectory data based on MapReduce. In: CLOUDDB (2009)
37.
Zurück zum Zitat Nishimura, S., Das, S., Agrawal, D., Abbadi, A.E.: MD-HBase: A scalable multi-dimensional data infrastructure for location aware services. In: MDM (2011) Nishimura, S., Das, S., Agrawal, D., Abbadi, A.E.: MD-HBase: A scalable multi-dimensional data infrastructure for location aware services. In: MDM (2011)
38.
Zurück zum Zitat Nishimura, S., Das, S., Agrawal, D., El Abbadi, A.: \(\cal{MD}\): design and implementation of an elastic data infrastructure for cloud-scale location services. DAPD 31(2), 289–319 (2013) Nishimura, S., Das, S., Agrawal, D., El Abbadi, A.: \(\cal{MD}\): design and implementation of an elastic data infrastructure for cloud-scale location services. DAPD 31(2), 289–319 (2013)
39.
Zurück zum Zitat Nutanong, S., Zhang, R., Tanin, E., Kulik, L.: The v*-diagram: a query-dependent approach to moving knn queries. Proc. VLDB Endow. 1(1), 1095–1106 (2008)CrossRef Nutanong, S., Zhang, R., Tanin, E., Kulik, L.: The v*-diagram: a query-dependent approach to moving knn queries. Proc. VLDB Endow. 1(1), 1095–1106 (2008)CrossRef
40.
Zurück zum Zitat Oliver, D., Steinberger, D.J.: From geography to medicine: exploring innerspace via spatial and temporal databases. In: SSTD (2011) Oliver, D., Steinberger, D.J.: From geography to medicine: exploring innerspace via spatial and temporal databases. In: SSTD (2011)
41.
Zurück zum Zitat O’Malley, O.: Terabyte sort on Apache Hadoop. Yahoo! (2008) O’Malley, O.: Terabyte sort on Apache Hadoop. Yahoo! (2008)
43.
Zurück zum Zitat Papadias, D., Tao, Y., Fu, G., Seeger, B.: Progressive skyline computation in database systems. TODS 30(1), 41–82 (2005)CrossRef Papadias, D., Tao, Y., Fu, G., Seeger, B.: Progressive skyline computation in database systems. TODS 30(1), 41–82 (2005)CrossRef
44.
Zurück zum Zitat Park, Y., Min, J., Shim, K.: Parallel computation of skyline and reverse skyline queries using mapreduce. Proc. VLDB Endow. 6(14), 2002–2013 (2013)CrossRef Park, Y., Min, J., Shim, K.: Parallel computation of skyline and reverse skyline queries using mapreduce. Proc. VLDB Endow. 6(14), 2002–2013 (2013)CrossRef
46.
Zurück zum Zitat Preparata, F., Shamos, M.I.: Computational Geometry: An Introduction. Springer, Berlin (1985)CrossRefMATH Preparata, F., Shamos, M.I.: Computational Geometry: An Introduction. Springer, Berlin (1985)CrossRefMATH
47.
Zurück zum Zitat Samet, H.: The Quadtree and related hierarchical data structures. ACMCS 16(2), 187–260 (1984)MathSciNet Samet, H.: The Quadtree and related hierarchical data structures. ACMCS 16(2), 187–260 (1984)MathSciNet
48.
Zurück zum Zitat Sun, Y., Qi, J., Zhang, R., Chen, Y., Du, X.: Mapreduce based location selection algorithm for utility maximization with capacity constraints. Computing 97(4), 403–423 (2015)MathSciNetCrossRefMATH Sun, Y., Qi, J., Zhang, R., Chen, Y., Du, X.: Mapreduce based location selection algorithm for utility maximization with capacity constraints. Computing 97(4), 403–423 (2015)MathSciNetCrossRefMATH
49.
Zurück zum Zitat Tauheed, F., Biveinis, L., Heinis, T., Schürmann, F., Markram, H., Ailamaki, A.: Accelerating range queries for brain simulations. In: ICDE (2012) Tauheed, F., Biveinis, L., Heinis, T., Schürmann, F., Markram, H., Ailamaki, A.: Accelerating range queries for brain simulations. In: ICDE (2012)
50.
Zurück zum Zitat Wang, K., Han, J., Tu, B., Dai, J., Zhou, W., Song, X.: Accelerating spatial data processing with MapReduce. In: ICPADS, pp. 229–236. Shanghai, China (2010) Wang, K., Han, J., Tu, B., Dai, J., Zhou, W., Song, X.: Accelerating spatial data processing with MapReduce. In: ICPADS, pp. 229–236. Shanghai, China (2010)
51.
Zurück zum Zitat Wang, K., Huai, Y., Lee, R., Wang, F., Zhang, X., Saltz, J.: Accelerating pathology image data cross-comparison on cpu-gpu hybrid systems. Proc. VLDB Endow. 5(11), 1543–1554 (2012)CrossRef Wang, K., Huai, Y., Lee, R., Wang, F., Zhang, X., Saltz, J.: Accelerating pathology image data cross-comparison on cpu-gpu hybrid systems. Proc. VLDB Endow. 5(11), 1543–1554 (2012)CrossRef
52.
Zurück zum Zitat Whitman, R.T., Park, M.B., Ambrose, S.A., Hoel, E.G.: Spatial indexing and analytics on Hadoop. In: SIGSPATIAL. Dallas, TX (2014) Whitman, R.T., Park, M.B., Ambrose, S.A., Hoel, E.G.: Spatial indexing and analytics on Hadoop. In: SIGSPATIAL. Dallas, TX (2014)
53.
Zurück zum Zitat Zhang, C., Li, F., Jestes, J.: Efficient parallel kNN joins for large data in MapReduce. In: EDBT (2012) Zhang, C., Li, F., Jestes, J.: Efficient parallel kNN joins for large data in MapReduce. In: EDBT (2012)
54.
Zurück zum Zitat Zhang, J., Jiang, X., Ku, W.S., Qin, X.: Efficient parallel skyline evaluation using MapReduce. TPDS PP(99), 1–14 (2015) Zhang, J., Jiang, X., Ku, W.S., Qin, X.: Efficient parallel skyline evaluation using MapReduce. TPDS PP(99), 1–14 (2015)
55.
Zurück zum Zitat Zhang, S., Han, J., Liu, Z., Wang, K., Feng, S.: Spatial queries evaluation with MapReduce. In: GCC (2009) Zhang, S., Han, J., Liu, Z., Wang, K., Feng, S.: Spatial queries evaluation with MapReduce. In: GCC (2009)
Metadaten
Titel
Scalable computational geometry in MapReduce
verfasst von
Yuan Li
Ahmed Eldawy
Jie Xue
Nadezda Knorozova
Mohamed F. Mokbel
Ravi Janardan
Publikationsdatum
16.01.2019
Verlag
Springer Berlin Heidelberg
Erschienen in
The VLDB Journal / Ausgabe 4/2019
Print ISSN: 1066-8888
Elektronische ISSN: 0949-877X
DOI
https://doi.org/10.1007/s00778-018-0534-5

Weitere Artikel der Ausgabe 4/2019

The VLDB Journal 4/2019 Zur Ausgabe

Premium Partner