Published in: GeoInformatica 1/2022

23-07-2021

GeoBalance: workload-aware partitioning of real-time spatiotemporal data

Authors: Kiumars Soltani, Anand Padmanabhan, Shaowen Wang


Abstract

Numerous scientific disciplines have witnessed tremendous growth in the amount of spatial data produced over the past decade. To handle the volume and velocity of such data, researchers have embraced distributed systems, which partition data among multiple nodes to provide scalability and high availability. Previous work on partitioning large spatiotemporal data focuses on bulk ingestion and static partitioning, and is therefore unable to handle the dynamic data and query workloads that are common for real-time data. In this paper we develop GeoBalance, a workload-aware partitioning approach for spatiotemporal data that can adapt partitions on the fly without disrupting data ingestion or retrieval. GeoBalance employs a spatial evolutionary algorithm to incrementally tune the partitions according to a geo-aware partitioning fitness function. In addition, we perform a rolling migration from one partitioning scheme to another to ensure that data ingestion and retrieval are not compromised during the partition change period. We conduct multiple experiments using a write-intensive hybrid workload of Twitter data and random hotspots to demonstrate that the GeoBalance partitioning approach outperforms statically defined partitions and other partitioning algorithms such as k-d tree.
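
To make the idea of incrementally tuning partitions against a fitness function more concrete, the following is a minimal sketch, not the authors' algorithm. The abstract does not specify the fitness function or the evolutionary operators, so everything here is an assumption made for illustration: space is a small grid of cells, each cell is assigned to one node, the hypothetical `fitness` adds a load-imbalance ratio to a penalty for spatially fragmented partitions, and the search is a simple (1+1) evolutionary loop rather than a population-based one.

```python
import random


def load_imbalance(assignment, cell_load, num_nodes):
    """Busiest node's load divided by the mean node load (1.0 is perfectly even)."""
    totals = [0.0] * num_nodes
    for cell, node in enumerate(assignment):
        totals[node] += cell_load[cell]
    mean = sum(totals) / num_nodes
    return max(totals) / mean if mean > 0 else 1.0


def fragmentation(assignment, width):
    """Count adjacent cell pairs (right and down neighbours) held by different nodes."""
    height = len(assignment) // width
    cuts = 0
    for row in range(height):
        for col in range(width):
            i = row * width + col
            if col + 1 < width and assignment[i] != assignment[i + 1]:
                cuts += 1
            if row + 1 < height and assignment[i] != assignment[i + width]:
                cuts += 1
    return cuts


def fitness(assignment, cell_load, num_nodes, width, alpha=0.05):
    """Hypothetical geo-aware score, lower is better: imbalance plus weighted fragmentation."""
    return (load_imbalance(assignment, cell_load, num_nodes)
            + alpha * fragmentation(assignment, width))


def mutate(assignment, width, rng):
    """Copy the assignment and hand one random cell to the node of a random grid neighbour."""
    child = list(assignment)
    i = rng.randrange(len(child))
    row, col = divmod(i, width)
    height = len(child) // width
    neighbours = []
    if col > 0:
        neighbours.append(i - 1)
    if col + 1 < width:
        neighbours.append(i + 1)
    if row > 0:
        neighbours.append(i - width)
    if row + 1 < height:
        neighbours.append(i + width)
    if neighbours:
        child[i] = child[rng.choice(neighbours)]
    return child


def tune(assignment, cell_load, num_nodes, width, steps=2000, seed=0):
    """(1+1) evolutionary loop: keep a mutated child only if its score is not worse."""
    rng = random.Random(seed)
    best = list(assignment)
    best_score = fitness(best, cell_load, num_nodes, width)
    for _ in range(steps):
        child = mutate(best, width, rng)
        score = fitness(child, cell_load, num_nodes, width)
        if score <= best_score:
            best, best_score = child, score
    return best, best_score


if __name__ == "__main__":
    # 4x4 grid, two nodes, one hotspot cell in the top-left corner (made-up loads).
    loads = [1.0] * 16
    loads[0] = 9.0
    start = [0, 0, 1, 1] * 4  # static left/right split
    tuned, score = tune(start, loads, num_nodes=2, width=4, steps=500, seed=1)
    print(fitness(start, loads, 2, 4), score)
```

Because a child is accepted only when its score is no worse than the current one, the tuned score can never exceed the score of the starting partition. Mutating only along cell boundaries is one simple way to favour spatially contiguous partitions. The weight `alpha` is an arbitrary choice that trades load balance against contiguity.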


Metadata
Title
GeoBalance: workload-aware partitioning of real-time spatiotemporal data
Authors
Kiumars Soltani
Anand Padmanabhan
Shaowen Wang
Publication date
23-07-2021
Publisher
Springer US
Published in
GeoInformatica / Issue 1/2022
Print ISSN: 1384-6175
Electronic ISSN: 1573-7624
DOI
https://doi.org/10.1007/s10707-021-00444-z
