Elsevier

Big Data Research

Volume 2, Issue 2, June 2015, Pages 74-81

Geospatial Big Data: Challenges and Opportunities

https://doi.org/10.1016/j.bdr.2015.01.003

Abstract

Geospatial big data refers to spatial data sets that exceed the capacity of current computing systems. A significant portion of big data is actually geospatial data, and the size of such data is growing rapidly, by at least 20% every year. In this paper, we explore the challenges and opportunities that geospatial big data has brought us. Several case studies are introduced to show the importance and benefits of geospatial big data analytics, including fuel and time savings, revenue increases, urban planning, and health care. We then introduce emerging platforms for sharing collected geospatial big data and for tracking human mobility via mobile devices. Researchers in academia and industry have put considerable effort into increasing the value of geospatial big data and taking advantage of it. Along the same line, we present our current research activities on the analytics of geospatial big data, especially interactive analytics of real-time or dynamic data.

Introduction

Geospatial data has always been big data. Big data analytics for geospatial data is now receiving considerable attention because it allows users to analyze huge amounts of geospatial data. Geospatial big data typically refers to spatial data sets that exceed the capacity of current computing systems. The McKinsey Global Institute reports that the pool of personal location data was at the level of 1 PB in 2009 and is growing at a rate of 20% per year [1]. This estimate did not include data from RFID sensors or data stored in private archives. According to an estimate by the United Nations Initiative on Global Geospatial Information Management (UN-GGIM), 2.5 quintillion bytes of data are generated every day, and a large portion of the data is location-aware. At Google, about 25 PB of data is generated per day, and a significant portion of it falls into the realm of spatio-temporal data [2]. This trend will accelerate further as the world becomes more and more mobile. As shown in Fig. 1, internet traffic from mobile devices in India has already exceeded that from desktop computers [3].
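To make the cited growth rate concrete, the following minimal sketch compounds the 1 PB figure for 2009 at 20% per year. It assumes the rate stays constant, which the cited estimate does not guarantee, and the function names are hypothetical helpers introduced only for illustration.

```python
import math


def projected_volume_pb(base_pb, annual_growth, years):
    """Compound a starting volume (in PB) at a fixed annual growth rate.

    Assumption: the growth rate is constant over the whole period.
    """
    return base_pb * (1.0 + annual_growth) ** years


def doubling_time_years(annual_growth):
    """Years needed for the volume to double at a fixed annual growth rate."""
    return math.log(2.0) / math.log(1.0 + annual_growth)


# Extrapolate from the cited 2009 baseline of about 1 PB.
for year in (2009, 2012, 2015, 2020):
    volume = projected_volume_pb(1.0, 0.20, year - 2009)
    print(f"{year}: about {volume:.2f} PB")

print(f"Doubling time: about {doubling_time_years(0.20):.1f} years")
```

Under this constant-rate assumption the pool roughly triples between 2009 and 2015. The result is an extrapolation for illustration, not a measured value.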

Along with this exponential increase in geospatial big data, high-performance computing capability is required more than ever for modeling and simulating geospatially enabled content. However, because of limited processing power, it has been hard to fully exploit high-volume or high-velocity collections of geospatial data in many applications. Recently, distributed, parallel processing on a cluster of commodity computers or on a cloud such as Amazon EC2 has become widely available, breaking the existing limitations on processing power. In addition, big data platforms such as Hadoop [4], Hive [5], and MongoDB [6] have been developed so that users can implement big data analytics software easily on a distributed, parallel computing platform. These recent improvements provide many opportunities for advanced analytics of geospatial big data [7], [8], [9]. According to Gartner's hype cycle in Fig. 2, geospatial big data analytics was at the peak of inflated expectations as of July 2012 [10].
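The style of computation that platforms such as Hadoop distribute across machines can be illustrated on a single machine. The sketch below is plain Python and does not use the Hadoop API: it counts location points per grid cell with a map step and a reduce step, and the partitions, the sample coordinates, and the 1-degree cell size are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def map_point(point, cell_deg):
    """Map step: emit (grid cell, 1) for one (latitude, longitude) point."""
    lat, lon = point
    cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
    return cell, 1


def reduce_counts(pairs):
    """Reduce step: sum the counts emitted for each grid cell."""
    totals = defaultdict(int)
    for cell, count in pairs:
        totals[cell] += count
    return dict(totals)


def count_partition(points, cell_deg):
    """Process one partition independently, as one worker would."""
    return reduce_counts(map_point(p, cell_deg) for p in points)


# Hypothetical sample points, split into two partitions.
partitions = [
    [(37.56, 126.97), (37.57, 126.98)],
    [(35.18, 129.08), (37.45, 126.70)],
]

# Each partition is counted on its own, then the partial results are merged.
partial_results = [count_partition(part, 1.0) for part in partitions]
merged = reduce_counts(
    item for partial in partial_results for item in partial.items()
)
print(merged)
```

Because each partition is counted independently and the partial counts are merged by summation, the same logic could run on separate machines. That independence is what lets such a job scale out.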

Geospatial big data, or simply spatial big data, offers societal opportunities [11], [12]. The Millennium Project identified 15 global challenges facing humankind, as shown in Fig. 3 [13]. Many of them can benefit from geospatial big data. Shashi Shekhar [14], a renowned computer scientist, says that seven of these challenges are related to geospatial big data, as indicated by the boxes in the figure. In the energy area, for example, eco-routing can save energy using geospatial big data. This technology minimizes fuel consumption rather than travel time or travel distance. To this end, eco-routing tries to find a route that avoids congestion, idling at red lights, turns, elevation changes, and so on. Ford researchers reported that, compared with the “Fastest Route” option, the “Eco Route” option reduced fuel consumption by as much as 15% in some of their vehicles [15]. The eco-route option is now available in many Ford cars, as in Fig. 4.
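The difference between a fastest route and an eco route can be sketched as the same shortest-path search run with a different edge cost. The example below uses Dijkstra's algorithm as a stand-in, since the source does not say which algorithm Ford's system uses. The road graph, its travel times, and its fuel figures are hypothetical values chosen for illustration.

```python
import heapq


def shortest_path(graph, source, target, cost_key):
    """Dijkstra's algorithm minimizing the edge attribute named by cost_key."""
    best = {source: 0.0}
    prev = {}
    queue = [(0.0, source)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node == target:
            break
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry, a cheaper path was already found
        for neighbor, attrs in graph.get(node, {}).items():
            new_cost = cost + attrs[cost_key]
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                prev[neighbor] = node
                heapq.heappush(queue, (new_cost, neighbor))
    if target not in best:
        return None, float("inf")
    # Walk the predecessor links back from the target to rebuild the path.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return path, best[target]


# Hypothetical road graph: A-B-D is a fast but congested, hilly road;
# A-C-D is slower but flatter, with fewer stops.
ROAD_GRAPH = {
    "A": {"B": {"minutes": 10, "fuel_ml": 1200},
          "C": {"minutes": 12, "fuel_ml": 700}},
    "B": {"D": {"minutes": 8, "fuel_ml": 1000}},
    "C": {"D": {"minutes": 9, "fuel_ml": 600}},
    "D": {},
}

fastest_path, fastest_minutes = shortest_path(ROAD_GRAPH, "A", "D", "minutes")
eco_path, eco_fuel_ml = shortest_path(ROAD_GRAPH, "A", "D", "fuel_ml")
print("Fastest route:", fastest_path, fastest_minutes, "minutes")
print("Eco route:", eco_path, eco_fuel_ml, "ml of fuel")
```

In this toy graph the two objectives select different routes. In a real system the fuel cost per road segment would have to be estimated from data such as congestion, signal timing, and elevation, which is where geospatial big data comes in.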

The McKinsey Global Institute conducted a study on how big data can bring innovation to our world [16]. As for geospatial big data, the study says “the use of personal location data could save consumers worldwide more than $600 billion annually by 2020.” Users' current locations can be found by tracking their mobile devices, such as smartphones. The study mentioned geosocial networking services such as Foursquare, which are used for locating friends and for finding nearby stores and restaurants, and where many users check in at various places and reveal their current location [17]. According to the study, however, the biggest consumer benefit will come from time and fuel savings thanks to location-based services that take real-time traffic and weather data into account to help drivers avoid traffic congestion and to recommend alternative routes. Location tracking can be done using a driver's smartphone or a global positioning system (GPS) device installed in a car.

Power of location

Sir Martin Sorrell [18], the CEO of WPP Group, says “Location targeting is holy grail for marketers.” Big data analytics is an effective way to enhance the power of location [19]. For example, Netflix's video rental service can benefit from analyzing the rental patterns of regions designated by zip codes [20]. For the study whose result is shown in Fig. 5, Netflix generated data on the top-50 rentals of 2009 in each zip code. Titles were listed in the approximate order of popularity across

Data collection

There are several forms of geospatial big data. Traditionally, geospatial data is categorized into three forms: raster data, vector data, and graph data [14]. First, raster data include geo-images typically obtained by unmanned aerial vehicles, security cameras, and satellites. Recently, the military has been collecting huge amounts of raster data using drones, and satellites keep providing remote sensing data of the Earth. Table 1 shows some of the climate and earth

On-going efforts

We recently initiated a new research project on spatial big data, supported by the Ministry of Land, Infrastructure, and Transport of the Korean Government. This project is planned for five years, and the outcomes will be integrated into the public services for Korean citizens. Fig. 9 shows the entire system architecture we are planning. The system consists of three layers: geospatial big data integration & management, geospatial big data analytics, and geospatial big data service platform. The

Conclusion

In this paper, we have discussed the challenges and opportunities that geospatial big data has brought us. Much evidence shows that a significant portion of big data is, in fact, geospatial big data. We can innovate our daily life and business by exploiting the power of location embedded in geospatial big data. A few cases were introduced to show the real benefits of geospatial big data. Collecting geospatial big data has become quite easy thanks to the advancements of sensor and

Acknowledgements

This research, “Geospatial Big Data Management, Analysis and Service Platform Technology Development,” was supported by the MOLIT (The Ministry of Land, Infrastructure and Transport), Korea, under the national spatial information research program supervised by the KAIA (Korea Agency for Infrastructure Technology Advancement) (14NSIP-B091011-01).

References (36)

  • A. Dasgupta. Big data: the future is in analytics.
  • R.R. Vatsavai et al. Spatiotemporal data mining in the era of big spatial data: algorithms and applications.
  • M. Meeker. 2012 KPCB internet trends year-end update.
  • T. White. Hadoop: The Definitive Guide (2012).
  • A. Thusoo et al. Hive: a warehousing solution over a map-reduce framework. Proc. VLDB Endow. (2009).
  • E. Plugge et al. The Definitive Guide to MongoDB: The NoSQL Database for Cloud and Desktop Computing (2010).
  • J.-G. Lee et al. Trajectory clustering: a partition-and-group framework.
  • J.-G. Lee et al. Trajectory outlier detection: a partition-and-detect framework.
  • J.-G. Lee et al. TraClass: trajectory classification using hierarchical region-based and trajectory-based clustering. Proc. VLDB Endow. (2008).
  • A. Lapkin. Hype cycle for big data, 2012.
  • N. Eagle et al. Reality Mining: Using Big Data to Engineer a Better World (2014).
  • V. Mayer-Schonberger et al. Big Data: A Revolution That Will Transform How We Live, Work, and Think (2014).
  • M. Marien. Global challenges for humanity.
  • S. Shekhar. Spatial big data challenges.
  • P. Valdes-Dapena. GPS systems that save gas.
  • S. Lohr. New ways to exploit raw data may bring surge of innovation, a study says.
  • M. Choy et al. Glaucus: exploiting the wisdom of crowds for location-based queries in mobile environments.
  • S.M. Sorrell. The power of apps.

This article belongs to Visions on Big Data.