ABSTRACT
Spatial data mining is the process of discovering interesting, previously unknown, but potentially useful patterns from spatial and spatiotemporal data. However, the explosive growth of spatial and spatiotemporal data and the emergence of social media and location-sensing technologies emphasize the need for new, computationally efficient methods tailored to analyzing big data. In this paper, we review major spatial data mining algorithms, looking closely at their computational and I/O requirements, and briefly discuss a few applications dealing with big spatial data.
Index Terms
- Spatiotemporal data mining in the era of big spatial data: algorithms and applications