Published in: Journal of Geographical Systems 4/2015

01-10-2015 | Original Article

Optimizing distance-based methods for large data sets

Authors: Tobias Scholl, Thomas Brenner



Abstract

Distance-based methods for measuring the spatial concentration of industries have become increasingly popular in the spatial econometrics community. However, their computational complexity limits their use, since both their memory requirements and running times are in \(\mathcal{O}(n^2)\). In this paper, we present an algorithm with constant memory requirements and shorter running time, enabling distance-based methods to deal with large data sets. We discuss three recent distance-based methods in spatial econometrics: the D&O-Index by Duranton and Overman (Rev Econ Stud 72(4):1077–1106, 2005), the M-function by Marcon and Puech (J Econ Geogr 10(5):745–762, 2010) and the Cluster-Index by Scholl and Brenner (Reg Stud (ahead-of-print):1–15, 2014). Finally, we present an alternative calculation for the Cluster-Index that allows the use of data sets with millions of firms.
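
To illustrate the memory argument only, the following is a minimal sketch and not the algorithm presented in the paper. It assumes that the quantity of interest is a count of pairwise distances per distance bin, so each distance can be added to a fixed set of counters as soon as it is computed instead of being stored in an n × n matrix. The function name `distance_histogram`, the parameters `bin_width` and `n_bins`, and the sample coordinates are hypothetical. The running time of this sketch is still quadratic in the number of points; only the working memory beyond the input is independent of n.

```python
import math


def distance_histogram(points, bin_width, n_bins):
    """Count pairwise distances per bin without storing a distance matrix.

    Hypothetical helper for illustration: working memory is one counter
    per bin, regardless of how many points are passed in.
    """
    counts = [0] * n_bins
    n = len(points)
    for i in range(n - 1):
        xi, yi = points[i]
        for j in range(i + 1, n):
            xj, yj = points[j]
            d = math.hypot(xi - xj, yi - yj)  # Euclidean distance of pair (i, j)
            k = int(d // bin_width)           # index of the bin containing d
            if k < n_bins:                    # distances beyond the last bin are dropped
                counts[k] += 1
    return counts


# Made-up coordinates for four firms, for illustration only.
firms = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0), (0.0, 1.0)]
hist = distance_histogram(firms, bin_width=4.0, n_bins=3)
print(hist)
```

With four points there are six pairs, and each one is assigned to a bin and then discarded, so the same loop works for any n without the memory footprint growing.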


Footnotes
1
Note that Eq. (2) is modified for the observation of an unmarked point pattern. See Marcon and Puech (2010, p. 749) for the original formula of the M-function.
2
Note that the latest version of dbmss already includes computational improvements. See Sect. 4 for more information.
Literature
Baddeley A, Møller J, Waagepetersen RP (2000) Non- and semi-parametric estimation of interaction in inhomogeneous point patterns. Stat Ned 3(54):329–350
Barlet M, Briant A, Crusson L (2013) Location patterns of service industries in France: a distance-based approach. Reg Sci Urban Econ 43(2):338–351
Duque JC, Aldstadt J, Velasquez E, Franco JL, Betancourt A (2011) A computationally efficient method for delineating irregularly shaped spatial clusters. J Geogr Syst 13(4):355–372
Ellison G, Glaeser EL, Kerr WR (2010) What causes industry agglomeration? Evidence from coagglomeration patterns. Am Econ Rev 100(3):1195–1213
Espa G, Arbia G, Giuliani D et al (2010) Measuring industrial agglomeration with inhomogeneous K-function: the case of ICT firms in Milan (Italy). Artículo de trabajo 14:1–11
German Federal Ministry of Economics and Technology (2010) Möglichkeiten und Grenzen einer Verbesserung der Wettbewerbssituation der Automobilindustrie durch Abbau von branchenspezifischen Kosten aus Informationspflichten. BMBF, Stuttgart
Getis A, Ord JK (1992) The analysis of spatial association by use of distance statistics. Geographical Analysis 24(3):189–206
Hjaltason GR, Samet H (1999) Distance browsing in spatial databases. ACM Trans Database Syst 24(2):265–318
Koh HJ, Riedel N (2014) Assessing the localization pattern of German manufacturing and service industries: a distance-based approach. Reg Stud 48(5):823–843
Kosfeld R, Eckey HF, Lauridsen J (2011) Spatial point pattern analysis and industry concentration. Ann Reg Sci 47(2):311–328
Marcon E, Traissac S, Lang G (2013) A statistical test for Ripley's function rejection of Poisson null hypothesis. Int Sch Res Not 1:1–9
Marcon E, Puech F (2003) Evaluating the geographic concentration of industries using distance-based methods. J Econ Geogr 3(4):409–428
Marcon E, Puech F (2010) Measures of the geographic concentration of industries: improving distance-based methods. J Econ Geogr 10(5):745–762
Miller HJ (2010) The data avalanche is here. Shouldn't we be digging? J Reg Sci 50(1):181–201
Neill DB, Moore AW (2004) Rapid detection of significant spatial clusters. In: Proceedings of the tenth ACM SIGKDD international conference on knowledge discovery and data mining, ACM, pp 256–265
Openshaw S (1984) The modifiable areal unit problem. Institute of British Geographers, Norwich
Ripley BD (2005) Spatial statistics. Wiley, New Jersey
Sankaranarayanan J, Samet H, Varshney A (2007) A fast all nearest neighbor algorithm for applications involving large point-clouds. Comput Graph 31(2):157–174
Vitali S, Napoletano M, Fagiolo G (2013) Spatial localization in manufacturing: a cross-country analysis. Reg Stud 47(9):1534–1554
Metadata
Title
Optimizing distance-based methods for large data sets
Authors
Tobias Scholl
Thomas Brenner
Publication date
01-10-2015
Publisher
Springer Berlin Heidelberg
Published in
Journal of Geographical Systems / Issue 4/2015
Print ISSN: 1435-5930
Electronic ISSN: 1435-5949
DOI
https://doi.org/10.1007/s10109-015-0219-1
