Skip to main content

2014 | OriginalPaper | Buchkapitel

Fast SNN-Based Clustering Approach for Large Geospatial Data Sets

verfasst von : Arménio Antunes, Maribel Yasmina Santos, Adriano Moreira

Erschienen in: Connecting a Digital Europe Through Location and Place

Verlag: Springer International Publishing

Aktivieren Sie unsere intelligente Suche um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Current positioning and sensing technologies enable the collection of very large spatio-temporal data sets. When analysing movement data, researchers often resort to clustering techniques to extract useful patterns from these data. Density-based clustering algorithms, although being very adequate to the analysis of this type of data, can be very inefficient when analysing huge amounts of data. The Shared Nearest Neighbour (SNN) algorithm presents low efficiency when dealing with high quantities of data due to its complexity evaluated in the worst case by O(n\(^{2}\)). This chapter presents a clustering method, based on the SNN algorithm that significantly reduces the processing time by segmenting the spatial dimension of the data into a set of cells, and by minimizing the number of cells that have to be visited while searching for the k-nearest neighbours of each vector. The obtained results show an expressive reduction of the time needed to find the k-nearest neighbours and to compute the clusters, while producing results that are equal to those produced by the original SNN algorithm. Experimental results obtained with three different data sets (2D and 3D), one synthetic and two real, show that the proposed method enables the analysis of much larger data sets within reasonable amount of time.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literatur
Zurück zum Zitat Bentley JL (1975) Multidimensional binary search trees used for associative searching. Commun ACM 18(9):509–517CrossRef Bentley JL (1975) Multidimensional binary search trees used for associative searching. Commun ACM 18(9):509–517CrossRef
Zurück zum Zitat Bhavsar HB, Jivani A G (2009) The shared nearest neighbor algorithm with enclosures (SNNAE). In: 2009 WRI world congress on computer science and information engineering. doi:10.1109/CSIE.2009.997 Bhavsar HB, Jivani A G (2009) The shared nearest neighbor algorithm with enclosures (SNNAE). In: 2009 WRI world congress on computer science and information engineering. doi:10.​1109/​CSIE.​2009.​997
Zurück zum Zitat Chávez E, Navarro G, Baeza-Yates R, Marroquín JL (2001) Searching in metric spaces. ACM Comput Surv 33(3):273–321CrossRef Chávez E, Navarro G, Baeza-Yates R, Marroquín JL (2001) Searching in metric spaces. ACM Comput Surv 33(3):273–321CrossRef
Zurück zum Zitat Ertoz L, Steinbach M, Kumar V (2002) A new shared nearest neighbor clustering algorithm and its applications. In: Proceedings of the workshop on clustering high dimensional data and its applications at 2nd SIAM international conference on data mining, p 105–115 Ertoz L, Steinbach M, Kumar V (2002) A new shared nearest neighbor clustering algorithm and its applications. In: Proceedings of the workshop on clustering high dimensional data and its applications at 2nd SIAM international conference on data mining, p 105–115
Zurück zum Zitat Ertoz L, Steinbach M, Kumar V (2003) Finding clusters of different sizes, shapes, and densities in noisy, high dimensional data. In: Proceedings of the 3rd SIAM international conference on data mining Ertoz L, Steinbach M, Kumar V (2003) Finding clusters of different sizes, shapes, and densities in noisy, high dimensional data. In: Proceedings of the 3rd SIAM international conference on data mining
Zurück zum Zitat Ester M, Kriegel HP, Sander J, Xu X (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of 2nd international conference on knowledge discovery and data mining, KDD 1996, p 226–231 Ester M, Kriegel HP, Sander J, Xu X (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of 2nd international conference on knowledge discovery and data mining, KDD 1996, p 226–231
Zurück zum Zitat Faustino B (2012) Implementation for spatial data of the shared nearest neighbour with metric data structures. M.Sc. thesis, New University of Lisbon Faustino B (2012) Implementation for spatial data of the shared nearest neighbour with metric data structures. M.Sc. thesis, New University of Lisbon
Zurück zum Zitat Li X, Jiang SY, Su XK (2009) A novel fast clustering algorithm. In: Proceedings of the international conference on artificial intelligence and computational intelligence, pp 284–288 Li X, Jiang SY, Su XK (2009) A novel fast clustering algorithm. In: Proceedings of the international conference on artificial intelligence and computational intelligence, pp 284–288
Zurück zum Zitat Moreira A, Santos MY, Wachowicz M, Orellana D (2010) The impact of data quality in the context of pedestrian movement analysis. In: Painho M, Santos MY, Pundt H (eds) Geospatial thinking. Springer, Berlin, pp 61–78 Moreira A, Santos MY, Wachowicz M, Orellana D (2010) The impact of data quality in the context of pedestrian movement analysis. In: Painho M, Santos MY, Pundt H (eds) Geospatial thinking. Springer, Berlin, pp 61–78
Zurück zum Zitat Moreira G, Santos MY, Moura-Pires J (2013) SNN Input Parameters: how are they related? In: Proceedings of the 19th IEEE international conference on parallel and distributed systems (ICPADS’2013). Seoul, Korea, pp 15–18 (IEEE computer society, December) Moreira G, Santos MY, Moura-Pires J (2013) SNN Input Parameters: how are they related? In: Proceedings of the 19th IEEE international conference on parallel and distributed systems (ICPADS’2013). Seoul, Korea, pp 15–18 (IEEE computer society, December)
Zurück zum Zitat Traina C, Traina A, Faloutsos C, Seeger B (2002) Fast indexing and visualization of metric data sets using slim-trees. IEEE Trans Knowl Data Eng 14(2):244–260CrossRef Traina C, Traina A, Faloutsos C, Seeger B (2002) Fast indexing and visualization of metric data sets using slim-trees. IEEE Trans Knowl Data Eng 14(2):244–260CrossRef
Zurück zum Zitat Tripathy A, Maji SK, Patra PK (2011) FDCA: A fast density based clustering algorithm for spatial database system. In: Proceedings of the 2nd international conference on computer and communication technology, pp 21–26 Tripathy A, Maji SK, Patra PK (2011) FDCA: A fast density based clustering algorithm for spatial database system. In: Proceedings of the 2nd international conference on computer and communication technology, pp 21–26
Metadaten
Titel
Fast SNN-Based Clustering Approach for Large Geospatial Data Sets
verfasst von
Arménio Antunes
Maribel Yasmina Santos
Adriano Moreira
Copyright-Jahr
2014
DOI
https://doi.org/10.1007/978-3-319-03611-3_11