Skip to main content
Erschienen in: Knowledge and Information Systems 2/2013

01.11.2013 | Regular Paper

An incremental algorithm for clustering spatial data streams: exploring temporal locality

verfasst von: Ling-Yin Wei, Wen-Chih Peng

Erschienen in: Knowledge and Information Systems | Ausgabe 2/2013

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Clustering sensor data discovers useful information hidden in sensor networks. In sensor networks, a sensor has two types of attributes: a geographic attribute (i.e, its spatial location) and non-geographic attributes (e.g., sensed readings). Sensor data are periodically collected and viewed as spatial data streams, where a spatial data stream consists of a sequence of data points exhibiting attributes in both the geographic and non-geographic domains. Previous studies have developed a dual clustering problem for spatial data by considering similarity-connected relationships in both geographic and non-geographic domains. However, the clustering processes in stream environments are time-sensitive because of frequently updated sensor data. For sensor data, the readings from one sensor are similar for a period, and the readings refer to temporal locality features. Using the temporal locality features of the sensor data, this study proposes an incremental clustering (IC) algorithm to discover clusters efficiently. The IC algorithm comprises two phases: cluster prediction and cluster refinement. The first phase estimates the probability of two sensors belonging to a cluster from the previous clustering results. According to the estimation, a coarse clustering result is derived. The cluster refinement phase then refines the coarse result. This study evaluates the performance of the IC algorithm using synthetic and real datasets. Experimental results show that the IC algorithm outperforms exiting approaches confirming the scalability of the IC algorithm. In addition, the effect of temporal locality features on the IC algorithm is analyzed and thoroughly examined in the experiments.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Anhänge
Nur mit Berechtigung zugänglich
Literatur
1.
Zurück zum Zitat Aggarwal CC, Han J, Wang J, Yu PS (2003) A framework for clustering evolving data streams. In: Proceedings of the 29th international conference on very large data bases (VLDB), pp 81–92 Aggarwal CC, Han J, Wang J, Yu PS (2003) A framework for clustering evolving data streams. In: Proceedings of the 29th international conference on very large data bases (VLDB), pp 81–92
2.
Zurück zum Zitat Aggarwal CC, Yu PS (2010) On clustering massive text and categorical data streams. Knowl Inf Syst 24(2):171–196CrossRef Aggarwal CC, Yu PS (2010) On clustering massive text and categorical data streams. Knowl Inf Syst 24(2):171–196CrossRef
3.
Zurück zum Zitat Aghabozorgi SR, Saybani MR, Wah TY (2012) Incremental clustering of time-series by fuzzy clustering. J Inf Sci Eng (JISE) 28(4):671–688MathSciNet Aghabozorgi SR, Saybani MR, Wah TY (2012) Incremental clustering of time-series by fuzzy clustering. J Inf Sci Eng (JISE) 28(4):671–688MathSciNet
4.
Zurück zum Zitat Alïtelhadj A, Boughanem M, Mezghiche M, Souam F (2012) Using structural similarity for clustering xml documents. Knowl Inf Syst (KAIS) 32(1):109–139CrossRef Alïtelhadj A, Boughanem M, Mezghiche M, Souam F (2012) Using structural similarity for clustering xml documents. Knowl Inf Syst (KAIS) 32(1):109–139CrossRef
5.
Zurück zum Zitat Bagnall AJ, Janacek GJ (2004) Clustering time series from arma models with clipped data. In: Proceedings of the tenth ACM SIGKDD international conference on knowledge discovery and data mining (KDD), pp 49–58 Bagnall AJ, Janacek GJ (2004) Clustering time series from arma models with clipped data. In: Proceedings of the tenth ACM SIGKDD international conference on knowledge discovery and data mining (KDD), pp 49–58
6.
Zurück zum Zitat Banerjee A, Ghosh J (2006) Scalable clustering algorithms with balancing constraints. Data Min Knowl Discov 13(3):365–395MathSciNetCrossRef Banerjee A, Ghosh J (2006) Scalable clustering algorithms with balancing constraints. Data Min Knowl Discov 13(3):365–395MathSciNetCrossRef
7.
Zurück zum Zitat Beringer J, HLullermeier E (2006) Online clustering of parallel data streams. Data Knowl Eng (DKE) 58(2):180–204CrossRef Beringer J, HLullermeier E (2006) Online clustering of parallel data streams. Data Knowl Eng (DKE) 58(2):180–204CrossRef
8.
Zurück zum Zitat Borovkova S, Permana FJ (2004) Modelling electricity prices by the potential jumpdiffusion. In: Proceedings of the autumn school and international conference on stochastic finance (StochFin), pp 239–263 Borovkova S, Permana FJ (2004) Modelling electricity prices by the potential jumpdiffusion. In: Proceedings of the autumn school and international conference on stochastic finance (StochFin), pp 239–263
9.
Zurück zum Zitat Cao F, Ester M, Qian W, Zhou A (2006) Density-based clustering over an evolving data stream with noise. In: Proceedings of the sixth SIAM international conference on data mining (SDM), pp 326–337 Cao F, Ester M, Qian W, Zhou A (2006) Density-based clustering over an evolving data stream with noise. In: Proceedings of the sixth SIAM international conference on data mining (SDM), pp 326–337
10.
Zurück zum Zitat Cartea A, Figueroa MG (2005) Pricing in electricity markets a mean reverting jump diffusion model with seasonality. Appl Math Finance 12(4):313–335CrossRefMATH Cartea A, Figueroa MG (2005) Pricing in electricity markets a mean reverting jump diffusion model with seasonality. Appl Math Finance 12(4):313–335CrossRefMATH
11.
Zurück zum Zitat Costa G, Manco G, Ortale R (2010) Density-based clustering of data streams at multiple resolutions. Data Min Knowl Discov (DMKD) 20(1):152–187MathSciNetCrossRef Costa G, Manco G, Ortale R (2010) Density-based clustering of data streams at multiple resolutions. Data Min Knowl Discov (DMKD) 20(1):152–187MathSciNetCrossRef
12.
Zurück zum Zitat Dai BR, Lin CR, Chen MS (2007) Constrained data clustering by depth control and progressive constraint relaxation. Int J Very Large Data Bases 16(2):201–217CrossRef Dai BR, Lin CR, Chen MS (2007) Constrained data clustering by depth control and progressive constraint relaxation. Int J Very Large Data Bases 16(2):201–217CrossRef
13.
Zurück zum Zitat Davidson I, Ester M, Ravi SS (2007) Efficient incremental constrained clustering. In: Proceedings of the 13th ACM international conference on knowledge discovery and data mining (SIGKDD), pp 240–249 Davidson I, Ester M, Ravi SS (2007) Efficient incremental constrained clustering. In: Proceedings of the 13th ACM international conference on knowledge discovery and data mining (SIGKDD), pp 240–249
14.
Zurück zum Zitat Davidson I, Ravi SS (2005) Clustering with constraints: Feasibility issues and the k-means algorithm. In: Proceedings of the fifth SIAM international conference on data mining (SDM), pp 138–149 Davidson I, Ravi SS (2005) Clustering with constraints: Feasibility issues and the k-means algorithm. In: Proceedings of the fifth SIAM international conference on data mining (SDM), pp 138–149
15.
Zurück zum Zitat Ester M, Kriegel HP, Sander J, Wimmer M, Xu X (1998) Incremental clustering for mining in a data warehousing environment. In: Proceedings of the 24th international conference on very large data bases (VLDB), pp 323–333 Ester M, Kriegel HP, Sander J, Wimmer M, Xu X (1998) Incremental clustering for mining in a data warehousing environment. In: Proceedings of the 24th international conference on very large data bases (VLDB), pp 323–333
16.
Zurück zum Zitat Ge R, Ester M, Gao BJ, Hu Z, Bhattacharya B, Ben-Moshe B (2008) Joint cluster analysis of attribute data and relationship data: the connected k-center problem, algorithms and applications. ACM Trans Knowl Discov Data 2(2):7:1–7:35CrossRef Ge R, Ester M, Gao BJ, Hu Z, Bhattacharya B, Ben-Moshe B (2008) Joint cluster analysis of attribute data and relationship data: the connected k-center problem, algorithms and applications. ACM Trans Knowl Discov Data 2(2):7:1–7:35CrossRef
17.
Zurück zum Zitat Ge R, Ester M, Jin W, Davidson I (2007) Constraint-driven clustering. In: Proceedings of the 13th ACM international conference on knowledge discovery and data mining(SIGKDD), pp 320–329 Ge R, Ester M, Jin W, Davidson I (2007) Constraint-driven clustering. In: Proceedings of the 13th ACM international conference on knowledge discovery and data mining(SIGKDD), pp 320–329
18.
Zurück zum Zitat Guha S, Meyerson A, Mishra N, Motwani R, OCallaghan L (2003) Clustering data streams: theory and practice. IEEE Trans Knowl Data Eng (TKDE) 15(3):515–528CrossRef Guha S, Meyerson A, Mishra N, Motwani R, OCallaghan L (2003) Clustering data streams: theory and practice. IEEE Trans Knowl Data Eng (TKDE) 15(3):515–528CrossRef
19.
Zurück zum Zitat Guo L, Ai C, Wang X, Cai Z, Li Y (2009) Real time clustering of sensory data in wireless sensor networks. In: Proceedings of the 28th IEEE international performance computing and communications conference (IPCCC), pp 33–40 Guo L, Ai C, Wang X, Cai Z, Li Y (2009) Real time clustering of sensory data in wireless sensor networks. In: Proceedings of the 28th IEEE international performance computing and communications conference (IPCCC), pp 33–40
20.
Zurück zum Zitat Halkidi M, Spiliopoulou M, Pavlou A (2012) A semi-supervised incremental clustering algorithm for streaming data. In: Proceedings of the 16th Pacific-Asia conference on advances in knowledge discovery and data mining (PAKDD), pp 578–590 Halkidi M, Spiliopoulou M, Pavlou A (2012) A semi-supervised incremental clustering algorithm for streaming data. In: Proceedings of the 16th Pacific-Asia conference on advances in knowledge discovery and data mining (PAKDD), pp 578–590
21.
Zurück zum Zitat Han J, Kamber M (2000) Data mining: concepts and techniques. Morgan Kaufmann, New York Han J, Kamber M (2000) Data mining: concepts and techniques. Morgan Kaufmann, New York
22.
Zurück zum Zitat Huang J, Zhang J (2010) Distributed dual cluster algorithm based on grid for sensor streams. Int J Digit Content Technol Appl (JDCTA) 4(9):225–233CrossRef Huang J, Zhang J (2010) Distributed dual cluster algorithm based on grid for sensor streams. Int J Digit Content Technol Appl (JDCTA) 4(9):225–233CrossRef
23.
Zurück zum Zitat Kavitha V, Punithavalli M (2010) Clustering time series data stream—a literature survey. ACM Trans Knowl Discov Data (IJCSIS) 8(1):289–294 Kavitha V, Punithavalli M (2010) Clustering time series data stream—a literature survey. ACM Trans Knowl Discov Data (IJCSIS) 8(1):289–294
24.
Zurück zum Zitat Klein D, Kamvar SD, Manning CD (2002) From instance-level constraints to space-level constraints: making the most of prior knowledge in data clustering. In: Proceedings of the 19th international conference on machine learning (ICML), pp 307–314 Klein D, Kamvar SD, Manning CD (2002) From instance-level constraints to space-level constraints: making the most of prior knowledge in data clustering. In: Proceedings of the 19th international conference on machine learning (ICML), pp 307–314
25.
Zurück zum Zitat Liao ZX, Peng WC (2012) Clustering spatial data with a geographic constraint: exploring local search. Knowl Inf Syst 31(1):153–170CrossRef Liao ZX, Peng WC (2012) Clustering spatial data with a geographic constraint: exploring local search. Knowl Inf Syst 31(1):153–170CrossRef
26.
Zurück zum Zitat Lin CR, Liu KH, Chen MS (2005) Dual clustering: integrating data clustering over optimization and constraint domains. IEEE Trans Knowl Data Eng 17(5):628–637CrossRef Lin CR, Liu KH, Chen MS (2005) Dual clustering: integrating data clustering over optimization and constraint domains. IEEE Trans Knowl Data Eng 17(5):628–637CrossRef
27.
Zurück zum Zitat Lin J, Vlachos M, Keogh EJ, Gunopulos D (2004) Iterative incremental clustering of time series. In: Proceedings of the ninth international conference on extending database technology (EDBT), pp 106V122 Lin J, Vlachos M, Keogh EJ, Gunopulos D (2004) Iterative incremental clustering of time series. In: Proceedings of the ninth international conference on extending database technology (EDBT), pp 106V122
28.
Zurück zum Zitat Lühr S, Lazarescu M (2009) Incremental clustering of dynamic data streams using connectivity based representative points. Data Knowl Eng (DKE) 68(1):1–27CrossRef Lühr S, Lazarescu M (2009) Incremental clustering of dynamic data streams using connectivity based representative points. Data Knowl Eng (DKE) 68(1):1–27CrossRef
29.
Zurück zum Zitat Mirkin B (2011) Clustering for data mining: a data recovery approach. Taylor and Francis, London Mirkin B (2011) Clustering for data mining: a data recovery approach. Taylor and Francis, London
30.
Zurück zum Zitat O’Callaghan L, Meyerson A, Motwani R, Mishra N, Guha S (2002) Streaming-data algorithms for high-quality clustering. In: Proceedings of the 18th IEEE international conference on data engineering (ICDE), pp 685–694 O’Callaghan L, Meyerson A, Motwani R, Mishra N, Guha S (2002) Streaming-data algorithms for high-quality clustering. In: Proceedings of the 18th IEEE international conference on data engineering (ICDE), pp 685–694
32.
33.
Zurück zum Zitat Rodrigues PP, Gama J, Lopes L (2008) Clustering distributed sensor data streams. In: Proceedings of the European conference on machine learning and knowledge discovery in databases—part II (ECML PKDD), pp 282–297 Rodrigues PP, Gama J, Lopes L (2008) Clustering distributed sensor data streams. In: Proceedings of the European conference on machine learning and knowledge discovery in databases—part II (ECML PKDD), pp 282–297
34.
Zurück zum Zitat Rodrigues PP, Gama J, Pedroso JP (2008) Hierarchical clustering of time series data streams. IEEE Trans Knowl Data Eng (TKDE) 20(5):615–627 Rodrigues PP, Gama J, Pedroso JP (2008) Hierarchical clustering of time series data streams. IEEE Trans Knowl Data Eng (TKDE) 20(5):615–627
35.
Zurück zum Zitat Shi YB, Yuan CA, Huang Y, Wen YG (2010) A method of spatial clustering based on the combination of the spatial coordinate and attributes. In: Proceedings of the sixth international conference on information systems security (ICISS), pp 526–529 Shi YB, Yuan CA, Huang Y, Wen YG (2010) A method of spatial clustering based on the combination of the spatial coordinate and attributes. In: Proceedings of the sixth international conference on information systems security (ICISS), pp 526–529
37.
Zurück zum Zitat Tai CH, Dai BR, Chen MS (2007) Incremental clustering in geography and optimization spaces. In: Proceedings of the 11th Pacific-Asia conference on knowledge discovery and data mining (PAKDD), pp 272–283 Tai CH, Dai BR, Chen MS (2007) Incremental clustering in geography and optimization spaces. In: Proceedings of the 11th Pacific-Asia conference on knowledge discovery and data mining (PAKDD), pp 272–283
39.
Zurück zum Zitat Tan PN, Steinbach M, Kumar V (2006) Introduction to data mining. Addison-Wesley, New York Tan PN, Steinbach M, Kumar V (2006) Introduction to data mining. Addison-Wesley, New York
40.
Zurück zum Zitat Wagstaff K, Cardie C (2000) Clustering with instance-level constraints. In: Proceedings of the 17th international conference on machine learning (ICML), pp 1103–1110 Wagstaff K, Cardie C (2000) Clustering with instance-level constraints. In: Proceedings of the 17th international conference on machine learning (ICML), pp 1103–1110
41.
Zurück zum Zitat Wan L, Ng WK, Dang XH, Yu PS, Zhang K (2009) Density-based clustering of data streams at multiple resolutions. ACM Trans Knowl Discov Data (TKDD) 3(3):1–28CrossRef Wan L, Ng WK, Dang XH, Yu PS, Zhang K (2009) Density-based clustering of data streams at multiple resolutions. ACM Trans Knowl Discov Data (TKDD) 3(3):1–28CrossRef
42.
Zurück zum Zitat Wang JW, Cheng CH (2007) An efficient method for estimating null values in relational databases. Knowl Inf Syst 12(3):379–394CrossRef Wang JW, Cheng CH (2007) An efficient method for estimating null values in relational databases. Knowl Inf Syst 12(3):379–394CrossRef
43.
Zurück zum Zitat Wei LY, Peng WC (2009) Clustering data streams in optimization and geography domains. In: Proceedings of the 13th Pacific-Asia conference on knowledge discovery and data mining (PAKDD), pp 997–1005 Wei LY, Peng WC (2009) Clustering data streams in optimization and geography domains. In: Proceedings of the 13th Pacific-Asia conference on knowledge discovery and data mining (PAKDD), pp 997–1005
44.
Zurück zum Zitat Yang J (2003) Dynamic clustering of evolving streams with a single pass. In: Proceedings of the 19th IEEE international conference on data engineering (ICDE), pp 695–697 Yang J (2003) Dynamic clustering of evolving streams with a single pass. In: Proceedings of the 19th IEEE international conference on data engineering (ICDE), pp 695–697
45.
Zurück zum Zitat Zhou J, Guan J, Li P (2007c) Dcad: a dual clustering algorithm for distributed spatial databases. Geo-Spatial Inf Sci 10(2):137–144CrossRef Zhou J, Guan J, Li P (2007c) Dcad: a dual clustering algorithm for distributed spatial databases. Geo-Spatial Inf Sci 10(2):137–144CrossRef
46.
Zurück zum Zitat Zhou A, Cao F, Yan Y, Sha C, He X (2007) Density-based clustering for real-time stream data. In: Proceedings of the 13th ACM SIGKDD international conference on knowledge discovery and data mining (KDD), pp 133–142 Zhou A, Cao F, Yan Y, Sha C, He X (2007) Density-based clustering for real-time stream data. In: Proceedings of the 13th ACM SIGKDD international conference on knowledge discovery and data mining (KDD), pp 133–142
47.
Zurück zum Zitat Zhou A, Cao F, Yan Y, Sha C, He X (2007) Distributed data stream clustering: a fast em-based approach. In: Proceedings of the 23rd international conference on data engineering (ICDE), pp 736–745 Zhou A, Cao F, Yan Y, Sha C, He X (2007) Distributed data stream clustering: a fast em-based approach. In: Proceedings of the 23rd international conference on data engineering (ICDE), pp 736–745
Metadaten
Titel
An incremental algorithm for clustering spatial data streams: exploring temporal locality
verfasst von
Ling-Yin Wei
Wen-Chih Peng
Publikationsdatum
01.11.2013
Verlag
Springer London
Erschienen in
Knowledge and Information Systems / Ausgabe 2/2013
Print ISSN: 0219-1377
Elektronische ISSN: 0219-3116
DOI
https://doi.org/10.1007/s10115-013-0636-8

Weitere Artikel der Ausgabe 2/2013

Knowledge and Information Systems 2/2013 Zur Ausgabe

Premium Partner