Skip to main content
Erschienen in: Business & Information Systems Engineering 3/2019

21.01.2019 | State of the Art

Optimizing Data Stream Representation: An Extensive Survey on Stream Clustering Algorithms

verfasst von: Matthias Carnein, Heike Trautmann

Erschienen in: Business & Information Systems Engineering | Ausgabe 3/2019

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Analyzing data streams has received considerable attention over the past decades due to the widespread usage of sensors, social media and other streaming data sources. A core research area in this field is stream clustering which aims to recognize patterns in an unordered, infinite and evolving stream of observations. Clustering can be a crucial support in decision making, since it aims for an optimized aggregated representation of a continuous data stream over time and allows to identify patterns in large and high-dimensional data. A multitude of algorithms and approaches has been developed that are able to find and maintain clusters over time in the challenging streaming scenario. This survey explores, summarizes and categorizes a total of 51 stream clustering algorithms and identifies core research threads over the past decades. In particular, it identifies categories of algorithms based on distance thresholds, density grids and statistical models as well as algorithms for high dimensional data. Furthermore, it discusses applications scenarios, available software and how to configure stream clustering algorithms. This survey is considerably more extensive than comparable studies, more up-to-date and highlights how concepts are interrelated and have been developed over time.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Weitere Produktempfehlungen anzeigen
Literatur
Zurück zum Zitat Ackermann MR, Märtens M, Raupach C, Swierkot K, Lammersen C, Sohler C (2012) StreamKM++: a clustering algorithm for data streams. J Exp Algorithmics 17:2.4:2.1–2.4:2.30CrossRef Ackermann MR, Märtens M, Raupach C, Swierkot K, Lammersen C, Sohler C (2012) StreamKM++: a clustering algorithm for data streams. J Exp Algorithmics 17:2.4:2.1–2.4:2.30CrossRef
Zurück zum Zitat Aggarwal CC (2007) Data streams: models and algorithms, vol 31. Springer, BerlinCrossRef Aggarwal CC (2007) Data streams: models and algorithms, vol 31. Springer, BerlinCrossRef
Zurück zum Zitat Aggarwal CC, Han J, Wang J, Yu PS (2003) A framework for clustering evolving data streams. In: Proceedings of the 29th international conference on very large data bases, volume 29 of VLDB ’03, VLDB Endowment, Berlin, pp 81–92 Aggarwal CC, Han J, Wang J, Yu PS (2003) A framework for clustering evolving data streams. In: Proceedings of the 29th international conference on very large data bases, volume 29 of VLDB ’03, VLDB Endowment, Berlin, pp 81–92
Zurück zum Zitat Aggarwal CC, Han J, Wang J, Yu PS (2004) A framework for projected clustering of high dimensional data streams. In: Proceedings of the thirtieth international conference on very large data bases, volume 30 of VLDB ’04, VLDB Endowment, Toronto, pp 852–863 Aggarwal CC, Han J, Wang J, Yu PS (2004) A framework for projected clustering of high dimensional data streams. In: Proceedings of the thirtieth international conference on very large data bases, volume 30 of VLDB ’04, VLDB Endowment, Toronto, pp 852–863
Zurück zum Zitat Ali MH, Sundus A, Qaiser W, Ahmed Z, Halim Z (2011) Applicative implementation of D-stream clustering algorithm for the real-time data of telecom sector. In: International conference on computer networks and information technology, pp 293–297 Ali MH, Sundus A, Qaiser W, Ahmed Z, Halim Z (2011) Applicative implementation of D-stream clustering algorithm for the real-time data of telecom sector. In: International conference on computer networks and information technology, pp 293–297
Zurück zum Zitat Amini A, Wah TY (2011) Density micro-clustering algorithms on data streams: a review. In: Proceeding of the international multiconference of engineers and computer scientists (IMECS) Amini A, Wah TY (2011) Density micro-clustering algorithms on data streams: a review. In: Proceeding of the international multiconference of engineers and computer scientists (IMECS)
Zurück zum Zitat Amini A, Wah TY (2012) A comparative study of density-based clustering algorithms on data streams: micro-clustering approaches. Springer, US, Boston, pp 275–287 Amini A, Wah TY (2012) A comparative study of density-based clustering algorithms on data streams: micro-clustering approaches. Springer, US, Boston, pp 275–287
Zurück zum Zitat Amini A, Wah TY (2013) LeaDen-Stream: a leader density-based clustering algorithm over evolving data stream. J Comput Commun 01(05):26–31CrossRef Amini A, Wah TY (2013) LeaDen-Stream: a leader density-based clustering algorithm over evolving data stream. J Comput Commun 01(05):26–31CrossRef
Zurück zum Zitat Amini A, Wah TY, Saybani MR, Yazdi SRAS (2011) A study of density-grid based clustering algorithms on data streams. In: Eighth international conference on fuzzy systems and knowledge discovery (FSKD) 3:1652–1656 Amini A, Wah TY, Saybani MR, Yazdi SRAS (2011) A study of density-grid based clustering algorithms on data streams. In: Eighth international conference on fuzzy systems and knowledge discovery (FSKD) 3:1652–1656
Zurück zum Zitat Amini A, Wah TY, Teh YW (2012) DENGRIS-Stream: a density-grid based clustering algorithm for evolving data streams over sliding window. In: Proceedings of the international conference on data mining and computer engineering, pp 206–210 Amini A, Wah TY, Teh YW (2012) DENGRIS-Stream: a density-grid based clustering algorithm for evolving data streams over sliding window. In: Proceedings of the international conference on data mining and computer engineering, pp 206–210
Zurück zum Zitat Amini A, Saboohi H, Wah TY, Herawan T (2014a) A fast density-based clustering algorithm for real-time internet of things stream. Sci World J 2014:1–11CrossRef Amini A, Saboohi H, Wah TY, Herawan T (2014a) A fast density-based clustering algorithm for real-time internet of things stream. Sci World J 2014:1–11CrossRef
Zurück zum Zitat Amini A, Wah TY, Saboohi H (2014b) On density-based data streams clustering algorithms: a survey. J Comput Sci Technol 29(1):116–141CrossRef Amini A, Wah TY, Saboohi H (2014b) On density-based data streams clustering algorithms: a survey. J Comput Sci Technol 29(1):116–141CrossRef
Zurück zum Zitat Amini A, Saboohi H, Herawan T, Wah TY (2016) MuDi-Stream: a multi density clustering algorithm for evolving data stream. J Netw Comput Appl 59:370–385CrossRef Amini A, Saboohi H, Herawan T, Wah TY (2016) MuDi-Stream: a multi density clustering algorithm for evolving data stream. J Netw Comput Appl 59:370–385CrossRef
Zurück zum Zitat Arthur D, Vassilvitskii S (2007) K-means++: the advantages of careful seeding. In: Proceedings of the eighteenth annual ACM-SIAM symposium on discrete algorithms, SODA ’07, Society for Industrial and Applied Mathematics, New Orleans, pp 1027–1035 Arthur D, Vassilvitskii S (2007) K-means++: the advantages of careful seeding. In: Proceedings of the eighteenth annual ACM-SIAM symposium on discrete algorithms, SODA ’07, Society for Industrial and Applied Mathematics, New Orleans, pp 1027–1035
Zurück zum Zitat Barbará D, Chen P (2000) Using the fractal dimension to cluster datasets. In: Proceedings of the sixth ACM SIGKDD international conference on knowledge discovery and data mining, KDD ’00, ACM, Boston, pp 260–264 Barbará D, Chen P (2000) Using the fractal dimension to cluster datasets. In: Proceedings of the sixth ACM SIGKDD international conference on knowledge discovery and data mining, KDD ’00, ACM, Boston, pp 260–264
Zurück zum Zitat Barbará D, Chen P (2003) Using self-similarity to cluster large data sets. Data Min Knowl Discov 7(2):123–152CrossRef Barbará D, Chen P (2003) Using self-similarity to cluster large data sets. Data Min Knowl Discov 7(2):123–152CrossRef
Zurück zum Zitat Ben-Hur A, Horn D, Siegelmann HT, Vapnik V (2001) Support vector clustering. J Mach Learn Res 2:125–137 Ben-Hur A, Horn D, Siegelmann HT, Vapnik V (2001) Support vector clustering. J Mach Learn Res 2:125–137
Zurück zum Zitat Beyer K, Goldstein J, Ramakrishnan R, Shaft U (1999) When is “nearest neighbor” meaningful? Springer, Berlin, pp 217–235 Beyer K, Goldstein J, Ramakrishnan R, Shaft U (1999) When is “nearest neighbor” meaningful? Springer, Berlin, pp 217–235
Zurück zum Zitat Bhatnagar V, Kaur S (2007) Exclusive and complete clustering of streams. Springer, Berlin, pp 629–638 Bhatnagar V, Kaur S (2007) Exclusive and complete clustering of streams. Springer, Berlin, pp 629–638
Zurück zum Zitat Bhatnagar V, Kaur S, Chakravarthy S (2014) Clustering data streams using grid-based synopsis. Knowl Inf Syst 41(1):127–152CrossRef Bhatnagar V, Kaur S, Chakravarthy S (2014) Clustering data streams using grid-based synopsis. Knowl Inf Syst 41(1):127–152CrossRef
Zurück zum Zitat Bifet A, Holmes G, Kirkby R, Pfahringer B (2010) MOA: massive online analysis. J Mach Learn Res 11:1601–1604 Bifet A, Holmes G, Kirkby R, Pfahringer B (2010) MOA: massive online analysis. J Mach Learn Res 11:1601–1604
Zurück zum Zitat Bifet A, Gavaldà R, Holmes G, Pfahringer B (2018) Machine learning for data streams with practical examples in MOA. MIT Press, CambridgeCrossRef Bifet A, Gavaldà R, Holmes G, Pfahringer B (2018) Machine learning for data streams with practical examples in MOA. MIT Press, CambridgeCrossRef
Zurück zum Zitat Bohm C, Kailing K, Kriegel H-P, Kroger P (2004) Density connected clustering with local subspace preferences. In: Proceedings of the fourth IEEE international conference on data mining, ICDM ’04, IEEE Computer Society, Washington, DC, pp 27–34 Bohm C, Kailing K, Kriegel H-P, Kroger P (2004) Density connected clustering with local subspace preferences. In: Proceedings of the fourth IEEE international conference on data mining, ICDM ’04, IEEE Computer Society, Washington, DC, pp 27–34
Zurück zum Zitat Bolaños M, Forrest J, Hahsler M (2014) Clustering large datasets using data stream clustering techniques. In: Spiliopoulou M, Schmidt-Thieme L, Janning R (eds) Data analysis, machine learning and knowledge discovery, studies in classification, data analysis, and knowledge organization. Springer, Berlin, pp 135–143 Bolaños M, Forrest J, Hahsler M (2014) Clustering large datasets using data stream clustering techniques. In: Spiliopoulou M, Schmidt-Thieme L, Janning R (eds) Data analysis, machine learning and knowledge discovery, studies in classification, data analysis, and knowledge organization. Springer, Berlin, pp 135–143
Zurück zum Zitat Bradley PS, Fayyad U, Reina C (1998) Scaling clustering algorithms to large databases. In: Proceedings of the 4th international conference on knowledge discovery and data mining (KDD’98). AAAI Press, pp 9–15 Bradley PS, Fayyad U, Reina C (1998) Scaling clustering algorithms to large databases. In: Proceedings of the 4th international conference on knowledge discovery and data mining (KDD’98). AAAI Press, pp 9–15
Zurück zum Zitat Cao F, Ester M, Qian W, Zhou A (2006) Density-based clustering over an evolving data stream with noise. In: Conference on data mining (SIAM ’06), pp 328–339 Cao F, Ester M, Qian W, Zhou A (2006) Density-based clustering over an evolving data stream with noise. In: Conference on data mining (SIAM ’06), pp 328–339
Zurück zum Zitat Carnein M, Assenmacher D, Trautmann H (2017a) An empirical comparison of stream clustering algorithms. In: Proceedings of the ACM international conference on computing frontiers (CF ’17). ACM, pp 361–365 Carnein M, Assenmacher D, Trautmann H (2017a) An empirical comparison of stream clustering algorithms. In: Proceedings of the ACM international conference on computing frontiers (CF ’17). ACM, pp 361–365
Zurück zum Zitat Carnein M, Assenmacher D, Trautmann H (2017b) Stream clustering of chat messages with applications to twitch streams. In Proceedings of the 36th international conference on conceptual modeling (ER’17). Springer International Publishing, pp 79–88 Carnein M, Assenmacher D, Trautmann H (2017b) Stream clustering of chat messages with applications to twitch streams. In Proceedings of the 36th international conference on conceptual modeling (ER’17). Springer International Publishing, pp 79–88
Zurück zum Zitat Chen Y, Tu L (2007) Density-based clustering for real-time stream data. In: Proceedings of the 13th ACM SIGKDD international conference on knowledge discovery and data mining, KDD ’07, ACM, San Jose, pp 133–142 Chen Y, Tu L (2007) Density-based clustering for real-time stream data. In: Proceedings of the 13th ACM SIGKDD international conference on knowledge discovery and data mining, KDD ’07, ACM, San Jose, pp 133–142
Zurück zum Zitat Dang XH, Lee V, Ng WK, Ciptadi A, Ong KL (2009a) An EM-based algorithm for clustering data streams in sliding windows. In: Zhou X, Yokota H, Deng K, Liu Q (eds) Proceedings of the 14th international conference on database systems for advanced applications (DASFAA 2009). Springer, Berlin, pp 230–235 Dang XH, Lee V, Ng WK, Ciptadi A, Ong KL (2009a) An EM-based algorithm for clustering data streams in sliding windows. In: Zhou X, Yokota H, Deng K, Liu Q (eds) Proceedings of the 14th international conference on database systems for advanced applications (DASFAA 2009). Springer, Berlin, pp 230–235
Zurück zum Zitat Dang XH, Lee VCS, Ng WK, Ong KL (2009b) Incremental and adaptive clustering stream data over sliding window. Springer, Berlin, pp 660–674 Dang XH, Lee VCS, Ng WK, Ong KL (2009b) Incremental and adaptive clustering stream data over sliding window. Springer, Berlin, pp 660–674
Zurück zum Zitat Ester M, Kriegel H-P, Sander J, Xu X (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of the 2nd international conference on knowledge discovery and data mining. AAAI Press, pp 226–231 Ester M, Kriegel H-P, Sander J, Xu X (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of the 2nd international conference on knowledge discovery and data mining. AAAI Press, pp 226–231
Zurück zum Zitat Farnstrom F, Lewis J, Elkan C (2000) Scalability for clustering algorithms revisited. SIGKDD Explor Newsl 2(1):51–57CrossRef Farnstrom F, Lewis J, Elkan C (2000) Scalability for clustering algorithms revisited. SIGKDD Explor Newsl 2(1):51–57CrossRef
Zurück zum Zitat Fisher DH (1987) Knowledge acquisition via incremental conceptual clustering. Mach Learn 2(2):139–172 Fisher DH (1987) Knowledge acquisition via incremental conceptual clustering. Mach Learn 2(2):139–172
Zurück zum Zitat Forestiero A, Pizzuti C, Spezzano G (2013) A single pass algorithm for clustering evolving data streams based on swarm intelligence. Data Min Knowl Discov 26(1):1–26CrossRef Forestiero A, Pizzuti C, Spezzano G (2013) A single pass algorithm for clustering evolving data streams based on swarm intelligence. Data Min Knowl Discov 26(1):1–26CrossRef
Zurück zum Zitat Gao J, Li J, Zhang Z, Tan P-N (2005) An incremental data stream clustering algorithm based on dense units detection. Springer, Berlin, pp 420–425 Gao J, Li J, Zhang Z, Tan P-N (2005) An incremental data stream clustering algorithm based on dense units detection. Springer, Berlin, pp 420–425
Zurück zum Zitat Gao X, Ferrara E, Qiu J (2015) Parallel clustering of high-dimensional social media data streams. arXiv:1502.00316 Gao X, Ferrara E, Qiu J (2015) Parallel clustering of high-dimensional social media data streams. arXiv:1502.00316
Zurück zum Zitat Ghesmoune M, Azzag H, Lebbah M (2014) G-Stream: growing neural gas over data stream. In: Loo CK, Siah YK, Wong KW, Jin AT, Huang K (eds) Proceedings of neural information processing: 21st international conference, ICONIP 2014, Kuching, Malaysia, November 3–6, 2014, Part I. Springer International Publishing, pp 207–214 Ghesmoune M, Azzag H, Lebbah M (2014) G-Stream: growing neural gas over data stream. In: Loo CK, Siah YK, Wong KW, Jin AT, Huang K (eds) Proceedings of neural information processing: 21st international conference, ICONIP 2014, Kuching, Malaysia, November 3–6, 2014, Part I. Springer International Publishing, pp 207–214
Zurück zum Zitat Ghesmoune M, Lebbah M, Azzag H (2015) Clustering over data streams based on growing neural gas. Springer, Berlin, pp 134–145 Ghesmoune M, Lebbah M, Azzag H (2015) Clustering over data streams based on growing neural gas. Springer, Berlin, pp 134–145
Zurück zum Zitat Ghesmoune M, Lebbah M, Azzag H (2016) State-of-the-art on clustering data streams. Big Data Anal 1(1):13CrossRef Ghesmoune M, Lebbah M, Azzag H (2016) State-of-the-art on clustering data streams. Big Data Anal 1(1):13CrossRef
Zurück zum Zitat Guha S, Meyerson A, Mishra N, Motwani R, O’Callaghan L (2003) Clustering data streams: theory and practice. IEEE Trans Knowl Data Eng 15(3):515–528CrossRef Guha S, Meyerson A, Mishra N, Motwani R, O’Callaghan L (2003) Clustering data streams: theory and practice. IEEE Trans Knowl Data Eng 15(3):515–528CrossRef
Zurück zum Zitat Hahsler M, Bolaños M (2016) Clustering data streams based on shared density between micro-clusters. IEEE Trans Knowl Data Eng 28(6):1449–1461CrossRef Hahsler M, Bolaños M (2016) Clustering data streams based on shared density between micro-clusters. IEEE Trans Knowl Data Eng 28(6):1449–1461CrossRef
Zurück zum Zitat Hassani M, Kranen P, Seidl T (2011) Precise anytime clustering of noisy sensor data with logarithmic complexity. In: Proceedings of 5th international workshop on knowledge discovery from sensor data (SensorKDD 2011) in conjunction with 17th ACM SIGKDD conference on knowledge discovery and data mining (KDD 2011), ACM, San Diego, pp 52–60 Hassani M, Kranen P, Seidl T (2011) Precise anytime clustering of noisy sensor data with logarithmic complexity. In: Proceedings of 5th international workshop on knowledge discovery from sensor data (SensorKDD 2011) in conjunction with 17th ACM SIGKDD conference on knowledge discovery and data mining (KDD 2011), ACM, San Diego, pp 52–60
Zurück zum Zitat Hassani M, Spaus P, Gaber MM, Seidl T (2012) Density-based projected clustering of data streams. Springer, Berlin, pp 311–324 Hassani M, Spaus P, Gaber MM, Seidl T (2012) Density-based projected clustering of data streams. Springer, Berlin, pp 311–324
Zurück zum Zitat Hassani M, Kim Y, Seidl T (2013) Subspace MOA: subspace stream clustering evaluation using the MOA framework. Springer, Berlin, pp 446–449 Hassani M, Kim Y, Seidl T (2013) Subspace MOA: subspace stream clustering evaluation using the MOA framework. Springer, Berlin, pp 446–449
Zurück zum Zitat Hutter F, Hoos HH, Stützle T (2007) Automatic algorithm configuration based on local search. In: Proceedings of the twenty-second conference on artifical intelligence (AAAI ’07), pp 1152–1157 Hutter F, Hoos HH, Stützle T (2007) Automatic algorithm configuration based on local search. In: Proceedings of the twenty-second conference on artifical intelligence (AAAI ’07), pp 1152–1157
Zurück zum Zitat Hutter F, Hoos HH, Leyton-Brown K, Stützle T (2009) ParamILS: an automatic algorithm configuration framework. J Artif Intell Res 36:267–306CrossRef Hutter F, Hoos HH, Leyton-Brown K, Stützle T (2009) ParamILS: an automatic algorithm configuration framework. J Artif Intell Res 36:267–306CrossRef
Zurück zum Zitat Hutter F, Hoos HH, Leyton-Brown K (2011) Sequential model-based optimization for general algorithm configuration. In: Proceedings of LION-5, pp 507–523 Hutter F, Hoos HH, Leyton-Brown K (2011) Sequential model-based optimization for general algorithm configuration. In: Proceedings of LION-5, pp 507–523
Zurück zum Zitat Isaksson C, Dunham MH, Hahsler M (2012) SOStream: self organizing density-based clustering over data stream. Springer, Berlin, pp 264–278 Isaksson C, Dunham MH, Hahsler M (2012) SOStream: self organizing density-based clustering over data stream. Springer, Berlin, pp 264–278
Zurück zum Zitat Jia C, Tan C, Yong A (2008) A grid and density-based clustering algorithm for processing data stream. In: Second international conference on genetic and evolutionary computing (WGEC ’08), pp 517–521 Jia C, Tan C, Yong A (2008) A grid and density-based clustering algorithm for processing data stream. In: Second international conference on genetic and evolutionary computing (WGEC ’08), pp 517–521
Zurück zum Zitat Kohonen T (1982) Self-organized formation of topologically correct feature maps. Biol Cybern 43(1):59–69CrossRef Kohonen T (1982) Self-organized formation of topologically correct feature maps. Biol Cybern 43(1):59–69CrossRef
Zurück zum Zitat Kontaki M, Papadopoulos AN, Manolopoulos Y (2008) Continuous trend-based clustering in data streams. Springer, Berlin, pp 251–262 Kontaki M, Papadopoulos AN, Manolopoulos Y (2008) Continuous trend-based clustering in data streams. Springer, Berlin, pp 251–262
Zurück zum Zitat Kranen P, Assent I, Baldauf C, Seidl T (2009) Self-adaptive anytime stream clustering. In: 9th IEEE international conference on data mining (ICDM ’09), pp 249–258 Kranen P, Assent I, Baldauf C, Seidl T (2009) Self-adaptive anytime stream clustering. In: 9th IEEE international conference on data mining (ICDM ’09), pp 249–258
Zurück zum Zitat Kranen P, Assent I, Baldauf C, Seidl T (2011a) The ClusTree: indexing micro-clusters for anytime stream mining. In: Knowledge and information systems journal (Springer KAIS), Vol 29, Issue 2. Springer, London, pp 249–272 Kranen P, Assent I, Baldauf C, Seidl T (2011a) The ClusTree: indexing micro-clusters for anytime stream mining. In: Knowledge and information systems journal (Springer KAIS), Vol 29, Issue 2. Springer, London, pp 249–272
Zurück zum Zitat Kranen P, Reidl F, Villaamil FS, Seidl T (2011b) Hierarchical clustering for real-time stream data with noise. Springer, Berlin, pp 405–413 Kranen P, Reidl F, Villaamil FS, Seidl T (2011b) Hierarchical clustering for real-time stream data with noise. Springer, Berlin, pp 405–413
Zurück zum Zitat Lin J, Lin H (2009) A density-based clustering over evolving heterogeneous data stream. In: 2009 ISECS international colloquium on computing, communication, control, and management, vol 4, pp 275–277 Lin J, Lin H (2009) A density-based clustering over evolving heterogeneous data stream. In: 2009 ISECS international colloquium on computing, communication, control, and management, vol 4, pp 275–277
Zurück zum Zitat Liu LX, Huang H, Guo YF, Chen FC (2009) rDenStream, a clustering algorithm over an evolving data stream. In: 2009 international conference on information engineering and computer science, pp 1–4 Liu LX, Huang H, Guo YF, Chen FC (2009) rDenStream, a clustering algorithm over an evolving data stream. In: 2009 international conference on information engineering and computer science, pp 1–4
Zurück zum Zitat López-Ibáñez M, Dubois-Lacoste J, Cáceres LP, Stützle T, Birattari M (2016) The irace package: iterated racing for automatic algorithm configuration. Oper Res Perspect 3:43–58CrossRef López-Ibáñez M, Dubois-Lacoste J, Cáceres LP, Stützle T, Birattari M (2016) The irace package: iterated racing for automatic algorithm configuration. Oper Res Perspect 3:43–58CrossRef
Zurück zum Zitat Lorbeer B, Kosareva A, Deva B, Softić D, Ruppel P, Küpper A (2017) A-BIRCH: automatic threshold estimation for the BIRCH clustering algorithm. Springer, Berlin, pp 169–178 Lorbeer B, Kosareva A, Deva B, Softić D, Ruppel P, Küpper A (2017) A-BIRCH: automatic threshold estimation for the BIRCH clustering algorithm. Springer, Berlin, pp 169–178
Zurück zum Zitat Lühr S, Lazarescu M (2008) Connectivity based stream clustering using localised density exemplars. Springer, Berlin, pp 662–672 Lühr S, Lazarescu M (2008) Connectivity based stream clustering using localised density exemplars. Springer, Berlin, pp 662–672
Zurück zum Zitat Lühr S, Lazarescu M (2009) Incremental clustering of dynamic data streams using connectivity based representative points. Data Knowl Eng 68(1):1–27CrossRef Lühr S, Lazarescu M (2009) Incremental clustering of dynamic data streams using connectivity based representative points. Data Knowl Eng 68(1):1–27CrossRef
Zurück zum Zitat Ma WH (2014) Survey on data streams clustering techniques. In: Manufacture engineering, quality and production system III, volume 933 of Advanced Materials Research. Trans Tech Publications, pp 768–773 Ma WH (2014) Survey on data streams clustering techniques. In: Manufacture engineering, quality and production system III, volume 933 of Advanced Materials Research. Trans Tech Publications, pp 768–773
Zurück zum Zitat Martinetz T, Schulten K et al (1991) A “neural-gas” network learns topologies. University of Illinois at Urbana-Champaign Martinetz T, Schulten K et al (1991) A “neural-gas” network learns topologies. University of Illinois at Urbana-Champaign
Zurück zum Zitat Meesuksabai W, Kangkachit T, Waiyamai K (2011) Hue-Stream: evolution-based clustering technique for heterogeneous data streams with uncertainty. In: Tang J, King I, Chen L, Wang J (eds) ADMA, volume 7121 of Lecture Notes in Computer Science. Springer, pp 27–40 Meesuksabai W, Kangkachit T, Waiyamai K (2011) Hue-Stream: evolution-based clustering technique for heterogeneous data streams with uncertainty. In: Tang J, King I, Chen L, Wang J (eds) ADMA, volume 7121 of Lecture Notes in Computer Science. Springer, pp 27–40
Zurück zum Zitat Motoyoshi M, Miura T, Shioya I (2004) Clustering stream data by regression analysis. In: Proceedings of the second workshop on Australasian information security, data mining and web intelligence, and software internationalisation, volume 32 of ACSW Frontiers ’04, Australian Computer Society, Darlinghurst, pp 115–120 Motoyoshi M, Miura T, Shioya I (2004) Clustering stream data by regression analysis. In: Proceedings of the second workshop on Australasian information security, data mining and web intelligence, and software internationalisation, volume 32 of ACSW Frontiers ’04, Australian Computer Society, Darlinghurst, pp 115–120
Zurück zum Zitat Mousavi M, Bakar AA, Vakilian M (2015) Data stream clustering algorithms: a review. Int J Adv Soft Comput Appl 7:1–15 Mousavi M, Bakar AA, Vakilian M (2015) Data stream clustering algorithms: a review. Int J Adv Soft Comput Appl 7:1–15
Zurück zum Zitat Nguyen H-L, Woon Y-K, Ng W-K (2015) A survey on data stream clustering and classification. Knowl Inf Syst 45(3):535–569CrossRef Nguyen H-L, Woon Y-K, Ng W-K (2015) A survey on data stream clustering and classification. Knowl Inf Syst 45(3):535–569CrossRef
Zurück zum Zitat Ntoutsi I, Zimek A, Palpanas T, Kröger P, Kriegel H-P (2012) Density-based projected clustering over high dimensional data streams. In: Proceedings of the 2012 SIAM international conference on data mining, pp 987–998 Ntoutsi I, Zimek A, Palpanas T, Kröger P, Kriegel H-P (2012) Density-based projected clustering over high dimensional data streams. In: Proceedings of the 2012 SIAM international conference on data mining, pp 987–998
Zurück zum Zitat O’Callaghan L, Mishra N, Meyerson A, Guha S, Motwani R (2002) Streaming-data algorithms for high-quality clustering. In: Proceedings of the 18th international conference on data engineering (ICDE), pp 685–694 O’Callaghan L, Mishra N, Meyerson A, Guha S, Motwani R (2002) Streaming-data algorithms for high-quality clustering. In: Proceedings of the 18th international conference on data engineering (ICDE), pp 685–694
Zurück zum Zitat Park NH, Lee WS (2004) Statistical Grid-based clustering over data streams. SIGMOD Rec 33(1):32–37CrossRef Park NH, Lee WS (2004) Statistical Grid-based clustering over data streams. SIGMOD Rec 33(1):32–37CrossRef
Zurück zum Zitat Park NH, Lee WS (2007a) Cell trees: an adaptive synopsis structure for clustering multi-dimensional on-line data streams. Data Knowl Eng 63(2):528–549CrossRef Park NH, Lee WS (2007a) Cell trees: an adaptive synopsis structure for clustering multi-dimensional on-line data streams. Data Knowl Eng 63(2):528–549CrossRef
Zurück zum Zitat Park NH, Lee WS (2007b) Grid-based subspace clustering over data streams. In: Proceedings of the sixteenth ACM conference on conference on information and knowledge management, ACM, New York, pp 801–810CrossRef Park NH, Lee WS (2007b) Grid-based subspace clustering over data streams. In: Proceedings of the sixteenth ACM conference on conference on information and knowledge management, ACM, New York, pp 801–810CrossRef
Zurück zum Zitat Ren J, Ma R (2009) Density-based data streams clustering over sliding windows. In: 2009 Sixth international conference on fuzzy systems and knowledge discovery, volume 5, pp 248–252 Ren J, Ma R (2009) Density-based data streams clustering over sliding windows. In: 2009 Sixth international conference on fuzzy systems and knowledge discovery, volume 5, pp 248–252
Zurück zum Zitat Ren J, Cai B, Hu C (2011) Clustering over data streams based on grid density and index tree. J Converg Inf Technol 6(1):83–93 Ren J, Cai B, Hu C (2011) Clustering over data streams based on grid density and index tree. J Converg Inf Technol 6(1):83–93
Zurück zum Zitat Ruiz C, Spiliopoulou M, Menasalvas E (2007) C-DBSCAN: density-based clustering with constraints. Springer, Berlin, pp 216–223 Ruiz C, Spiliopoulou M, Menasalvas E (2007) C-DBSCAN: density-based clustering with constraints. Springer, Berlin, pp 216–223
Zurück zum Zitat Ruiz C, Menasalvas E, Spiliopoulou M (2009) C-DenStream: using domain knowledge on a data stream. Springer, Berlin, pp 287–301 Ruiz C, Menasalvas E, Spiliopoulou M (2009) C-DenStream: using domain knowledge on a data stream. Springer, Berlin, pp 287–301
Zurück zum Zitat Silva JA, Faria ER, Barros RC, Hruschka ER, de Carvalho AC, Gama J (2013) Data stream clustering: a survey. ACM Comput Surv 46(1):13:1–13:31CrossRef Silva JA, Faria ER, Barros RC, Hruschka ER, de Carvalho AC, Gama J (2013) Data stream clustering: a survey. ACM Comput Surv 46(1):13:1–13:31CrossRef
Zurück zum Zitat Spinosa EJ, de Leon F de Carvalho AP, Gama J (2007) Olindda: a cluster-based approach for detecting novelty and concept drift in data streams. In: Proceedings of the 2007 ACM symposium on applied computing. ACM, pp 448–452 Spinosa EJ, de Leon F de Carvalho AP, Gama J (2007) Olindda: a cluster-based approach for detecting novelty and concept drift in data streams. In: Proceedings of the 2007 ACM symposium on applied computing. ACM, pp 448–452
Zurück zum Zitat Steil J, Huang MX, Bulling A (2018) Fixation detection for head-mounted eye tracking based on visual similarity of gaze targets. In: Proceedings of international symposium on eye tracking research and applications (ETRA), pp 23:1–23:9 Steil J, Huang MX, Bulling A (2018) Fixation detection for head-mounted eye tracking based on visual similarity of gaze targets. In: Proceedings of international symposium on eye tracking research and applications (ETRA), pp 23:1–23:9
Zurück zum Zitat Tasoulis DK, Adams NM, Hand DJ (2006) Unsupervised clustering in streaming data. In: Sixth IEEE international conference on data mining–workshops (ICDMW’06), pp 638–642 Tasoulis DK, Adams NM, Hand DJ (2006) Unsupervised clustering in streaming data. In: Sixth IEEE international conference on data mining–workshops (ICDMW’06), pp 638–642
Zurück zum Zitat Tasoulis D, Adams N, Weston DJ, Hand DJ (2008) Mining information from plastic card transaction streams. In: Proceedings in computational statistics: 18th symposium (COMPSTAT 2008), volume 2, pp 315–322 Tasoulis D, Adams N, Weston DJ, Hand DJ (2008) Mining information from plastic card transaction streams. In: Proceedings in computational statistics: 18th symposium (COMPSTAT 2008), volume 2, pp 315–322
Zurück zum Zitat Theiler J (1990) Estimating fractal dimension. J Opt Soc Am A 7(6):1055–1073CrossRef Theiler J (1990) Estimating fractal dimension. J Opt Soc Am A 7(6):1055–1073CrossRef
Zurück zum Zitat Tibshirani R, Walther G, Hastie T (2001) Estimating the number of clusters in a data set via the gap statistic. J R Stat Soc Ser B (Stat Methodol) 63(2):411–423CrossRef Tibshirani R, Walther G, Hastie T (2001) Estimating the number of clusters in a data set via the gap statistic. J R Stat Soc Ser B (Stat Methodol) 63(2):411–423CrossRef
Zurück zum Zitat Tu L, Chen Y (2009) Stream data clustering based on grid density and attraction. ACM Trans Knowl Discov Data 3(3):12:1–12:27CrossRef Tu L, Chen Y (2009) Stream data clustering based on grid density and attraction. ACM Trans Knowl Discov Data 3(3):12:1–12:27CrossRef
Zurück zum Zitat Udommanetanakit K, Rakthanmanon T, Waiyamai K (2007) E-Stream: evolution-based technique for stream clustering. Springer, Berlin, pp 605–615 Udommanetanakit K, Rakthanmanon T, Waiyamai K (2007) E-Stream: evolution-based technique for stream clustering. Springer, Berlin, pp 605–615
Zurück zum Zitat van Rijn JN, Holmes G, Pfahringer B, Vanschoren J (2014) Algorithm selection on data streams. In: Džeroski S, Panov P, Kocev D, Todorovski L (eds) Proceedings of the 17th international conference on discovery science (DS), volume 8777 of lecture notes in computer science (LNCS). Springer, pp 325–336 van Rijn JN, Holmes G, Pfahringer B, Vanschoren J (2014) Algorithm selection on data streams. In: Džeroski S, Panov P, Kocev D, Todorovski L (eds) Proceedings of the 17th international conference on discovery science (DS), volume 8777 of lecture notes in computer science (LNCS). Springer, pp 325–336
Zurück zum Zitat van Rijn J, Nicolaas GH, Pfahringer B, Vanschoren J (2018) The online performance estimation framework: heterogeneous ensemble learning for data streams. Mach Learn 107(1):149–176CrossRef van Rijn J, Nicolaas GH, Pfahringer B, Vanschoren J (2018) The online performance estimation framework: heterogeneous ensemble learning for data streams. Mach Learn 107(1):149–176CrossRef
Zurück zum Zitat Wan L, Ng WK, Dang XH, Yu PS, Zhang K (2009) Density-based clustering of data streams at multiple resolutions. ACM Trans Knowl Discov Data 3(3):14:1–14:28CrossRef Wan L, Ng WK, Dang XH, Yu PS, Zhang K (2009) Density-based clustering of data streams at multiple resolutions. ACM Trans Knowl Discov Data 3(3):14:1–14:28CrossRef
Zurück zum Zitat Wang CD, Lai JH, Huang D, Zheng WS (2013) SVStream: a support vector-based algorithm for clustering data streams. IEEE Trans Knowl Data Eng 25(6):1410–1424CrossRef Wang CD, Lai JH, Huang D, Zheng WS (2013) SVStream: a support vector-based algorithm for clustering data streams. IEEE Trans Knowl Data Eng 25(6):1410–1424CrossRef
Zurück zum Zitat Wang G, Zhang X, Tang S, Zheng H, Zhao BY (2016) Unsupervised clickstream clustering for user behavior analysis. In: Proceedings of the 2016 CHI conference on human factors in computing systems, ACM, New York, pp 225–236CrossRef Wang G, Zhang X, Tang S, Zheng H, Zhao BY (2016) Unsupervised clickstream clustering for user behavior analysis. In: Proceedings of the 2016 CHI conference on human factors in computing systems, ACM, New York, pp 225–236CrossRef
Zurück zum Zitat Wedel M, Kamakura WA (2000) Market segmentation, 2nd edn. Springer, USCrossRef Wedel M, Kamakura WA (2000) Market segmentation, 2nd edn. Springer, USCrossRef
Zurück zum Zitat Yang C, Zhou J (2006) HClustream: a novel approach for clustering evolving heterogeneous data stream. In: Sixth IEEE international conference on data mining—workshops (ICDMW’06), pp 682–688 Yang C, Zhou J (2006) HClustream: a novel approach for clustering evolving heterogeneous data stream. In: Sixth IEEE international conference on data mining—workshops (ICDMW’06), pp 682–688
Zurück zum Zitat Yang Y, Liu Z, Zhang Jp, Yang J (2012) Dynamic density-based clustering algorithm over uncertain data streams. In: 2012 9th international conference on fuzzy systems and knowledge discovery, pp 2664–2670 Yang Y, Liu Z, Zhang Jp, Yang J (2012) Dynamic density-based clustering algorithm over uncertain data streams. In: 2012 9th international conference on fuzzy systems and knowledge discovery, pp 2664–2670
Zurück zum Zitat Zhang X, Wang W (2010) Self-adaptive change detection in streaming data with non-stationary distribution. Springer, Berlin, pp 334–345 Zhang X, Wang W (2010) Self-adaptive change detection in streaming data with non-stationary distribution. Springer, Berlin, pp 334–345
Zurück zum Zitat Zhang T, Ramakrishnan R, Livny M (1996) BIRCH: an efficient data clustering method for very large databases. In: Proceedings of the 1996 ACM SIGMOD international conference on management of data, ACM, Montreal, pp 103–114CrossRef Zhang T, Ramakrishnan R, Livny M (1996) BIRCH: an efficient data clustering method for very large databases. In: Proceedings of the 1996 ACM SIGMOD international conference on management of data, ACM, Montreal, pp 103–114CrossRef
Zurück zum Zitat Zhang T, Ramakrishnan R, Livny M (1997) BIRCH: a new data clustering algorithm and its applications. Data Mini Knowl Discov 1(2):141–182CrossRef Zhang T, Ramakrishnan R, Livny M (1997) BIRCH: a new data clustering algorithm and its applications. Data Mini Knowl Discov 1(2):141–182CrossRef
Zurück zum Zitat Zhang X, Germain C, Sebag M (2010) Adaptively detecting changes in autonomic grid computing. In: 2010 11th IEEE/ACM international conference on grid computing, pp 387–392 Zhang X, Germain C, Sebag M (2010) Adaptively detecting changes in autonomic grid computing. In: 2010 11th IEEE/ACM international conference on grid computing, pp 387–392
Zurück zum Zitat Zhou A, Cao F, Qian W, Jin C (2007a) Tracking clusters in evolving data streams over sliding windows. Knowl Inf Syst 15(2):181–214CrossRef Zhou A, Cao F, Qian W, Jin C (2007a) Tracking clusters in evolving data streams over sliding windows. Knowl Inf Syst 15(2):181–214CrossRef
Zurück zum Zitat Zhou A, Cao F, Yan Y, Sha C, He X (2007b) Distributed data stream clustering: a fast EM-based approach. In: 2007 IEEE 23rd international conference on data engineering, pp 736–745 Zhou A, Cao F, Yan Y, Sha C, He X (2007b) Distributed data stream clustering: a fast EM-based approach. In: 2007 IEEE 23rd international conference on data engineering, pp 736–745
Zurück zum Zitat Zhu Y, Shasha D (2002) StatStream: statistical monitoring of thousands of data streams in real time. In: Proceedings of the 28th international conference on very large data bases, VLDB Endowment, Hong Kong, pp 358–369CrossRef Zhu Y, Shasha D (2002) StatStream: statistical monitoring of thousands of data streams in real time. In: Proceedings of the 28th international conference on very large data bases, VLDB Endowment, Hong Kong, pp 358–369CrossRef
Metadaten
Titel
Optimizing Data Stream Representation: An Extensive Survey on Stream Clustering Algorithms
verfasst von
Matthias Carnein
Heike Trautmann
Publikationsdatum
21.01.2019
Verlag
Springer Fachmedien Wiesbaden
Erschienen in
Business & Information Systems Engineering / Ausgabe 3/2019
Print ISSN: 2363-7005
Elektronische ISSN: 1867-0202
DOI
https://doi.org/10.1007/s12599-019-00576-5

Weitere Artikel der Ausgabe 3/2019

Business & Information Systems Engineering 3/2019 Zur Ausgabe