Abstract
Data stream mining is an active research area that has recently emerged to discover knowledge from large amounts of continuously generated data. In this context, several data stream clustering algorithms have been proposed to perform unsupervised learning. Nevertheless, data stream clustering imposes several challenges to be addressed, such as dealing with nonstationary, unbounded data that arrive in an online fashion. The intrinsic nature of stream data requires the development of algorithms capable of performing fast and incremental processing of data objects, suitably addressing time and memory limitations. In this article, we present a survey of data stream clustering algorithms, providing a thorough discussion of the main design components of state-of-the-art algorithms. In addition, this work addresses the temporal aspects involved in data stream clustering, and presents an overview of the usually employed experimental methodologies. A number of references are provided that describe applications of data stream clustering in different domains, such as network intrusion detection, sensor networks, and stock market analysis. Information regarding software packages and data repositories are also available for helping researchers and practitioners. Finally, some important issues and open questions that can be subject of future research are discussed.
Supplemental Material
Available for Download
Supplemental movie, appendix, image and software files for, Data stream clustering: A survey
- Ackermann, M. R., Martens, M., Raupach, C., Swierkot, K., Lammersen, C., and Sohler, C. 2012. StreamKM++: A clustering algorithm for data streams. ACM J. Exper. Algor. 17, 1. Google ScholarDigital Library
- Agarwal, P. K., Har-Peled, S., and Varadarajan, K. R. 2004a. Approximating extent measures of points. J. ACM 51, 4, 606--635. Google ScholarDigital Library
- Aggarwal, C. C. 2003. A framework for diagnosing changes in evolving data streams. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'03). 575--586. Google ScholarDigital Library
- Aggarwal, C. C. 2007. Data Streams -- Models and Algorithms. Springer. Google ScholarDigital Library
- Aggarwal, C. C. 2010. A segment-based framework for modeling and mining data streams. Knowl. Inf. Syst, 30, 1, 1--29. Google ScholarDigital Library
- Aggarwal, C. C., Han, J., Wang, J., and Yu, P. S. 2003. A framework for clustering evolving data streams. In Proceedings of the 29th Conference on Very Large Data Bases (VLDB'03). Vol. 29, 81--92. Google ScholarDigital Library
- Aggarwal, C. C., Han, J., Wang, J., and Yu, P. S. 2004b. A framework for projected clustering of high dimensional data streams. In Proceedings of the 30th Conference on Very Large Data Bases (VLDB'04). Vol. 30, 852--863. Google ScholarDigital Library
- Aggarwal, C. C. and Yu, P. S. 2006. A framework for clustering massive text and categorical data streams. In Proceedings of the 6th International Conference on Data Mining (SDM'06). 479.Google Scholar
- Aggarwal, C. C. and Yu, P. S. 2008. A framework for clustering uncertain data streams. In Proceedings of the 24th IEEE International Conference on Data Engineering (ICDE'08). 150--159. Google ScholarDigital Library
- Aghbari, Z. A., Kamel, I., and Awad, T. 2012. On clustering large number of data streams. Intell. Data Anal. 16, 1, 69--91.Google ScholarDigital Library
- Amini, A., Wah, T. Y., Saybani, M. R., Aghabozorgi, S. R., and Yazdi, S. 2011. A study of density-grid based clustering algorithms on data streams. In Proceedings of the 8th International Conference on Fuzzy Systems and Knowledge Discovery (FSKD'11). IEEE Press, Los Alamitos, CA, 1652--1656.Google Scholar
- Arabie, P. and Hubert, L. J. 1999. An Overview of Combinatorial Data Analysis. World Scientific Publishing.Google Scholar
- Arthur, D. and Vassilvitskii, S. 2006. How slow is the k-means method? In Proceedings of the 22nd Annual Symposium on Computational Geometry (SCG'06). ACM Press, New York, 144--153. Google ScholarDigital Library
- Arthur, D. and Vassilvitskii, S. 2007. K-means++: The advantages of careful seeding. In Proceedings of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms. 1027--1035. Google ScholarDigital Library
- Babcock, B., Babu, S., Datar, M., Motwani, R., and Widom, J. 2002. Models and issues in data stream systems. In Proceedings of the 21st ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS'02). ACM Press, New York, 1--16. Google ScholarDigital Library
- Babcock, B., Datar, M., Motwani, R., and O'Callaghan, L. 2003. Maintaining variance and k-medians over data stream windows. In Proceedings of the 22nd ACM SIGMOD-SIGACT SIGART Symposium on Principles of Database Systems. ACM Press, New York, 234--243. Google ScholarDigital Library
- Badoiu, M., Har-Peled, S., and Indyk, P. 2002. Approximate clustering via core-sets. In Proceedings of the 34th ACM Symposium on Theory of Computing (STOC'02). ACM Press, New York, 250--257. Google ScholarDigital Library
- Ball, G. H. and Hall, D. J. 1965. ISODATA. A novel method of data analysis and pattern classification. Tech. rep. Stanford Research Institute, Menlo Park.Google Scholar
- Barbara, D. 2002. Requirements for clustering data streams. SIGKDD Explorations 3, 23--27. Google ScholarDigital Library
- Bentley, J. L. 1975. Multidimensional binary search trees used for associative searching. Comm. ACM 18, 9, 509--517. Google ScholarDigital Library
- Bentley, J. L. and Saxe, J. B. 1980. Decomposable searching problems I: Static-to-dynamic transformation. J. Algor. 1, 4, 301--358.Google ScholarCross Ref
- Bhat, U. N. and Miller, G. K. 2002. Elements of Applied Stochastic Processes 3rd Ed. John Wiley and Sons, New Jersey.Google Scholar
- Bifet, A. 2010. Adaptive Stream Mining: Pattern Learning and Mining from Evolving Data Streams. IOS Press. Google ScholarDigital Library
- Bifet, A., Holmes, G., Kirkby, R., and Pfahringer, B. 2010. MOA: Massive online analysis. J. Mach. Learn. Res. 11, 1601--1604. Google ScholarDigital Library
- Bradley, P. S. and Fayyad, U. M. 1998a. Refining initial points for k-means clustering. In Proceedings of the 15th International Conference on Machine Learning (ICML'98). Morgan Kaufmann Publishers, San Francisco, 91--99. Google ScholarDigital Library
- Bradley, P. S., Fayyad, U. M., and Reina, C. 1998b. Scaling clustering algorithms to large databases. In Proceedings of the 4th International Conference on Knowledge Discovery and Data Mining (KDD'98). AAAI Press, 9--15.Google Scholar
- Cao, F., Ester, M., Qian, W., and Zhou, A. 2006. Density-based clustering over an evolving data stream with noise. In Proceedings of the 6th SIAM International Conference on Data Mining. 328--339.Google Scholar
- Charikar, M. and Guha, S. 1999. Improved combinatorial algorithms for the facility location and k-median problems. In Proceedings of the 40th Annual Symposium on Foundations of Computer Science. 378--388. Google ScholarDigital Library
- Charikar, M., O'Callaghan, L., and Panigrahy, R. 2003. Better streaming algorithms for clustering problems. In Proceedings of the 35th Annual ACM Symposium on Theory of Computing. ACM Press, New York, 30--39. Google ScholarDigital Library
- Chen, Y. and Tu, L. 2007. Density-based clustering for real-time stream data. In Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM Press, New York, 133--142. Google ScholarDigital Library
- Cho, K., Jo, S., Jang, H., Kim, S. M., and Song, J. 2006. DCF: An efficient data stream clustering framework for streaming applications. In Proceedings of the 17th International Conference on Database and Expert Systems Applications (DEXA'06). 114--122. Google ScholarDigital Library
- Csernel, B., Clerot, F., and Hebrail, G. 2006. Datastream clustering over tilted windows through sampling. In Proceedings of the Knowledge Discovery from Data Streams Workshop (ECML/PKDD).Google Scholar
- Dang, X. H., Lee, V. C. S., Ng, W. K., Ciptadi, A., and Ong, K.-L. 2009. An em-based algorithm for clustering data streams in sliding windows. In Proceedings of the 14th International Conference on Database Systems for Advanced Applications. Lecture Notes in Computer Science, vol. 5463, Springer, 230--235. Google ScholarDigital Library
- Domingos, P. and Hulten, G. 2001. A general method for scaling up machine learning algorithms and its application to clustering. In Proceedings of the 8th International Conference on Machine Learning. Morgan Kaufmann Publishers, San Francisco, 106--113. Google ScholarDigital Library
- Dunham, M. H., Menga, Y., and Huang, J. 2004. Extensible markov model. In Proceedings of the 4th IEEE International Conference on Data Mining (ICDM'04). 371--374. Google ScholarDigital Library
- Ester, M., Kriegel, H.-P., Sander, J., and Xu, X. 1996. A density-based algorithm for discovering clusters in large spatial databases with noise. In Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining. 226--231.Google ScholarDigital Library
- Ribeiro, E. F., Barros, R. C., Carvalho, A. C. P. L. F., and Gama, J. 2012. improving the offline clustering stage of data stream algorithms in scenarios with variable number of clusters. In Proceedings of the 27th ACM Symposium on Applied Computing (SAC'12). 572--573. Google ScholarDigital Library
- Farnstrom, F., Lewis, J., and Elkan, C. 2000. Scalability for clustering algorithms revisited. SIGKDD Exploration Newslett. 2, 1, 51--57. Google ScholarDigital Library
- Fayyad, U., Piatetsky-Shapiro, G., and Smyth, P. 1996. From data mining to knowledge discovery: An overview. In Advances in Knowledge Discovery and Data Mining. American Association for Artificial Intelligence, Menlo Park, CA, 1--34. Google ScholarDigital Library
- Frank, A. and Asuncion, A. 2010. UCI machine learning repository. http://archive.ics.uci.edu/ml.Google Scholar
- Gaber, M. M., Vatsavai, R. R., Omitaomu, O. A., Gama, J., Chawla, N. V., and Ganguly, A. R. 2010. Knowledge Discovery from Sensor Data. Springer. Google ScholarDigital Library
- Gama, J. 2010. Knowledge Discovery from Data Streams. Chapman Hall/CRC. Google ScholarDigital Library
- Gama, J. and Gaber, M. M. 2007. Learning from Data Streams: Processing Techniques in Sensor Networks. Springer. Google ScholarDigital Library
- Gama, J., Medas, P., Castillo, G., and Rodrigues, P. P. 2004. Learning with drift detection. In Proceedings of the 17th Brazilian Symposium on Artificial Intelligence (SBIA'04). Vol. 3171., 286--295.Google Scholar
- Gama, J., Pereira, P. R., and Lopes, L. 2011. Clustering distributed sensor data streams using local processing and reduced communication. Intell. Data Anal. 15, 1, 3--28. Google ScholarCross Ref
- Gama, J. and Pinto, C. 2006. Discretization from data streams: Applications to histograms and data mining. In Proceedings of the ACM Symposium on Applied Computing (SAC'06). 662--667. Google ScholarDigital Library
- Gan, G., Ma, C., and Wu, J. 2007. Data Clustering: Theory, Algorithms, and Applications. ASA-SIAM Series on Statistics and Applied Probability. SIAM. Google ScholarDigital Library
- Gao, M.-M., Liu, J.-Z., and Gao, X.-X. 2010. Application of compound gaussian mixture model clustering in the data stream. In Proceedings of the International Conference on Computer Application and System Modeling (ICCASM'10).Google Scholar
- Gonzalez, T. F. 1985. Clustering to minimize the maximum intercluster distance. Theor. Comput. Sci. 38, 293--306.Google ScholarCross Ref
- Guha, S. 2009. Tight results for clustering and summarizing data streams. In Proceedings of the 12th International Conference on Database Theory (ICDT'09). ACM Press, New York, 268--275. Google ScholarDigital Library
- Guha, S., Meyerson, A., Mishra, N., Motwani, R., and O'Callaghan, L. 2003. Clustering data streams: Theory and practice. IEEE Trans. Knowl. Data Engin. 15, 515--528. Google ScholarDigital Library
- Guha, S., Mishra, N., Motwani, R., and O'Callaghan, L. 2000. Clustering data streams. In Proceedings of the IEEE Symposium on Foundations of Computer Science. 359--366. Google ScholarDigital Library
- Hahsler, M. and Dunham, M. H. 2010. rEMM: Extensible markov model for data stream clustering in r. J. Statist. Softw. 35, 5, 1--31.Google ScholarCross Ref
- Hahsler, M. and Dunham, M. H. 2011. Temporal structure learning for clustering massive data streams in real-time. In Proceedings of the SIAM Conference on Data Mining. SIAM/Omnipress, 664--675.Google Scholar
- Han, J. and Kamber, M. 2000. Data Mining: Concepts and Techniques. Morgan Kaufmann. Google ScholarDigital Library
- Har-Peled, S. and Mazumdar, S. 2004. On coresets for k-means and k-median clustering. In Proceedings of the 36th Annual ACM Symposium on Theory of Computing. 291--300. Google ScholarDigital Library
- Hulten, G. and Domingos, P. 2003. VFML -- A toolkit for mining high-speed time-changing data streams. Tech. rep. University of Washington. http://www.cs.washington.edu/dm/vfml/.Google Scholar
- Isaksson, C., Dunham, M. H., and Hahsler, M. 2012. SOStream: Self organizing density-based clustering over data stream. In Proceedings of the 8th International Conference on Machine Learning and Data Mining in Pattern Recognition. Lecture Notes in Computer Science, vol. 7376, Springer, 264--278. Google ScholarDigital Library
- Jain, A. K. 2009. Data clustering: 50 years beyond k-means. Pattern Recogn. Lett. 31, 651--666. Google ScholarDigital Library
- Jiang, N. and Gruenwald, L. 2006. Research issues in data stream association rule mining. SIGMOD Rec. 35, 1, 14--19. Google ScholarDigital Library
- Kaufman, L. and Rousseeuw, P. J. 1990. Finding Groups in Data: An Introduction to Cluster Analysis. Wiley Interscience.Google Scholar
- Kavitha, V. and Punithavalli, M. 2010. Clustering time series data stream - A literature survey. Int. J. Comput. Sci. Inf. Secur. 8, 1, 289--294.Google Scholar
- Khalilian, M. and Mustapha, N. 2010. Data stream clustering: challenges and issues. In Proceedings of International Multi Conference of Engineers and Computer Scientists. 566--569.Google Scholar
- Kogan, J. 2007. Introduction to Clustering Large and High-Dimensional Data. Cambridge University Press. Google ScholarDigital Library
- Kontaki, M., Papadopoulos, A. N., and Manolopoulos, Y. 2008. Continuous trend-based clustering in data streams. In Proceedings of the 10th International Conference on Data Warehousing and Knowledge Discovery. 251--262. Google ScholarDigital Library
- Kranen, P., Assent, I., Baldauf, C., and Seidl, T. 2011. The clustree: Indexing microclusters for anytime stream mining. Knowl. Inf. Syst. 29, 2, 249--272. Google ScholarDigital Library
- Kremer, H., Kranen, P., Jansen, T., Seidl, T., Bifet, A., Holmes, G., and Pfahringer, B.. 2011. An effective evaluation measure for clustering on evolving data streams. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'11). ACM Press, New York, 868--876. Google ScholarDigital Library
- Li, Q., Ma, X., Tang, S., and Xie, S. 2011. Continuously identifying representatives out of massive streams. In Proceedings of the 7th International Conference on Advanced Data Mining and Applications (ADMA'11). Springer, 1--14. Google ScholarDigital Library
- Li, Y. and Tan, B. H. 2011. Data stream clustering algorithm based on affinity propagation and density. Advanced Materials Res. 267, 444--449.Google ScholarCross Ref
- Liu, Y.-B., Cai, J.-R., Yin, J., and Fu, A. W.-C. 2008. Clustering text data streams. J. Comput. Sci. Technol. 23, 1, 112--128.Google ScholarCross Ref
- Lloyd, S. P. 1982. Least squares quantization in pcm. IEEE Trans. Inf. Theory 28, 2, 129--137. Google ScholarDigital Library
- Luhr, S. and Lazarescu, M. 2009. Incremental clustering of dynamic data streams using connectivity based representative points. Data Knowl. Engin. 68, 1--27. Google ScholarDigital Library
- MacQueen, J. B. 1967. Some methods for classification and analysis of multivariate observations. In Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability. L. M. Le Cam and J. Neyman, Eds., Vol. 1., 281--297.Google Scholar
- De Maesschalck, R., Jouan-Rimbaud, D., and Massart, D. L. 2000. The mahalanobis distance. Chemometrics Intell. Laboratory Syst. 50, 1--18.Google ScholarCross Ref
- Mahdiraji, A. R. 2009. Clustering data stream: A survey of algorithms. Int. J. Knowl.-Based Intell. Engin. Syst. 13, 2, 39--44. Google ScholarDigital Library
- Markov, A. 1971. Extension of the limit theorems of probability theory to a sum of variables connected in a chain. In Dynamic Probabilistic Systems, Vol. 1, R. Howard, Ed., John Wiley and Sons, Chapter Appendix B, 552--577.Google Scholar
- Metwally, A., Agrawal, D., and Abbadi, A. E. L. 2005. Duplicate detection in click streams. In Proceedings of the 14th International Conference on World Wide Web. ACM Press, New York, 12--21. Google ScholarDigital Library
- Meyerson, A. 2001. Online facility location. In Proceedings of the Annual IEEE Symposium on Foundations of Computer Science. 426--431. Google ScholarDigital Library
- Mierswa, I., Wurst, M., Klinkenberg, R., Scholz, M., and Euler, T. 2006. YALE: Rapid prototyping for complex data mining tasks. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'06). L. Ungar, M. Craven, D. Gunopulos, and T. Eliassi-Rad, Eds., ACM Press, New York, 935--940. Google ScholarDigital Library
- O'Callaghan, L., Mishra, N., Meyerson, A., Guha, S., and Motwani, R. 2002. Streaming data algorithms for high-quality clustering. In Proceedings of the 18th International Conference on Data Engineering. 685--694. Google ScholarDigital Library
- Oliveira, M. and Gama, J. 2010. MEC --Monitoring clusters' transitions. In Proceedings of the 5th Starting AI Researchers Symposium. IOS Press, 212--224. Google ScholarDigital Library
- Ong, K. L, Li, W., Ng, W.-K., and Lim, E.-P. 2004. SCLOPE: An algorithm for clustering data streams of categorical attributes. In Proceedings of the 6th International Conference on Data Warehousing and Knowledge Discovery (KDD'04). 209--218.Google Scholar
- Oliveira, M. D. B. and Gama, J. 2012. A framework to monitor clusters evolution applied to economy and finance problems. Intell. Data Anal. 16, 1, 93--111.Google ScholarDigital Library
- Ostfeld, A., Uber, J. G., Salomons, E., et al. 2008. The battle of the water sensor networks (BWSN): A design challenge for engineers and algorithms. J. Water Resources Plan. Manag. 134, 556.Google ScholarCross Ref
- Park, N. H. and Suk Lee, W. 2007. Cell trees: An adaptive synopsis structure for clustering multidimensional on-line data streams. Data Knowl. Engin. 63, 2, 528--549. Google ScholarDigital Library
- Ren, J. and Ma, R. 2009. Density-based data streams clustering over sliding windows. In Proceedings of the 6th International Conference on Fuzzy Systems and Knowledge Discovery. Vol. 5. 248--252. Google ScholarDigital Library
- Rodrigues, P., Gama, J., and Pedroso, J. P. 2006. ODAC: Hierarchical clustering of time series data streams. In Proceedings of the 6th SIAM International Conference on Data Mining. 499--503.Google Scholar
- Rodrigues, P. P., Gama, J., and Pedroso, J. P. 2008. Hierarchical clustering of time-series data streams. IEEE Trans. Knowl. Data Engin. 20, 5, 615--627. Google ScholarDigital Library
- Serir, L., Ramasso, E., and Zerhouni, N. 2012. Evidential evolving gustafson--kessel algorithm for online data streams partitioning using belief function theory. Int. J. Approximate Reason. 53, 5, 1--22. Google ScholarDigital Library
- Shah, R., Krishnaswamy, S., and Gaber, M. M. 2005. Resource-aware very fast k-means for ubiquitous data stream mining. In Proceedings of the 2nd International Workshop on Knowledge Discovery in Data Streams, Held in Conjunction with the 16th European Conference on Machine Learning (ECML'05).Google Scholar
- Silva, A., Chiky, R., and Hebrail, G. 2011. A clustering approach for sampling data streams in sensor networks. Knowl. Inf. Syst. 32, 1, 1--23. Google ScholarDigital Library
- Silva, J. A. and Hruschka, E. R. 2011. Extending k-means-based algorithms for evolving data streams with variable number of clusters. In Proceedings of the 4th International Conference on Machine Learning and Applications (ICMLA'11). Vol. 2. 14--19. Google ScholarDigital Library
- Spiliopoulou, M., Ntoutsi, I., Theodoridis, Y., and Schult, R. 2006. MONIC: Modeling and monitoring cluster transitions. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'06). ACM Press, New York, 706--711. Google ScholarDigital Library
- Steinhaus, H. 1956. Sur la division des corp materiels en parties. Bull. Acad. Polon. Sci 1, 801--804.Google Scholar
- Tasoulis, D. K., Adams, N. M., and Hand, D. J. 2006. Unsupervised clustering in streaming data. In Proceedings of the 6th IEEE International Conference on Data Mining-Workshops (ICDM'06). 638--642. Google ScholarDigital Library
- Tavallaee, M., Bagheri, E., Lu, W., and Ghorbani, A. A. 2009. A detailed analysis of the kdd cup 99 data set. In Proceedings of the 2nd IEEE International Conference on Computational Intelligence for Security and Defense Applications. 53--58. Google ScholarDigital Library
- Vattani, A. 2009. K-means requires exponentially many iterations even in the plane. In Proceedings of the 25th Annual Symposium on Computational Geometry (SCG'09). ACM Press, New York, 324--332. Google ScholarDigital Library
- Wan, R., Yan, X., and Su, X. 2008. A weighted fuzzy clustering algorithm for data stream. In Proceedings of the ISECS International Colloquium on Computing, Communication, Control, and Management. 360--364. Google ScholarDigital Library
- Wu, X., Kumar, V., Quinlan, J. R, Ghosh, J., Yang, Q., Motoda, H., Mclachlan, G. J., Ng, A., Liu, B., Yu, P. S., Zhou, Z.-H., Steinbach, M., Hand, D. J., and Steinberg, D. 2007. Top 10 algorithms in data mining. Knowl. Inf. Syst. 14, 1--37. Google ScholarDigital Library
- Xu, R. and Wunsch, D. 2009. Clustering. Computational Intelligence Series, Wiley-IEEE Press. Google ScholarDigital Library
- Yang, C. and Zhou, J. 2006. HClustream: A novel approach for clustering evolving heterogeneous data stream. In Proceedings of the 6th IEEE International Conference on Data Mining. 682--688. Google ScholarDigital Library
- Zhang, T., Ramakrishnan, R., and Livny, M. 1996. BIRCH: An efficient data clustering method for very large databases. In Proceedings of the ACM SIGMOD International Conference on Management of Data. ACM Press, New York, 103--114. Google ScholarDigital Library
- Zhang, T., Ramakrishnan, R., and Livny, M. 1997. BIRCH: A new data clustering algorithm and its applications. Data Mining Knowl. Discov. 1, 2, 141--182. Google ScholarDigital Library
- Zhang, X., Sebag, M., and Germain-Renaud, C. 2009. Multi-scale real-time grid monitoring with job stream mining. In Proceedings of the 9th IEEE/ACM International Symposium on Cluster Computing and the Grid (CCGRID'09). 420--427. Google ScholarDigital Library
- Zhang, X. and Wang, W. 2010. Self-adaptive change detection in streaming data with nonstationary distribution. In Advanced Data Mining and Applications. Springer, 1--12. Google ScholarDigital Library
- Zhang, X., Zhou, X., and Hu, X. 2006. Semantic smoothing for model-based document clustering. In Proceedings of the 6th International Conference on Data Mining (ICDM'06). 1193--1198. Google ScholarDigital Library
- Zhou, A., Cao, F., Qian, W., and Jin, C. 2008. Tracking clusters in evolving data streams over sliding windows. Knowl. Inf. Syst. 15, 2, 181--214. Google ScholarDigital Library
- Zhu, H., Wang, Y., and Yu, Z. 2010. Clustering of evolving data stream with multiple adaptive sliding window. In Proceedings of the International Conference on Data Storage and Data Engineering (DSDE'10). 95--100. Google ScholarDigital Library
- Zhu, Y. and Shasha, D. 2002. StatStream: Statistical monitoring of thousands of data streams in real time. In Proceedings of the 28th International Conference on Very Large Data Bases (VLDB'02). VLBD Endowment, 358--369. Google ScholarDigital Library
Index Terms
- Data stream clustering: A survey
Recommendations
Fat node leading tree for data stream clustering with density peaks
Detecting clusters of arbitrary shape and constantly delivering the results for newly arrived items are two critical challenges in the study of data stream clustering. However, the existing clustering methods could not deal with these two problems ...
Evolving soft subspace clustering
A key challenge to most conventional clustering algorithms in handling many real world problems is that, data points in different clusters are often correlated with different subsets of features. To address this problem, subspace clustering has ...
Data stream clustering: a review
AbstractNumber of connected devices is steadily increasing and these devices continuously generate data streams. Real-time processing of data streams is arousing interest despite many challenges. Clustering is one of the most suitable methods for real-...
Comments