Abstract
Scientific and industrial examples of data streams abound in astronomy, telecommunication operations, banking and stock-market applications, e-commerce, and other fields. A challenge posed by continuously arriving data streams is to analyze them and to update the models that explain them as new data arrives. In this paper, we analyze the requirements for clustering data streams. We review some of the latest algorithms in the literature and assess whether they meet these requirements.