Requirements for clustering data streams

Published: 01 January 2002

Abstract

Scientific and industrial examples of data streams abound in astronomy, telecommunication operations, banking and stock-market applications, e-commerce, and other fields. A challenge posed by continuously arriving data streams is to analyze them and to update the models that explain them as new data arrives. In this paper, we analyze the requirements for clustering data streams. We review some of the latest algorithms in the literature and assess whether they meet these requirements.
